python - Different results when sorting a character vector

Reposted. Author: 行者123. Updated: 2023-11-28 16:25:51
I would like to know how R's sort algorithm works when sorting a character vector:

a = c("aa(150)", "aa(1)S")
sort(a)
# [1] "aa(150)" "aa(1)S"
a = c("aa(150)", "aa(1)")
sort(a)
# [1] "aa(1)" "aa(150)"

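Doesn't R compare the integer values of the characters one by one, from left to right? Why can adding a single character change the result?

I assumed the order would be decided by the "5" and ")" characters, with the characters after them ignored.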

Comparison with Python:

In [1]: a=["aa(150)","aa(1)"]
In [2]: sorted(a)
Out[2]: ['aa(1)', 'aa(150)']
In [3]: a=["aa(150)","aa(1)S"]
In [4]: sorted(a)
Out[4]: ['aa(1)S', 'aa(150)']
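
To make the Python result above concrete: Python compares str values code point by code point, so the order is settled at the first position where the two strings differ. The sketch below uses a helper named first_difference, which is made up for this illustration and is not part of any library.

```python
def first_difference(x, y):
    """Return (index, char_x, char_y) for the first position where x and y
    differ, or None if one string is a prefix of the other."""
    for i, (cx, cy) in enumerate(zip(x, y)):
        if cx != cy:
            return i, cx, cy
    return None


a, b = "aa(150)", "aa(1)S"
i, ca, cb = first_difference(a, b)

# The strings first differ at index 4: '5' (code point 53) vs ')' (code point 41).
print(i, repr(ca), ord(ca), repr(cb), ord(cb))  # 4 '5' 53 ')' 41

# ')' has the smaller code point, so "aa(1)S" sorts first; the trailing "S" is never inspected.
print(sorted([a, b]))  # ['aa(1)S', 'aa(150)']
```

This is the behaviour the question expected from R: the decision is made by "5" versus ")", and everything after that position is irrelevant.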

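Best answer

Set the collation locale to "C" (the default, locale-independent ordering), which turns off locale-specific sorting in most cases: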

Sys.setlocale("LC_COLLATE", "C")
a=c("aa(150)","aa(1)S")
sort(a)
#[1] "aa(1)S" "aa(150)"
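
For comparison, Python's standard locale module allows a similar experiment. This is a minimal sketch, not a statement about how R is implemented: sort_with_collation is a hypothetical helper written for this example, and the en_US.UTF-8 locale name is an assumption that may not exist on every system, so no output is claimed for it.

```python
import locale

words = ["aa(150)", "aa(1)S"]


def sort_with_collation(items, collate_locale):
    """Sort items under the given LC_COLLATE locale, then restore the old one."""
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, collate_locale)
        # strxfrm maps each string to a key that compares in collation order.
        return sorted(items, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


# Under the "C" locale the order matches plain code point comparison.
print(sort_with_collation(words, "C"))

# The result under a language locale depends on the platform's collation tables.
try:
    print(sort_with_collation(words, "en_US.UTF-8"))
except locale.Error:
    print("en_US.UTF-8 is not installed on this system")
```

Restoring the previous setting in the finally block keeps the change from leaking into the rest of the program, since setlocale affects the whole process.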

Because languages differ, string sorting has to be locale-specific. From the help for ?sort:

The sort order for character vectors will depend on the collating sequence of the locale in use: see Comparison.

We can then go to ?Comparison to find:

Comparison of strings in character vectors is lexicographic within the strings using the collating sequence of the locale in use: see locales. The collating sequence of locales such as en_US is normally different from C (which should use ASCII) and can be surprising. Beware of making any assumptions about the collation order: e.g. in Estonian Z comes between S and T, and collation is not necessarily character-by-character – in Danish aa sorts as a single letter, after z. In Welsh ng may or may not be a single sorting unit: if it is it follows g.

As noted above, the locale matters for sorting because each language uses its letters differently.
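
The answer does not say which collation rule produces the order seen in the question. One plausible reading, offered here only as an assumption, is that the locale in use gives punctuation such as parentheses little or no weight when strings are first compared. The sketch below imitates that with a hypothetical key function, ignore_punctuation_key; it is a rough illustration and not the actual collation algorithm of any locale.

```python
import string


def ignore_punctuation_key(s):
    """Hypothetical sort key: drop ASCII punctuation and compare what remains."""
    return "".join(ch for ch in s if ch not in string.punctuation)


# Keys are "aa150" and "aa1S"; '5' sorts before 'S', so "aa(150)" comes first.
print(sorted(["aa(150)", "aa(1)S"], key=ignore_punctuation_key))  # ['aa(150)', 'aa(1)S']

# Keys are "aa150" and "aa1"; the shorter prefix "aa1" comes first.
print(sorted(["aa(150)", "aa(1)"], key=ignore_punctuation_key))  # ['aa(1)', 'aa(150)']
```

Both results match the R output shown in the question, which is consistent with the assumption but does not prove it.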

Regarding "python - Different results when sorting a character vector", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36757762/