
r - Why does R 3.6.0 return FALSE when evaluating the expression "Dogs" < "cats"?

Reposted. Author: 行者123. Updated: 2023-12-04 00:53:54

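I have some complicated code, but rather than showing it to you, I will extract the essence of the problem.

Evaluating "dogs" < "cats": this should evaluate to FALSE, and it does in R 3.6.

Evaluating "Dogs" < "cats": this should evaluate to TRUE, because the ASCII code of "D" is 68 and the ASCII code of "c" is 99. Since 68 < 99, "Dogs" < "cats" should evaluate to TRUE, but it does not in R 3.6.0. However, when I tried the console window on the https://datacamp.com website, the expression "Dogs" < "cats" returned TRUE and the expression "dogs" < "Cats" returned FALSE, as expected.

So my question is: why does R 3.6.0 return FALSE for "Dogs" < "cats"?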

Best answer

DataCamp's interpreter shows:

> Sys.getlocale()
[1] "C"

whereas mine, and perhaps yours, shows:
> Sys.getlocale()
[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"

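In the "C" locale, characters are compared by their ASCII values, whereas in en_US.UTF-8 they are ordered aAbBcC, and so on. Under that ordering "c" sorts before "D", so "Dogs" < "cats" is FALSE.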
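To see the effect of the collation setting directly, the sketch below temporarily switches only the `LC_COLLATE` category to "C" and repeats the comparisons from the question. The variable names (`old_collate`, `upper_vs_lower`, `lower_vs_upper`) are illustrative and not part of the original answer.

```r
# Remember the current collation setting so it can be restored afterwards.
old_collate <- Sys.getlocale("LC_COLLATE")

# Switch only the collation category to the "C" locale.
invisible(Sys.setlocale("LC_COLLATE", "C"))

# Code points of the first letters, as quoted in the question.
code_D <- utf8ToInt("D")   # 68
code_c <- utf8ToInt("c")   # 99

# In the "C" locale strings are compared by these values.
upper_vs_lower <- "Dogs" < "cats"   # TRUE
lower_vs_upper <- "dogs" < "Cats"   # FALSE

# Restore the original collation setting.
invisible(Sys.setlocale("LC_COLLATE", old_collate))
```

In a locale such as en_US.UTF-8 the same two comparisons would typically give the opposite results, which is the behavior described in the question.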

As mentioned in the comments, this is explained further in the documentation for the relational operators:

Comparison of strings in character vectors is lexicographic within the strings using the collating sequence of the locale in use: see locales. The collating sequence of locales such as en_US is normally different from C (which should use ASCII) and can be surprising. Beware of making any assumptions about the collation order: e.g. in Estonian Z comes between S and T, and collation is not necessarily character-by-character – in Danish aa sorts as a single letter, after z. In Welsh ng may or may not be a single sorting unit: if it is it follows g. Some platforms may not respect the locale and always sort in numerical order of the bytes in an 8-bit locale, or in Unicode code-point order for a UTF-8 locale (and may not sort in the same order for the same language in different character sets). Collation of non-letters (spaces, punctuation signs, hyphens, fractions and so on) is even more problematic.
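If code needs the same result regardless of the user's locale, one possible approach is sketched below. The helper `compare_in_c_locale()` is a hypothetical name introduced here for illustration; it wraps a comparison in a temporary switch to "C" collation and restores the previous setting on exit. For sorting, base R's `sort()` with `method = "radix"` is documented to always collate as in the "C" locale, so no locale change is needed there.

```r
# Hypothetical helper: compare two character vectors using "C" collation,
# restoring the caller's collation setting when the function exits.
compare_in_c_locale <- function(a, b) {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  a < b
}

is_less <- compare_in_c_locale("Dogs", "cats")   # TRUE in any session locale

# Radix sorting of character vectors always follows the "C" locale.
words <- c("dogs", "Cats", "cats", "Dogs")
sorted <- sort(words, method = "radix")
# sorted is c("Cats", "Dogs", "cats", "dogs")
```

This only covers plain ASCII input. For language-aware ordering, the locale-dependent behavior described in the quoted documentation is usually what is wanted.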

The original question, "r - Why does R 3.6.0 return FALSE when evaluating the expression "Dogs" < "cats"?", can be found on Stack Overflow: https://stackoverflow.com/questions/56485774/
