
r - Internal string cache in R

Reposted. Author: 行者123. Updated: 2023-12-04 04:05:40

This question arises from the data.table bug report #4978, but I will use a data.frame example to show that the issue is not specific to data.table:

Consider the following:

df = data.frame(a = 1, hø = 1)

identical(names(df), c("a", "hø"))
#[1] TRUE

.Internal(inspect(names(df)))
#@0x0000000007b27458 16 STRSXP g0c2 [NAM(2)] (len=2, tl=0)
# @0x000000000ee604c0 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "a"
# @0x0000000007cfa910 09 CHARSXP g0c1 [gp=0x21] [cached] "hø"

.Internal(inspect(c("a", "hø")))
#@0x0000000007b274c8 16 STRSXP g0c2 [] (len=2, tl=0)
# @0x000000000ee604c0 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "a"
# @0x0000000007cfa970 09 CHARSXP g0c1 [gp=0x24,ATT] [latin1] [cached] "hø"

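Note that even though identical considers the two vectors to be the same, the underlying string cache stores "hø" at two different addresses, whereas "a" is stored at a single address. What is going on? Is this a bug in R's string cache?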
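The inspect output above hints at the cause: one "hø" carries no encoding flag and the other is flagged [latin1]. As far as I understand, the cache treats entries as distinct when their bytes or their encoding flag differ, while identical compares the translated text. A minimal base-R sketch of that effect follows. The names s_utf8 and s_latin1 are illustrative only, and the "\u00f8" escape is used so the result does not depend on the encoding of the console or source file.

```r
# Build the same text with two different declared encodings.
s_utf8   <- "h\u00f8"                          # marked as UTF-8
s_latin1 <- iconv(s_utf8, "UTF-8", "latin1")   # marked as latin1

Encoding(c(s_utf8, s_latin1))
#[1] "UTF-8"  "latin1"

# The stored bytes differ between the two strings ...
charToRaw(s_utf8)
#[1] 68 c3 b8
charToRaw(s_latin1)
#[1] 68 f8

# ... but identical() and == compare the translated text, not the raw bytes.
identical(s_utf8, s_latin1)
#[1] TRUE
s_utf8 == s_latin1
#[1] TRUE
```

Any code that compares cached string addresses instead of translated text would see these two strings as different.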

The reason this matters is that %chin% fails here (because of the difference above):
library(data.table)
"a" %chin% names(df)
#[1] TRUE
"hø" %chin% names(df)
#[1] FALSE

Best answer

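When typed directly into the console, "hø" is marked as being UTF-8 encoded. You can use enc2native to force it to the native encoding, and the problem goes away, but I am still looking into why this is...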

Encoding("hø")
# [1] "UTF-8"

.Internal( inspect( c( "a" , enc2native("hø") ) ) )
#@1081d60a0 16 STRSXP g0c2 [] (len=2, tl=0)
# @100af87d8 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "a"
# @1081e3a08 09 CHARSXP g1c1 [MARK,gp=0x21] [cached] "hø"

enc2native("hø") %chin% names(df)
#[1] TRUE
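The enc2native fix works because both sides end up with the same bytes and the same encoding flag. A similar sketch in base R, assuming UTF-8 is an acceptable common encoding, converts both sides with enc2utf8 before comparing. The variable names are illustrative, and whether this is needed for %chin% depends on the data.table version in use, which I have not checked.

```r
s_utf8   <- "h\u00f8"                          # marked as UTF-8
s_latin1 <- iconv(s_utf8, "UTF-8", "latin1")   # same text, marked as latin1

# Convert both to UTF-8 so that equal text has equal bytes and equal marking.
a <- enc2utf8(s_utf8)
b <- enc2utf8(s_latin1)

Encoding(c(a, b))
#[1] "UTF-8" "UTF-8"
identical(charToRaw(a), charToRaw(b))
#[1] TRUE

# Base match() and %in% translate encodings themselves, so they agree even
# without the conversion.
s_latin1 %in% s_utf8
#[1] TRUE
```

With data.table loaded, the analogous call would be enc2utf8("hø") %chin% enc2utf8(names(df)).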

The Encoding help page has a lot of related information; I think this passage is relevant:

There are other ways for character strings to acquire a declared encoding apart from explicitly setting it (and these have changed as R has evolved). Functions scan, read.table, readLines, and parse have an encoding argument that is used to declare encodings, iconv declares encodings from its from argument, and console input in suitable locales is also declared. intToUtf8 declares its output as "UTF-8", and output text connections (see textConnection) are marked if running in a suitable locale. Under some circumstances (see its help page) source(encoding=) will mark encodings of character strings it outputs.



Update

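It appears to me that everything in the basic ASCII character set (character codes 0-127) gets the "unknown" encoding, and that any string containing a character outside this set is marked as "UTF-8" by default, including the extended ASCII codes (character codes 128-255).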
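This observation can be checked with a short base-R sketch. It uses a "\u" escape, which R marks as UTF-8. The marking of a literal typed at the console may instead depend on the locale, which would be consistent with the [latin1] flag in the question's inspect output. The variable names are illustrative.

```r
ascii <- "a"
Encoding(ascii)
#[1] "unknown"

# Pure-ASCII strings are never marked; declaring an encoding has no effect.
Encoding(ascii) <- "UTF-8"
Encoding(ascii)
#[1] "unknown"

# A string built from a \u escape is marked as UTF-8.
non_ascii <- "h\u00f8"
Encoding(non_ascii)
#[1] "UTF-8"

# The same text converted with iconv() carries a different mark.
Encoding(iconv(non_ascii, "UTF-8", "latin1"))
#[1] "latin1"
```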

The original question, "r - Internal string cache in R", is on Stack Overflow: https://stackoverflow.com/questions/19257962/
