r - R: Changing the character encoding of a column in a data frame

Reposted. Author: 行者123. Updated: 2023-12-02 11:22:05

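I'm investigating how character encoding affects sorting. My question is:

How do I change a single column of a data frame to a different character encoding?

For context, I've included a few additional steps below.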

1) Create the data frame:

d.enc <- data.frame(utf8  = c(" ", "_ ", " _"),
                    mac   = c(" ", "_ ", " _"),
                    label = c("space", "underscore space", "space underscore"))
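The as.character() calls in step 2 are needed because, before R 4.0.0, data.frame() turned strings into factors by default, and Encoding<- only accepts character vectors. A minimal sketch, assuming you are free to change how the data frame is built, passes stringsAsFactors = FALSE so the columns start out as character vectors (from R 4.0.0 on this is already the default). The copy d.enc2 <- d.enc is just one illustrative way to create the second data frame that step 2 writes into.

```r
# Build the data frame with character (not factor) columns.
# stringsAsFactors = FALSE matters for R < 4.0.0; it is the default afterwards.
d.enc <- data.frame(
  utf8  = c(" ", "_ ", " _"),
  mac   = c(" ", "_ ", " _"),
  label = c("space", "underscore space", "space underscore"),
  stringsAsFactors = FALSE
)

# Hypothetical working copy, so the original stays untouched.
d.enc2 <- d.enc

sapply(d.enc2, class)  # every column is "character"
```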

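2) Convert the columns to character vectors and try to set the encoding:
d.enc2 <- d.enc  # create d.enc2 as a copy of the original data frame first
d.enc2$utf8 <- as.character(d.enc$utf8)
d.enc2$mac <- as.character(d.enc$mac)
d.enc2$label <- as.character(d.enc$label)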

Encoding(d.enc2$utf8) <- "UTF-8"
Encoding(d.enc2$mac) <- "MACINTOSH"
Encoding(d.enc2$utf8)
# [1] "unknown" "unknown" "unknown"
Encoding(d.enc2$mac)
# [1] "unknown" "unknown" "unknown"

3) That's not what I wanted. I expected:
# [1] "UTF-8" "UTF-8" "UTF-8"
and
# [1] "MACINTOSH" "MACINTOSH" "MACINTOSH"
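The "unknown" results in step 2 are expected behavior. Encoding<- does not convert anything; it only records how the existing bytes should be interpreted, and the R documentation (?Encoding) lists only "latin1", "UTF-8", "bytes" and "unknown" as markings, so "MACINTOSH" cannot be recorded this way. Moreover, strings made only of ASCII characters, like all three values here, are never marked. The sketch below shows the difference; the non-ASCII string "caf\xc3\xa9" (the UTF-8 bytes for "café") is an illustrative value that is not part of the question.

```r
# The question's strings contain only ASCII characters.
x <- c(" ", "_ ", " _")
Encoding(x) <- "UTF-8"
Encoding(x)  # "unknown" "unknown" "unknown": ASCII strings are never marked

# Illustrative non-ASCII string: "café" written as its UTF-8 bytes.
y <- "caf\xc3\xa9"
Encoding(y) <- "UTF-8"
Encoding(y)  # "UTF-8": the mark sticks because the string is not pure ASCII

# The mark does not change the stored bytes.
charToRaw(y)  # 63 61 66 c3 a9
```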

4) Are the encodings I need supported? (Running on a Mac.)
temp <- iconvlist()
temp[399]
# [1] "UTF-8"
temp[338]
# [1] "MACINTOSH"

It seems they are supported.
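The positions 399 and 338 in step 4 are specific to that machine's iconvlist(), so a check by name is more portable. A small sketch using %in%; the variable names wanted and supported are illustrative.

```r
# Look the encodings up by name instead of by position in iconvlist().
wanted <- c("UTF-8", "MACINTOSH")
supported <- wanted %in% iconvlist()
names(supported) <- wanted
supported  # TRUE/FALSE for each encoding on this platform
```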

5) Once I can change the encoding, I want to run the following to see how the sort order changes:
library(dplyr)
arrange(d.enc2, desc(utf8))
arrange(d.enc2, desc(mac))

6) I expect the output to look like this, but in a different order depending on which column is used for sorting:
  utf8 mac            label
1   _   _  underscore space
2    _   _ space underscore
3                     space
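Sort order is decided by the collation rules in effect rather than by the encoding mark. A sketch in base R, using a cut-down, illustrative copy of the question's data: order() with method = "radix" compares strings in the C locale (plain byte order), while the default method for character vectors follows the session's collation locale, so its result can differ between machines. Recent dplyr releases (1.1.0 and later) also sort character columns in the C locale by default and accept a .locale argument in arrange(); that is worth checking against the installed version.

```r
d <- data.frame(
  utf8  = c(" ", "_ ", " _"),
  label = c("space", "underscore space", "space underscore"),
  stringsAsFactors = FALSE
)

# Byte-order (C-locale) sort: "_" is 0x5F and a space is 0x20,
# so "_ " > " _" > " " when sorting in decreasing order.
by_bytes <- d[order(d$utf8, decreasing = TRUE, method = "radix"), ]
by_bytes$label  # "underscore space" "space underscore" "space"

# Locale-aware sort: uses Sys.getlocale("LC_COLLATE"), which may weigh
# spaces and punctuation differently, so this order is not guaranteed.
by_locale <- d[order(d$utf8, decreasing = TRUE), ]
by_locale$label
```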

Thanks for any tips!

Best answer

Maybe this is late, but I saw this at:
R- Changing encoding of column in dataframe?

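for (col in colnames(mydataframe)) {
  # Encoding<- errors on non-character columns (e.g. factors), so skip them
  if (is.character(mydataframe[[col]])) Encoding(mydataframe[[col]]) <- "UTF-8"
}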
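The loop above only sets the encoding mark, just like step 2 in the question, so on the all-ASCII strings from the question it still reports "unknown". To actually re-encode a column, iconv() rewrites the bytes. A minimal sketch, assuming a platform whose iconv supports "MACINTOSH" (as step 4 reports for macOS) and using an illustrative non-ASCII value that is not in the question:

```r
# Illustrative data: one ASCII string and "café" written as UTF-8 bytes.
d <- data.frame(utf8 = c("_ ", "caf\xc3\xa9"), stringsAsFactors = FALSE)
Encoding(d$utf8) <- "UTF-8"  # relabels only; the ASCII "_ " stays "unknown"

# iconv() converts the bytes into the target encoding.
d$mac <- iconv(d$utf8, from = "UTF-8", to = "MACINTOSH")

charToRaw(d$utf8[2])  # 63 61 66 c3 a9
charToRaw(d$mac[2])   # 63 61 66 8e  (é is 0x8E in Mac OS Roman)
identical(d$mac[1], d$utf8[1])  # TRUE: ASCII bytes are identical in both
```

Because Mac OS Roman and UTF-8 use the same bytes for ASCII characters, converting the question's three strings would leave them byte-for-byte unchanged, so the sort order in step 5 would not change either.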

For more on r - R: Changing the character encoding of a column in a data frame, see the similar question on Stack Overflow: https://stackoverflow.com/questions/35987368/
