
r - Split strings and convert to a data frame

Reposted. Author: 行者123. Updated: 2023-12-02 09:07:18

I have a character vector of 65k elements in the format shown below. The elements differ in length, ranging from 3 to 8 fields (counted by commas):

b[1]= "aaaa, bbbb, cccc"
...
b[1000]="aaaa, bbbb, cccc, dddd, eeee, ffff"
...
b[3000]="aaaa, bbbb, cccc, dddd, eeee, ffff, gggg"
b[3001]="aaaa, bbbb, cccc"

I want to convert it to a data frame:

row  col1 col2 col3 col4 col5 col6 col7
1 aaaa bbbb cccc
1000 aaaa bbbb cccc dddd eeee ffff
3000 aaaa bbbb cccc dddd eeee ffff gggg

I tried:

 data.frame( do.call( rbind, strsplit( b, ',' ) ) ) 

and got:

Warning message: In (function (..., deparse.level = 1) : number of columns of result is not a multiple of vector length (arg 1)

Any suggestions?

Best answer

After pasting the strings together, collapsed with "\n", we can use read.csv:

read.csv(text = paste0(b, collapse = "\n"), header = FALSE)

# V1 V2 V3 V4 V5 V6 V7
#1 aaaa bbbb cccc
#2 aaaa bbbb cccc dddd eeee ffff
#3 aaaa bbbb cccc dddd eeee ffff gggg

If you want empty strings to be read as NA, specify them in na.strings:

read.csv(text = paste0(b, collapse = "\n"), header = FALSE, na.strings = "")
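One caveat with a long vector: according to the read.table documentation, the number of columns is inferred from the first five lines of input unless col.names is supplied and is longer. In the question the wider elements appear well after the first few, so a wider row could be wrapped onto extra lines. A minimal sketch of one way around that, assuming made-up sample data in the question's format; `n_cols` and `df` are illustrative names, and strip.white = TRUE is an optional extra that drops the space after each comma:

```r
# Made-up sample in the question's format: the widest row comes after the
# first five lines, as it would in the 65k-element vector.
b <- c(rep("aaaa, bbbb, cccc", 6),
       "aaaa, bbbb, cccc, dddd, eeee, ffff, gggg")

# Width of the widest row, counted in comma-separated fields.
n_cols <- max(lengths(strsplit(b, ",", fixed = TRUE)))

df <- read.csv(text = paste(b, collapse = "\n"),
               header = FALSE,
               col.names = paste0("V", seq_len(n_cols)),  # force the full width
               strip.white = TRUE,   # drop the space that follows each comma
               na.strings = "")      # read empty fields as NA

dim(df)
```

Supplying col.names up front keeps each input element on its own row, whatever the position of the widest element.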

Another option is stri_list2matrix from the stringi package:

data.frame(stringi::stri_list2matrix(strsplit(b, ","), byrow = TRUE))

# X1 X2 X3 X4 X5 X6 X7
#1 aaaa bbbb cccc <NA> <NA> <NA> <NA>
#2 aaaa bbbb cccc dddd eeee ffff <NA>
#3 aaaa bbbb cccc dddd eeee ffff gggg
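If adding a package is not desirable, the same padding idea can be sketched in base R. This is not part of the original answer, just one possible alternative: the rbind attempt in the question warns because the split pieces have unequal lengths, so each piece is first padded with NA to a common length. The names `parts`, `width`, and `padded` are illustrative, and the col1, col2, ... column names follow the layout requested in the question.

```r
b <- c("aaaa, bbbb, cccc",
       "aaaa, bbbb, cccc, dddd, eeee, ffff",
       "aaaa, bbbb, cccc, dddd, eeee, ffff, gggg")

# Split on a comma plus any following whitespace, so fields have no leading space.
parts <- strsplit(b, ",\\s*")

# Pad every element to the length of the longest one; `length<-` fills with NA.
width  <- max(lengths(parts))
padded <- lapply(parts, `length<-`, width)

# All rows now have equal length, so rbind has nothing to recycle.
df <- as.data.frame(do.call(rbind, padded))
names(df) <- paste0("col", seq_len(width))
df
```

Because the width is computed over the whole vector, this does not depend on where the longest element sits.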

Data

b <- c("aaaa, bbbb, cccc", "aaaa, bbbb, cccc, dddd, eeee, ffff", 
"aaaa, bbbb, cccc, dddd, eeee, ffff, gggg")

Regarding "r - Split strings and convert to a data frame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56557690/
