read.csv in R for rows without quotes

Reposted. Author: 行者123. Updated: 2023-12-02 04:49:40
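I am trying to read a huge csv file in R, but I am running into trouble because the elements of the column that is supposed to hold strings are not delimited by quotes, so every line break inside a value starts a new row. My data is separated by ~.

For example, my data looks similar to this: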

a ~ b ~ c ~ d ~ e
1 ~ name1 ~ This is a paragraph.

This is a second paragraph.

~ num1 ~ num2 ~

2 ~ name2 ~ This is an new set of paragraph.

~ num1 ~ num2 ~

I would like to get something like this:

a |     b     |                        c                         |   d    |   e   |
___________________________________________________________________________________
1 |   name1   | This is a paragraph. This is a second paragraph. |  num1  | num2  |
2 |   name2   | This is a new set of paragraph.                  |  num1  | num2  |

But I end up with something ugly like this:

a                           |    b    |                c                |  d     |   e   |
__________________________________________________________________________________________
1                           |  name1  | This is a paragraph.            |        |       |
This is a second paragraph  |         |                                 |        |       |
                            |  num1   | num2                            |        |       |
2                           |  name2  | This is a new set of paragraph. | num1   | num2  |

I tried to set allowEscapes = TRUE in read.csv but that didn't do the trick. My input currently looks like this:

read.csv(filename, header = TRUE, sep = '~', stringsAsFactors = FALSE, fileEncoding = "latin1", quote = "", strip.white = TRUE)
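
To see why a call like the one above produces ragged rows, it can help to count the `~`-separated fields on each physical line. The sketch below uses base R's `count.fields` on the sample data from the question, inlined as a character vector; `sample_lines`, `n_fields`, and `is_fragment` are names chosen here for illustration.

```r
# Sample data from the question, inlined for illustration.
sample_lines <- c(
  "a ~ b ~ c ~ d ~ e",
  "1 ~ name1 ~ This is a paragraph.",
  "",
  "This is a second paragraph.",
  "",
  "~ num1 ~ num2 ~",
  "",
  "2 ~ name2 ~ This is an new set of paragraph.",
  "",
  "~ num1 ~ num2 ~"
)

# Count '~'-separated fields per line. Quoting is disabled to match the
# read.csv call; blank lines are skipped by default.
con <- textConnection(sample_lines)
n_fields <- count.fields(con, sep = "~", quote = "")
close(con)
print(n_fields)

# The header defines the expected width. Any line with a different count
# is a fragment of a record that was split by a line break.
header_width <- n_fields[1]
is_fragment <- n_fields != header_width
print(which(is_fragment))  # positions among the non-blank lines
```

Lines flagged as fragments are the ones that `read.csv` treats as separate, short records, which is what produces the misaligned table shown above.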

My next idea was to insert a quote after every ~, but I wanted to see whether there is a better way.

Any help would be appreciated.

Best answer

For example, something like this:

ll = readLines(textConnection('a ~ b ~ c ~ d ~ e
1 ~ name1 ~ This is a paragraph.
This is a second paragraph.
~ num1 ~ num2 ~
2 ~ name2 ~ This is an new set of paragraph.
~ num1 ~ num2 ~'))
## each record begins with a single digit followed by a space;
## I use this pattern to separate the records
llines <- split(ll[-1], cumsum(grepl('^[0-9] ', ll[-1])))
## prepend the header to the re-joined records (joined with a space
## so that the paragraphs stay separated) and parse the result
read.table(text = unlist(c(ll[1], lapply(llines, paste, collapse = ' '))),
           sep = '~', header = TRUE)


      a                                                b    c    d  e
1 name1 This is a paragraph. This is a second paragraph. num1 num2 NA
2 name2                 This is an new set of paragraph. num1 num2 NA
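
Two details of the answer are worth noting. The pattern `'^[0-9] '` only recognises single-digit ids, and the trailing `~` on each record gives the data rows one more field than the header. In that situation `read.table` uses the first column as row names, which is why the output above lists six values per row under five column names. A minimal sketch of a variant that addresses both points follows. It assumes every record starts with an integer id followed by `~` and that no continuation line does; `read_tilde_records` is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: rebuild one line per record, then parse.
# Assumes each record starts with an integer id followed by '~'
# and that no continuation line looks like that.
read_tilde_records <- function(lines) {
  lines <- lines[nzchar(trimws(lines))]   # drop blank lines
  header <- lines[1]
  body <- lines[-1]

  # A new record starts at every line that begins with digits and '~'.
  record_id <- cumsum(grepl("^[0-9]+\\s*~", body))

  # Join the pieces of each record, with a space between paragraphs.
  records <- vapply(split(body, record_id), paste, character(1),
                    collapse = " ")

  # Remove the trailing delimiter so each row has as many fields
  # as the header.
  records <- sub("\\s*~\\s*$", "", records)

  read.table(text = c(header, records), sep = "~", header = TRUE,
             quote = "", comment.char = "", strip.white = TRUE,
             stringsAsFactors = FALSE)
}

# Sample data from the question, including its blank lines.
ll <- c("a ~ b ~ c ~ d ~ e",
        "1 ~ name1 ~ This is a paragraph.",
        "",
        "This is a second paragraph.",
        "",
        "~ num1 ~ num2 ~",
        "",
        "2 ~ name2 ~ This is an new set of paragraph.",
        "",
        "~ num1 ~ num2 ~")

result <- read_tilde_records(ll)
str(result)
```

For a real file, the lines would come from `readLines(filename)` as in the answer. If the ids in the real data are not plain integers, the record-start pattern would need to be adjusted.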

Regarding read.csv in R for rows without quotes, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/19104043/
