
r - How to handle a .csv file whose text lines contain commas?

Reposted. Author: 行者123. Updated: 2023-12-01 13:17:04

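I am using the read.delim function, but the text lines I am reading contain user comments that themselves use commas (","), so a single comment gets split across two or more columns.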

Here are two rows from the dataset:

@Zillaman u just aite all types of food at Zina crib and didnt even think about me!!!!,0

I must have been only 11 when Mr Peepers started. It was a must see for the whole family, I believe on Sun...,1

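The first row is read correctly: the text lands in one column and "0" in the next. The second row is split into three columns, with "1" in the last one.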
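A quick way to confirm that embedded commas are the cause is base R's count.fields, which reports how many fields each line would yield for a given separator. The sketch below writes the two sample rows to a temporary file standing in for TrainingData.csv (the temporary file is an assumption for illustration) and uses quote = "" to match the question's settings:

```r
# Two sample rows from the question, written to a stand-in file
lines <- c(
  "@Zillaman u just aite all types of food at Zina crib and didnt even think about me!!!!,0",
  "I must have been only 11 when Mr Peepers started. It was a must see for the whole family, I believe on Sun...,1"
)
tmp <- tempfile(fileext = ".csv")
writeLines(lines, tmp)

# Number of comma-separated fields per line, with quoting and comments disabled
n_fields <- count.fields(tmp, sep = ",", quote = "", comment.char = "")
print(n_fields)  # 2 3: the second row's inner comma creates an extra field
```

Any line reporting more than 2 fields has commas inside its text.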

dataset_original = read.delim('TrainingData.csv',
                              quote = "",
                              row.names = NULL,
                              stringsAsFactors = FALSE,
                              header = F, as.is = F,
                              colClasses = "character",
                              blank.lines.skip = T,
                              sep = ",")

Best answer

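Read each line whole as a single field, then split it into the text column and the target column yourself. Because the target is always the last field, splitting at the last comma keeps any commas inside the text intact.

Try this: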

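df <- read.delim('TrainingData.csv',
                 quote = "",               # treat quote characters as plain text
                 row.names = NULL,
                 stringsAsFactors = FALSE,
                 header = FALSE, as.is = FALSE,
                 colClasses = "character",
                 blank.lines.skip = TRUE,
                 sep = "\n")               # newline as separator: one field (V1) per line

# The target is everything after the last comma
df$target <- regmatches(df$V1, regexpr(pattern = "[^,]*$", text = df$V1))
# Strip the last comma and the target, keeping any commas inside the text
df$V1 <- sub(pattern = ",[^,]*$", replacement = "", x = df$V1)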

Here df plays the role of dataset_original from the question.
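The same idea can be sketched with readLines instead of read.delim, which avoids the separator trick entirely. This is a minimal variant, not the answerer's code: split_last_field is a hypothetical helper name, the temporary file stands in for TrainingData.csv, and the two regular expressions are the same ones used above.

```r
# Hypothetical helper: split each raw line at its LAST comma
split_last_field <- function(lines) {
  # Target: everything after the final comma
  target <- regmatches(lines, regexpr("[^,]*$", lines))
  # Text: everything before the final comma (inner commas survive)
  text <- sub(",[^,]*$", "", lines)
  data.frame(text = text, target = target, stringsAsFactors = FALSE)
}

# Stand-in for TrainingData.csv, written to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("hello,0", "world,1", "not,right,1", "this,one,is,even,worse,0"), tmp)

lines <- readLines(tmp, warn = FALSE)
lines <- lines[nzchar(lines)]  # drop empty lines, like blank.lines.skip = TRUE
res <- split_last_field(lines)
print(res)
```

As with the read.delim version, a line with no comma at all ends up with the whole line as its target, so it is worth checking for such lines before training on the result.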

Example:

Suppose the file contains:

hello,0
world,1
not,right,1
this,one,is,even,worse,0

This method returns:

> df
                      V1 target
1                  hello      0
2                  world      1
3              not,right      1
4 this,one,is,even,worse      0
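If you control how the file is produced, the usual CSV remedy is to quote text fields so that commas inside them are not treated as separators. A minimal sketch with base R's write.csv and read.csv; the data frame and its column names are illustrative, and this only helps for newly written files, not for an existing unquoted one:

```r
reviews <- data.frame(
  text = c("hello", "not,right", "this,one,is,even,worse"),
  target = c(0L, 1L, 0L),
  stringsAsFactors = FALSE
)

tmp <- tempfile(fileext = ".csv")
# write.csv quotes character columns by default (quote = TRUE),
# so not,right is written wrapped in double quotes and stays one field
write.csv(reviews, tmp, row.names = FALSE)

# Read back with read.csv's default quote = "\"" (not quote = "")
back <- read.csv(tmp, stringsAsFactors = FALSE)
print(back)
```

Note that reading such a file with quote = "", as in the question, would disable the quoting and bring the splitting problem back.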

For "r - How to handle a .csv file whose text lines contain commas?", see the corresponding question on Stack Overflow: https://stackoverflow.com/questions/53970548/
