
r - How to split a large CSV data file into individual data files using R?

Reposted · Author: 行者123 · Updated: 2023-12-04 18:16:00

I have a CSV file whose first row contains the variable names and whose remaining rows contain the data. What is a good way, in R, to break it up into one file per variable? Would such a solution be robust? For example, what if the input file is 100 GB?

The input file looks like

var1,var2,var3
1,2,hello
2,5,yay
...

I want to create 3 files (or one per variable): var1.csv, var2.csv, var3.csv,
so that the files look like
File 1
var1
1
2
...

File 2
var2
2
5
...

File 3
var3
hello
yay

I have a Python solution ( How to break a large CSV data file into individual data files?), but I would like to know whether R can do the same. The essence of the Python code is that it reads the CSV file line by line and writes those lines out one at a time. Can R do that? The read.csv command reads the entire file at once, which slows the whole process down; moreover, because R tries to load the whole file into memory, it cannot read and process a 100 GB file that way. I could not find a command in R that lets you read a CSV file line by line. Please help. Thanks!!
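For reference, the line-by-line idea from the linked Python solution can be sketched as follows. This is an illustration, not the exact code from that answer; the file names mirror the question's example, and a tiny sample input is created first so the sketch is self-contained.

```python
import csv

# Recreate the sample input from the question so the sketch runs standalone.
with open("file.csv", "w", newline="") as f:
    f.write("var1,var2,var3\n1,2,hello\n2,5,yay\n")

def split_csv_by_column(infile):
    """Stream a CSV row by row, appending each field to a per-column file.

    Only one row is held in memory at a time, so the approach also works
    for inputs far larger than RAM.
    """
    with open(infile, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)  # first row holds the variable names
        # One output file per column, named after the variable: var1.csv, ...
        outs = [open(f"{name}.csv", "w") for name in header]
        try:
            for name, out in zip(header, outs):
                out.write(name + "\n")          # per-file header line
            for row in reader:                  # one row at a time
                for field, out in zip(row, outs):
                    out.write(field + "\n")
        finally:
            for out in outs:
                out.close()

split_csv_by_column("file.csv")
```

Because the output files are opened once and every input row is read exactly once, the cost is linear in the file size regardless of how large the input is.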

Best Answer

You can scan the file and then write it out one line at a time.

i <- 0
while ({
  x <- scan("file.csv", sep = ",", skip = i, nlines = 1, what = "character")
  length(x) > 1
}) {
  write(x[1], "file1.csv", sep = ",", append = TRUE)
  write(x[2], "file2.csv", sep = ",", append = TRUE)
  write(x[3], "file3.csv", sep = ",", append = TRUE)
  i <- i + 1
}

Edit!! Using the data above, replicated 1000+ times, I ran a speed comparison against a version that keeps the file connections open the whole time. The version above rescans the file from the top on every iteration (skip = i), so its cost grows quadratically with the number of lines; keeping the connections open reads each line exactly once.
ver1 <- function() {
  i <- 0
  while ({
    x <- scan("file.csv", sep = ",", skip = i, nlines = 1, what = "character")
    length(x) > 1
  }) {
    write(x[1], "file1.csv", sep = ",", append = TRUE)
    write(x[2], "file2.csv", sep = ",", append = TRUE)
    write(x[3], "file3.csv", sep = ",", append = TRUE)
    i <- i + 1
  }
}

system.time(ver1())  # with close to 3K lines of data, 3 columns
## user system elapsed
## 2.809 0.417 3.629

ver2 <- function() {
  f  <- file("file.csv", "r")
  f1 <- file("file1.csv", "w")
  f2 <- file("file2.csv", "w")
  f3 <- file("file3.csv", "w")
  while ({
    x <- scan(f, sep = ",", skip = 0, nlines = 1, what = "character")
    length(x) > 1
  }) {
    write(x[1], file = f1, sep = ",", append = TRUE, ncol = 1)
    write(x[2], file = f2, sep = ",", append = TRUE, ncol = 1)
    write(x[3], file = f3, sep = ",", append = TRUE, ncol = 1)
  }
  closeAllConnections()
}

system.time(ver2())
## user system elapsed
## 0.257 0.098 0.409

Regarding "r - How to split a large CSV data file into individual data files using R?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/3376513/
