
R: reading in multiple .dat files

Reposted. Author: 行者123. Updated: 2023-12-03 18:26:44

Hi, I am new here and also a beginner in R.

My question:
If I have multiple files (test1.dat, test2.dat, ...) to work with in R, I read them in with this code:

# pattern is a regular expression, not a shell glob
filelist <- list.files(pattern = "\\.dat$")

df_list <- lapply(filelist, function(x) read.table(x, header = FALSE, sep = ",",
                                                   colClasses = "factor", comment.char = "",
                                                   col.names = "raw"))
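
One detail worth checking in the snippet above: as far as I know, the pattern argument of list.files() is a regular expression rather than a shell glob. A minimal sketch of how utils::glob2rx() turns the glob into an anchored regex; the scratch directory and file names are made up for illustration:

```r
# Hypothetical file names in a scratch directory, for illustration only.
tmp <- file.path(tempdir(), "dat_pattern_demo")
dir.create(tmp, showWarnings = FALSE)
invisible(file.create(file.path(
  tmp, c("test1.dat", "test2.dat", "notes.dat.bak", "mydat")
)))

# glob2rx() converts a shell-style wildcard into an anchored regular expression.
rx <- utils::glob2rx("*.dat")

# Only names that end in ".dat" are kept; the .bak file and "mydat" are skipped.
dat_files <- list.files(tmp, pattern = rx)
print(dat_files)
```

Writing the regex by hand, for example "\\.dat$", should behave the same way for these names.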

Now I have the problem that my data is large. I found a solution that speeds up the import using the sqldf package:
library(sqldf)

# sqldf reads from a file connection referenced by name in the query
sql <- file("test2.dat")
df <- sqldf("select * from sql", dbname = tempfile(),
            file.format = list(header = FALSE, row.names = FALSE, colClasses = "factor",
                               comment.char = "", col.names = "raw"))

It works for one file, but I cannot adapt the code to read in multiple files as in the first snippet. Can someone help me? Thank you! Momo

Best answer

This seems to work (but I assume there is a faster SQL way to do this):

sql.l <- lapply(filelist , file)

df_list2 <- lapply(sql.l, function(i) sqldf("select * from i" ,
dbname = tempfile(), file.format = list(header = TRUE, row.names = FALSE)))
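
lapply() over a plain character vector returns an unnamed list, so it is easy to lose track of which element came from which file. A small sketch of keeping the file names attached. It uses base read.table() as a stand-in for the sqldf() call so that it runs without extra packages; the helper name read_one, the scratch directory, and the file contents are made up for illustration:

```r
# Made-up sample files in a scratch directory (illustration only).
tmp <- file.path(tempdir(), "named_list_demo")
dir.create(tmp, showWarnings = FALSE)
for (i in 1:3) {
  write.table(data.frame(a = i, b = i * 10),
              file.path(tmp, paste0("test", i, ".dat")),
              sep = ",", row.names = FALSE, quote = FALSE)
}

filelist <- list.files(tmp, pattern = "\\.dat$", full.names = TRUE)

# Hypothetical reader stand-in: swap in the sqldf() or fread() call here.
read_one <- function(path) read.table(path, header = TRUE, sep = ",")

# Name each element after its file so results can be looked up by name.
df_list <- setNames(lapply(filelist, read_one), basename(filelist))

print(names(df_list))
print(df_list[["test2.dat"]])
```

With names in place, a single table can be pulled out by file name instead of by position.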

Speed check, partly taken from mnel's post "Quickly reading very large tables as dataframes in R":
library(data.table)
library(sqldf)

# test data
n=1e6
DT = data.table( a=sample(1:1000,n,replace=TRUE),
b=sample(1:1000,n,replace=TRUE),
c=rnorm(n),
d=sample(c("foo","bar","baz","qux","quux"),n,replace=TRUE),
e=rnorm(n),
f=sample(1:1000,n,replace=TRUE) )

# write 5 files out
lapply(1:5, function(i) write.table(DT,paste0("test", i, ".dat"),
sep=",",row.names=FALSE,quote=FALSE))

Read: data.table
filelist <- list.files(pattern = "\\.dat$")  # regex, not a shell glob

system.time(df_list <- lapply(filelist, fread))

# user system elapsed
# 5.244 0.200 5.457

Read: sqldf
sql.l <- lapply(filelist , file)

system.time(df_list2 <- lapply(sql.l, function(i) sqldf("select * from i" ,
dbname = tempfile(), file.format = list(header = TRUE, row.names = FALSE))))

# user system elapsed
# 35.594 1.432 37.357

Check: seems fine apart from attributes
all.equal(df_list , df_list2)
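
When the only differences reported are attributes, all.equal() can be asked to skip them with check.attributes = FALSE. A minimal base-R sketch on toy data frames (not the benchmark data above); the "source" attribute is invented purely to create an attribute-only difference:

```r
# Two lists holding the same values; the second carries an extra attribute.
a <- list(data.frame(x = 1:3, y = c(0.5, 1.5, 2.5)))
b <- a
attr(b[[1]], "source") <- "sqldf"  # made-up attribute for illustration

# Default comparison reports the attribute mismatch as a character vector.
strict <- all.equal(a, b)

# Ignoring attributes compares only the stored values.
loose <- all.equal(a, b, check.attributes = FALSE)

print(isTRUE(strict))
print(isTRUE(loose))
```

Note that all.equal() returns either TRUE or a description of the differences, so wrapping it in isTRUE() is the usual way to use it in a condition.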

Regarding reading multiple .dat files in R, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/23271323/
