r - Efficiently expanding a large data frame in R

Reposted. Author: 行者123. Updated: 2023-12-04 10:14:12

I am looking for a way to expand a large data frame in R into more columns and more rows, based on the values of a given column.

Right now I do this with a for-loop, but I am sure there are more idiomatic and more efficient ways to get the same result.

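I think an example will make the problem clearer. Suppose we have a data frame with information about students' grades at three different stages of their lives. The student ids are s1, s2 and s3; their grades were measured in three different periods, m1, m2 and m3; and for each stage there is a column called more.info that holds their grades for every class taken, encoded as class#topic#grade, with the classes separated by ";".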
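To make the encoding concrete, here is a minimal sketch that decodes a single more.info value by hand. The names `one.value`, `records` and `fields` are only illustrative and do not appear in the original post.

```r
# One more.info value: records are separated by ";" and fields by "#"
one.value <- "draw#drawing#4.5;music#singing#5.6;dance#ballet#6.7"

# Split into records, then split every record into class, topic and grade
records <- strsplit(one.value, ";")[[1]]
fields <- do.call(rbind, strsplit(records, "#"))
colnames(fields) <- c("class", "topic", "grade")

print(fields)
```

Each record becomes one row, which is why a single input row has to turn into several output rows.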

# Only base R is used below; keep character columns as strings
# (this is already the default in R >= 4.0)
options(stringsAsFactors = FALSE)
example.df = data.frame(
  measure.id = c("m1", "m2", "m3", "m2", "m2", "m3", "m1", "m1", "m3"),
  student.id = c("s1", "s1", "s1", "s2", "s3", "s3", "s2", "s3", "s2"),
  more.info = c("draw#drawing#4.5;music#singing#5.6;dance#ballet#6.7",
                "bio#biology#5.6;math#algebra#4.5",
                "calculus#univariate#6.2; physics#quantum#4.5;chemistry#organic#4.5",
                "bio#biology#5.6;math#algebra#4.5",
                "bio#biology#3.6;math#algebra#3.5",
                "calculus#univariate#5.2; physics#quantum#5.2;chemistry#organic#4",
                "draw#drawing#5;music#singing#5.6;dance#ballet#5.7",
                "draw#drawing#2.5;music#singing#3.6;dance#ballet#4",
                "calculus#univariate#5.2; physics#quantum#6.5;chemistry#organic#5"))
measure.ids = unique(example.df$measure.id)

Then I want to split the more.info values and build a new data frame with more rows and more columns, like this:

new.df = data.frame()
# split one "class#topic#grade" record into its three fields
splitit <- function(x) {
  strsplit(x, "#")
}
for (i in seq_along(measure.ids)) {
  measure.id = measure.ids[i]
  # keep only the rows of this measurement period
  tmp = example.df[example.df$measure.id == measure.id, ]
  # one vector of records per row; ";\\s*" also drops the space after ";"
  more.info = strsplit(tmp$more.info, ";\\s*")
  for (j in seq_along(more.info)) {
    info = more.info[[j]]
    a = sapply(info, splitit)
    b = sapply(a, "[[", 1)
    d = sapply(a, "[[", 2)
    e = sapply(a, "[[", 3)
    # growing a data frame with rbind() is the slow part of this loop
    new.df = rbind(new.df,
                   data.frame(measure.id = rep(measure.id, length(info)),
                              student.id = rep(tmp$student.id[j], length(info)),
                              class = b,
                              topic = d,
                              grade = e))
  }
}

What is the most efficient way to do this in R? I am open to apply functions, map/reduce approaches, mclapply to use more cores, and so on.

Best answer

A solution using only base functions:

# split the column on every separator that occurs ("; ", "#" and ";")
a <- strsplit(example.df$more.info, "; |#|;")
# turn each result into a matrix with 3 columns (class, topic, grade)
a <- lapply(a, function(v) matrix(v, ncol = 3, byrow = TRUE))
# stack all matrices into one big matrix
aa <- do.call(rbind, a)
# for every row of the big matrix, the index of the row of the initial data.frame it came from
b <- unlist(sapply(seq_along(a), function(i) rep(i, nrow(a[[i]]))))
# combine the repeated rows of the initial data.frame with the big matrix
df <- cbind(example.df[b, ], aa)
# drop the more.info column and rename the new columns
df <- df[, -3]
colnames(df)[3:5] <- c("class", "topic", "grade")
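The steps above can also be wrapped into a reusable function. The sketch below is a variant under stated assumptions, not the answerer's code: the name `expand_more_info` and the sample `small.df` are hypothetical, every record is assumed to have exactly three "#"-separated fields, and the grade is converted to numeric, which the original answer does not do.

```r
# Hypothetical helper: expand the more.info column into one row per record
expand_more_info <- function(df) {
  # split each more.info value into records, tolerating a space after ";"
  records <- strsplit(df$more.info, ";\\s*")
  # number of records contributed by each original row
  n <- lengths(records)
  # split all records into their three fields in one pass
  fields <- do.call(rbind, strsplit(unlist(records), "#"))
  data.frame(
    measure.id = rep(df$measure.id, n),
    student.id = rep(df$student.id, n),
    class = fields[, 1],
    topic = fields[, 2],
    grade = as.numeric(fields[, 3]),
    stringsAsFactors = FALSE
  )
}

# Small illustrative input in the same format as example.df
small.df <- data.frame(
  measure.id = c("m1", "m2"),
  student.id = c("s1", "s1"),
  more.info = c("draw#drawing#4.5;music#singing#5.6",
                "calculus#univariate#6.2; physics#quantum#4.5"),
  stringsAsFactors = FALSE
)
result <- expand_more_info(small.df)
print(result)
```

Using `rep()` with the per-row record counts replaces the explicit row index `b`, so the id columns are repeated directly instead of subsetting the data frame.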

To improve speed, you can replace the apply-family functions in my code with mclapply (from the parallel package).
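As a rough sketch of that suggestion, the `lapply()` step could be swapped for `parallel::mclapply()` as shown below. The two-core setting and the small `more.info` vector are assumptions made for illustration; `mclapply()` relies on forking, so the sketch falls back to a single core on Windows.

```r
library(parallel)  # ships with base R

# Small illustrative input in the same format as example.df$more.info
more.info <- c("bio#biology#5.6;math#algebra#4.5",
               "calculus#univariate#6.2; physics#quantum#4.5;chemistry#organic#4.5")

# Forking is not available on Windows, so use one core there (assumed: 2 cores elsewhere)
n.cores <- if (.Platform$OS.type == "windows") 1L else 2L

pieces <- strsplit(more.info, "; |#|;")
# same per-element work as the lapply() step, spread over n.cores workers
mats <- mclapply(pieces, function(v) matrix(v, ncol = 3, byrow = TRUE),
                 mc.cores = n.cores)
big <- do.call(rbind, mats)
print(big)
```

`mclapply()` always returns a list, so it is a drop-in replacement for `lapply()`, while a replaced `sapply()` call still needs `unlist()` around it. For work this cheap per element, the cost of starting the workers may outweigh the gain, so it is worth timing both versions.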

I could not compare the speed because your data set is very small.
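One way to get a meaningful timing is to run the same steps on a larger, synthetic input. The sketch below is only an illustration: the vector `big.info` and its size of 20000 rows are made up, and the measured time depends on the machine.

```r
# Hypothetical larger input: repeat one encoded value many times
big.info <- rep("draw#drawing#4.5;music#singing#5.6;dance#ballet#6.7", 20000)

# Time the split, reshape and stack steps of the base R solution
timing <- system.time({
  pieces <- strsplit(big.info, "; |#|;")
  mats <- lapply(pieces, function(v) matrix(v, ncol = 3, byrow = TRUE))
  big <- do.call(rbind, mats)
})
print(timing["elapsed"])
```

Running the for-loop version on the same input would give a direct comparison between the two approaches.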

Regarding r - Efficiently expanding a large data frame in R, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/23668706/
