
Reshaping wide format to multi-column long format

Reposted. Author: 行者123. Updated: 2023-12-03 20:00:14

I want to reshape a wide-format dataset that has multiple tests, each measured at 3 time points:

ID  Test  Year  Fall  Spring  Winter
 1     1  2008    15      16      19
 1     1  2009    12      13      27
 1     2  2008    22      22      24
 1     2  2009    10      14      20
 2     1  2008    12      13      25
 2     1  2009    16      14      21
 2     2  2008    13      11      29
 2     2  2009    23      20      26
 3     1  2008    11      12      22
 3     1  2009    13      11      27
 3     2  2008    17      12      23
 3     2  2009    14       9      31

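into a dataset that keeps the tests in separate columns but converts the measurement time to long format within each new column, like this: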
ID  Year  Time    Test1  Test2
 1  2008  Fall       15     22
 1  2008  Spring     16     22
 1  2008  Winter     19     24
 1  2009  Fall       12     10
 1  2009  Spring     13     14
 1  2009  Winter     27     20
 2  2008  Fall       12     13
 2  2008  Spring     13     11
 2  2008  Winter     25     29
 2  2009  Fall       16     23
 2  2009  Spring     14     20
 2  2009  Winter     21     26
 3  2008  Fall       11     17
 3  2008  Spring     12     12
 3  2008  Winter     22     23
 3  2009  Fall       13     14
 3  2009  Spring     11      9
 3  2009  Winter     27     31

I have tried reshape and melt without success. Existing posts only cover converting to a single-column result.

Best answer

Using reshape2:

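library(reshape2)

# Thanks to Ista for helping with direct naming using "variable.name"
# Melt the three season columns into a single Time/value pair per row
df.m <- melt(df, id.vars = c("ID", "Test", "Year"), variable.name = "Time")
# Relabel Test as "Test1", "Test2", ... so the cast columns get readable names
df.m <- transform(df.m, Test = paste0("Test", Test))
dcast(df.m, ID + Year + Time ~ Test, value.var = "value")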
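
The answer assumes a data frame named df already exists. As a reproducible sketch, the block below rebuilds the question's sample table by hand and runs the same melt/dcast steps. The object names long and wide are only illustrative, and reshape2 is assumed to be installed.

```r
library(reshape2)

# Sample data copied from the table in the question
df <- data.frame(
  ID     = rep(1:3, each = 4),
  Test   = rep(rep(1:2, each = 2), times = 3),
  Year   = rep(2008:2009, times = 6),
  Fall   = c(15, 12, 22, 10, 12, 16, 13, 23, 11, 13, 17, 14),
  Spring = c(16, 13, 22, 14, 13, 14, 11, 20, 12, 11, 12, 9),
  Winter = c(19, 27, 24, 20, 25, 21, 29, 26, 22, 27, 23, 31)
)

# Step 1: stack Fall/Spring/Winter into a Time column plus a value column
long <- melt(df, id.vars = c("ID", "Test", "Year"), variable.name = "Time")

# Step 2: turn Test into labels that will become column names
long$Test <- paste0("Test", long$Test)

# Step 3: spread the Test labels back out into one column per test
wide <- dcast(long, ID + Year + Time ~ Test, value.var = "value")

head(wide, 3)
```

The result should have 18 rows (3 IDs x 2 years x 3 time points), with the first row matching the question's target table: ID 1, 2008, Fall, Test1 = 15, Test2 = 22.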

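Update: using data.table melt/cast from versions >= 1.9.0:
From version 1.9.0, data.table imports the reshape2 package and implements fast melt and dcast methods in C for data.tables. A speed comparison on larger data is shown below.

For more information, see the NEWS for that release.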
require(data.table) ## ver. >=1.9.0
require(reshape2)

dt <- as.data.table(df, key=c("ID", "Test", "Year"))
dt.m <- melt(dt, id.var = c("ID", "Test", "Year"), variable.name = "Time")
dt.m[, Test := paste0("Test", Test)]
dcast.data.table(dt.m, ID + Year + Time ~ Test, value.var = "value")

For now, you have to write dcast.data.table explicitly, because dcast is not yet an S3 generic in reshape2.
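
As far as I know, later data.table releases export their own melt and dcast, so the explicit dcast.data.table call and the separate reshape2 load may no longer be needed. The sketch below relies on that assumption; check it against the installed version. It uses a four-row subset of the question's data (the 2008 rows for IDs 1 and 2), and the name res is only illustrative.

```r
library(data.table)

# Subset of the question's data: 2008 rows for IDs 1 and 2
dt <- data.table(
  ID     = rep(1:2, each = 2),
  Test   = rep(1:2, times = 2),
  Year   = 2008L,
  Fall   = c(15, 22, 12, 13),
  Spring = c(16, 22, 13, 11),
  Winter = c(19, 24, 25, 29)
)

# Assumption: melt() and dcast() here are data.table's own methods
dt.m <- melt(dt, id.vars = c("ID", "Test", "Year"), variable.name = "Time")
dt.m[, Test := paste0("Test", Test)]   # relabel in place, by reference
res <- dcast(dt.m, ID + Year + Time ~ Test, value.var = "value")

res
```

If the installed version predates that change, falling back to dcast.data.table as in the answer should behave the same way.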

Benchmarking on larger data:
# generate data:
set.seed(45L)
DT <- data.table(ID     = sample(1e2, 1e7, TRUE),
                 Test   = sample(1e3, 1e7, TRUE),
                 Year   = sample(2008:2014, 1e7, TRUE),
                 Fall   = sample(50, 1e7, TRUE),
                 Spring = sample(50, 1e7, TRUE),
                 Winter = sample(50, 1e7, TRUE))
DF <- as.data.frame(DT)

reshape2 timings:
reshape2_melt <- function(df) {
  melt(df, id.vars = c("ID", "Test", "Year"), variable.name = "Time")
}
# min. of three consecutive runs
system.time(df.m <- reshape2_melt(DF))
#    user  system elapsed
#  43.319   4.909  48.932

df.m <- transform(df.m, Test = paste0("Test", Test))

reshape2_cast <- function(df) {
  # cast the argument rather than relying on the global df.m
  dcast(df, ID + Year + Time ~ Test, value.var = "value")
}
# min. of three consecutive runs
system.time(reshape2_cast(df.m))
#    user  system elapsed
#  57.728   9.712  69.573

data.table timings:
DT_melt <- function(dt) {
  melt(dt, id.vars = c("ID", "Test", "Year"), variable.name = "Time")
}
# min. of three consecutive runs
system.time(dt.m <- DT_melt(DT))
#    user  system elapsed
#   0.276   0.001   0.279

dt.m[, Test := paste0("Test", Test)]

DT_cast <- function(dt) {
  # cast the argument rather than relying on the global dt.m
  dcast.data.table(dt, ID + Year + Time ~ Test, value.var = "value")
}
# min. of three consecutive runs
system.time(DT_cast(dt.m))
#    user  system elapsed
#  12.732   0.825  14.006
melt.data.table is ~175x faster than reshape2:::melt, and dcast.data.table is ~5x faster than reshape2:::dcast.
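
The speedup figures follow directly from the elapsed times reported in the benchmark. A small sketch of the arithmetic, with the timings copied from the system.time() output (the vector and variable names are only illustrative):

```r
# Elapsed times in seconds, copied from the benchmark output
elapsed <- c(reshape2_melt = 48.932, dt_melt = 0.279,
             reshape2_cast = 69.573, dt_cast = 14.006)

# Speedup = reshape2 elapsed time / data.table elapsed time
melt_speedup <- elapsed[["reshape2_melt"]] / elapsed[["dt_melt"]]
cast_speedup <- elapsed[["reshape2_cast"]] / elapsed[["dt_cast"]]

round(c(melt = melt_speedup, cast = cast_speedup), 1)
```

These ratios come from a single machine and one data shape, so they are best read as an order-of-magnitude indication rather than a fixed guarantee.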

On reshaping wide format to multi-column long format, the original question is on Stack Overflow: https://stackoverflow.com/questions/15668870/
