
r - Alternatives to stats::reshape

Reposted. Author: 行者123. Updated: 2023-12-02 17:49:15

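The melt/cast functions in the reshape package are great, but I'm not sure whether there is a simple way to apply them when the measured variables have different types. For example, here is a snippet of data in which each MD provides the gender and weight of three patients: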

ID PT1 WT1 PT2 WT2 PT3 WT3
1 "M" 170 "M" 175 "F" 145
...

The goal is to reshape the data so that each row is one patient:

ID PTNUM GENDER WEIGHT
1 1 "M" 170
1 2 "M" 175
1 3 "F" 145
...

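Using the reshape function from the stats package is one option I know of, but I'm posting here in the hope that R users more experienced than I am will post other, better methods. Many thanks!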

--

@Vincent Zoonekynd:

I liked your example very much, so I generalized it to multiple variables.

# Sample data
n <- 5
d <- data.frame(
id = 1:n,
p1 = sample(c("M","F"),n,replace=TRUE),
q1 = sample(c("Alpha","Beta"),n,replace=TRUE),
w1 = round(runif(n,100,200)),
y1 = round(runif(n,100,200)),
p2 = sample(c("M","F"),n,replace=TRUE),
q2 = sample(c("Alpha","Beta"),n,replace=TRUE),
w2 = round(runif(n,100,200)),
y2 = round(runif(n,100,200)),
p3 = sample(c("M","F"),n,replace=TRUE),
q3 = sample(c("Alpha","Beta"),n,replace=TRUE),
w3 = round(runif(n,100,200)),
y3 = round(runif(n,100,200))
)
# Reshape the data.frame, one type group at a time (character columns, then numeric columns)
library(reshape)
d1 <- melt(d, id.vars="id", measure.vars=c("p1","p2","p3","q1","q2","q3"))
d2 <- melt(d, id.vars="id", measure.vars=c("w1","w2","w3","y1","y2","y3"))
d1 = cbind(d1,colsplit(d1$variable,names=c("var","ptnum")))
d2 = cbind(d2,colsplit(d2$variable,names=c("var","ptnum")))
d1$variable = NULL
d2$variable = NULL
d1c = cast(d1,...~var)
d2c = cast(d2,...~var)
# Join the two data.frames
d3 = merge(d1c, d2c, by=c("id","ptnum"), all=TRUE)

--

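Final thoughts: my motivation for asking this question was to learn about alternatives to the reshape package other than the stats::reshape function. For now, I have reached the following conclusions: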

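  • Stick with stats::reshape whenever you can. As long as you remember to pass a list, rather than a simple vector, as the "varying" argument, you will stay out of trouble. For smaller data sets (this time I was dealing with a few thousand patient cases and fewer than 200 variables in total), the simplicity of the code is worth the function's lower speed.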

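  • To use the cast/melt approach from Hadley Wickham's reshape (or reshape2) package, you have to split your variables into two sets, one consisting of numeric variables and the other of character variables. Once your data set is large enough that stats::reshape becomes unbearably slow, I imagine the extra step of splitting the variables into two sets will not seem so bad.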
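As a rough illustration of the first conclusion, here is a minimal sketch of stats::reshape with "varying" given as a list. The column names follow the data snippet in the question. The second physician's values are invented for the example, and the output names GENDER, WEIGHT and PTNUM are taken from the target layout rather than from any posted code.

```r
# Wide data modelled on the question's snippet.
# Row 1 comes from the question; row 2 is made up for illustration.
d <- data.frame(
  ID  = 1:2,
  PT1 = c("M", "F"), WT1 = c(170, 130),
  PT2 = c("M", "M"), WT2 = c(175, 180),
  PT3 = c("F", "F"), WT3 = c(145, 150),
  stringsAsFactors = FALSE
)

# 'varying' is a list with one character vector per long-format variable.
# Each vector lists that variable's wide columns in patient order, so
# columns of different types (character, numeric) stay separate.
long <- reshape(
  d,
  direction = "long",
  idvar     = "ID",
  varying   = list(c("PT1", "PT2", "PT3"), c("WT1", "WT2", "WT3")),
  v.names   = c("GENDER", "WEIGHT"),
  timevar   = "PTNUM",
  times     = 1:3
)

# Sort so that each physician's patients appear together
long <- long[order(long$ID, long$PTNUM), ]
rownames(long) <- NULL
long
```

For ID 1 this should give the three rows M/170, M/175 and F/145, matching the target layout in the question. Passing a single flat vector as "varying" would instead leave reshape to guess how the columns group, which is the trouble the first bullet warns about.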

Best answer

You can process each variable separately and then join the two resulting data.frames.

# Sample data
n <- 5
d <- data.frame(
id = 1:n,
pt1 = sample(c("M","F"),n,replace=TRUE),
wt1 = round(runif(n,100,200)),
pt2 = sample(c("M","F"),n,replace=TRUE),
wt2 = round(runif(n,100,200)),
pt3 = sample(c("M","F"),n,replace=TRUE),
wt3 = round(runif(n,100,200))
)
# Reshape the data.frame, one variable at a time
library(reshape2)
d1 <- melt(d,
id.vars="id", measure.vars=c("pt1","pt2","pt3"),
variable.name="patient", value.name="gender"
)
d2 <- melt(d,
id.vars="id", measure.vars=c("wt1","wt2","wt3"),
variable.name="patient", value.name="weight"
)
d1$patient <- as.numeric(gsub("pt", "", d1$patient))
d2$patient <- as.numeric(gsub("wt", "", d2$patient))
# Join the two data.frames
merge(d1, d2, by=c("id","patient"), all=TRUE)

Regarding "r - Alternatives to stats::reshape", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/9341865/
