gpt4 book ai didi

r - 复杂的 reshape

转载 作者:行者123 更新时间:2023-12-02 17:50:09 24 4
gpt4 key购买 nike

我想将我的数据框从长格式 reshape 为宽格式,但我丢失了一些我想保留的数据。对于以下示例:

df <- data.frame(Par1 = unlist(strsplit("AABBCCC","")),
Par2 = unlist(strsplit("DDEEFFF","")),
ParD = unlist(strsplit("foo,bar,baz,qux,bla,xyz,meh",",")),
Type = unlist(strsplit("pre,post,pre,post,pre,post,post",",")),
Val = c(10,20,30,40,50,60,70))

# Par1 Par2 ParD Type Val
# 1 A D foo pre 10
# 2 A D bar post 20
# 3 B E baz pre 30
# 4 B E qux post 40
# 5 C F bla pre 50
# 6 C F xyz post 60
# 7 C F meh post 70

dfw <- dcast(df,
formula = Par1 + Par2 ~ Type,
value.var = "Val",
fun.aggregate = mean)

# Par1 Par2 post pre
# 1 A D 20 10
# 2 B E 40 30
# 3 C F 65 50

这几乎是我需要的,但我想要拥有

  1. 某些字段保存来自 ParD 字段的数据(例如,作为单个合并字符串),
  2. 用于聚合的观测值数量。

即我希望生成的 data.frame 如下:

    #     Par1 Par2 post pre Num.pre Num.post ParD
# 1 A D 20 10 1 1 foo_bar
# 2 B E 40 30 1 1 baz_qux
# 3 C F 65 50 1 2 bla_xyz_meh

如果有任何想法,我将不胜感激。例如,我尝试通过写入 dcast 来解决第二个任务: fun.aggregate=function(x) c(Val=mean(x),Num=length(x)) - 但这会导致一个错误。

最佳答案

迟到了,但这里有另一个使用 data.table 的替代方案:

require(data.table)
dt <- data.table(df, key=c("Par1", "Par2"))
dt[, list(pre=mean(Val[Type == "pre"]),
post=mean(Val[Type == "post"]),
pre.num=length(Val[Type == "pre"]),
post.num=length(Val[Type == "post"]),
ParD = paste(ParD, collapse="_")),
by=list(Par1, Par2)]

# Par1 Par2 pre post pre.num post.num ParD
# 1: A D 10 20 1 1 foo_bar
# 2: B E 30 40 1 1 baz_qux
# 3: C F 50 65 1 2 bla_xyz_meh
<小时/>

[来自 Matthew] +1 一些小的改进,以节省重复相同的 ==,并演示 j 内的局部变量。

dt[, list(pre=mean(Val[.pre <- Type=="pre"]),     # save .pre
post=mean(Val[.post <- Type=="post"]), # save .post
pre.num=sum(.pre), # reuse .pre
post.num=sum(.post), # reuse .post
ParD = paste(ParD, collapse="_")),
by=list(Par1, Par2)]

# Par1 Par2 pre post pre.num post.num ParD
# 1: A D 10 20 1 1 foo_bar
# 2: B E 30 40 1 1 baz_qux
# 3: C F 50 65 1 2 bla_xyz_meh

dt[, { .pre <- Type=="pre" # or save .pre and .post up front
.post <- Type=="post"
list(pre=mean(Val[.pre]),
post=mean(Val[.post]),
pre.num=sum(.pre),
post.num=sum(.post),
ParD = paste(ParD, collapse="_")) }
, by=list(Par1, Par2)]

# Par1 Par2 pre post pre.num post.num ParD
# 1: A D 10 20 1 1 foo_bar
# 2: B E 30 40 1 1 baz_qux
# 3: C F 50 65 1 2 bla_xyz_meh

如果 list 列比 paste 更合适,那么这应该会更快:

dt[, { .pre <- Type=="pre"
.post <- Type=="post"
list(pre=mean(Val[.pre]),
post=mean(Val[.post]),
pre.num=sum(.pre),
post.num=sum(.post),
ParD = list(ParD)) } # list() faster than paste()
, by=list(Par1, Par2)]

# Par1 Par2 pre post pre.num post.num ParD
# 1: A D 10 20 1 1 foo,bar
# 2: B E 30 40 1 1 baz,qux
# 3: C F 50 65 1 2 bla,xyz,meh

关于r - 复杂的 reshape ,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/15182888/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com