gpt4 book ai didi

将长整形为宽并保留重复的行

转载 作者:行者123 更新时间:2023-12-04 12:10:01 25 4
gpt4 key购买 nike

对于给定的数据集,我想将我的数据集从长格式转换为宽格式。我已经使用 reshape 功能来这样做。

id  status      timestamp   
1 assigned 2017-01-02
1 done 2017-01-03
1 locked 2017-01-04
2 assigned 2017-01-02
2 done 2017-01-03
2 assigned 2017-01-03
2 done 2017-01-04
2 locked 2017-01-05
3 assigned 2017-01-02
3 done 2017-01-03
3 locked 2017-01-04
...

# reshape function to convert long format to Wide.
temp <- reshape(temp, idvar = "id", timevar = "status", direction = "wide")

结果:

id timestamp.assigned timestamp.done timestamp.locked
1 2017-01-02 2017-01-03 2017-01-04
2 2017-01-02 2017-01-03 2017-01-05
3 2017-01-02 2017-01-03 2017-01-04

当我这样做时,它会删除一些行,例如:对于 id 2,有多个行匹配 status=assigned,它会占用第一行。

如何在不删除行的情况下转换为宽。基本上,我不想丢失任何数据。

预期结果:
id timestamp.assigned timestamp.done timestamp.locked
1 2017-01-02 2017-01-03 2017-01-04
2 2017-01-02 2017-01-03 2017-01-05
2 2017-01-03 2017-01-04 2017-01-05
3 2017-01-02 2017-01-03 2017-01-04

id timestamp.assigned timestamp.done timestamp.locked
1 2017-01-02 2017-01-03 2017-01-04
2 2017-01-02 2017-01-03 NA
2 2017-01-03 2017-01-04 2017-01-05
3 2017-01-02 2017-01-03 2017-01-04

最佳答案

1。 cumsum()

Esther's approach给每个新作业编号是可行的方法。

但是,R 已经有了 cumsum()可用于此目的的函数:

temp$key <- cumsum(temp$status == "assigned")
reshape(temp, idvar = c("key", "id"), timevar = "status", direction = "wide")
   id key timestamp.assigned timestamp.done timestamp.locked
1: 1 1 2017-01-02 2017-01-03 2017-01-04
2: 2 2 2017-01-02 2017-01-03 <NA>
3: 2 3 2017-01-03 2017-01-04 2017-01-05
4: 3 4 2017-01-02 2017-01-03 2017-01-04

2。分组cumsum()

虽然这解决了 OP 的原始问题,key只是所有 分配给所有 的数字id秒。如果 OP 希望为每个作业单独编号 id我们需要申请cumsum()id 分组.

完成此操作的一种方法是使用 data.table语法:

library(data.table)
setDT(temp)[, key := cumsum(status == "assigned"), by = id]
dcast(temp, id + key ~ status, value.var = "timestamp")
   id key   assigned       done     locked
1: 1 1 2017-01-02 2017-01-03 2017-01-04
2: 2 1 2017-01-02 2017-01-03 <NA>
3: 2 2 2017-01-03 2017-01-04 2017-01-05
4: 3 1 2017-01-02 2017-01-03 2017-01-04

dcast()是基础 R 的 reshape(..., direction = "wide") 的替代品可从 reshape2 获得的功能和 data.table包。

3。分组cumsum() 即时

data.table的公式界面的 dcast()也接受表达式。有了这个,就没有必要修改 temp通过附加 key reshape 之前列。相反,这可以在 reshape 时即时完成:

dcast(temp, id + ave(key <- status == "assigned", id, FUN = cumsum) ~ 
paste0("timestamp.", status))
   id key timestamp.assigned timestamp.done timestamp.locked
1: 1 1 2017-01-02 2017-01-03 2017-01-04
2: 2 1 2017-01-02 2017-01-03 <NA>
3: 2 2 2017-01-03 2017-01-04 2017-01-05
4: 3 1 2017-01-02 2017-01-03 2017-01-04

数据

library(data.table)
temp <- fread(
"id status timestamp
1 assigned 2017-01-02
1 done 2017-01-03
1 locked 2017-01-04
2 assigned 2017-01-02
2 done 2017-01-03
2 assigned 2017-01-03
2 done 2017-01-04
2 locked 2017-01-05
3 assigned 2017-01-02
3 done 2017-01-03
3 locked 2017-01-04 ")

关于将长整形为宽并保留重复的行,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/50676344/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com