
r - Applying group_by and summarise(sum) but keeping columns with non-relevant conflicting data?

Reposted. Author: 行者123. Updated: 2023-12-04 12:04:13

My question is very similar to Applying group_by and summarise on data while keeping all the columns' info,
but I want to keep the columns that get excluded because their values conflict after grouping.

Label <- c("203c","203c","204a","204a","204a","204a","204a","204a","204a","204a")
Type <- c("wholefish","flesh","flesh","fleshdelip","formula","formuladelip",
"formula","formuladelip","wholefish", "wholefishdelip")
Proportion <- c(1,1,0.67714,0.67714,0.32285,0.32285,0.32285,
0.32285, 0.67714,0.67714)
N <- (1:10)
C <- (1:10)
Code <- c("c","a","a","b","a","b","c","d","c","d")

df <- data.frame(Label,Type, Proportion, N, C, Code)
df

   Label           Type Proportion  N  C Code
1   203c      wholefish     1.0000  1  1    c
2   203c          flesh     1.0000  2  2    a
3   204a          flesh     0.6771  3  3    a
4   204a     fleshdelip     0.6771  4  4    b
5   204a        formula     0.3228  5  5    a
6   204a   formuladelip     0.3228  6  6    b
7   204a        formula     0.3228  7  7    c
8   204a   formuladelip     0.3228  8  8    d
9   204a      wholefish     0.6771  9  9    c
10  204a wholefishdelip     0.6771 10 10    d

library(dplyr)

total <- df %>%
#where Label and Code are the same, add up Proportion, N and C respectively
group_by(Label, Code) %>%
#the summed Proportion should be 1 in every group;
#this is my way of checking that the task was done correctly
summarise_if(is.numeric, sum)

# A tibble: 6 x 5
# Groups: Label [?]
   Label   Code Proportion     N     C
  <fctr> <fctr>      <dbl> <int> <int>
1   203c      a    1.00000     2     2
2   203c      c    1.00000     1     1
3   204a      a    0.99999     8     8
4   204a      b    0.99999    10    10
5   204a      c    0.99999    16    16
6   204a      d    0.99999    18    18
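As a side note, `summarise_if()` is superseded since dplyr 1.0.0. A minimal sketch of the same step written with `across()` and `where()`, assuming dplyr >= 1.0.0 is installed; the `.groups = "drop"` argument is an added choice that returns an ungrouped tibble instead of one still grouped by `Label`:

```r
library(dplyr)

# The question's toy data, rebuilt so the snippet runs on its own
df <- data.frame(
  Label = c("203c", "203c", rep("204a", 8)),
  Type = c("wholefish", "flesh", "flesh", "fleshdelip", "formula", "formuladelip",
           "formula", "formuladelip", "wholefish", "wholefishdelip"),
  Proportion = c(1, 1, 0.67714, 0.67714, 0.32285, 0.32285,
                 0.32285, 0.32285, 0.67714, 0.67714),
  N = 1:10,
  C = 1:10,
  Code = c("c", "a", "a", "b", "a", "b", "c", "d", "c", "d")
)

total <- df %>%
  group_by(Label, Code) %>%
  # sum every numeric column within each Label/Code group
  summarise(across(where(is.numeric), sum), .groups = "drop")

print(total)
```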

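Up to this point I get what I want. Now I want to include the Type column, which is dropped because its values conflict within each group. This is the result I want to obtain: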
# A tibble: 6 x 6
# Groups: Label [?]
   Label   Code Proportion     N     C                        Type
  <fctr> <fctr>      <dbl> <int> <int>                      <fctr>
1   203c      a    1.00000     2     2                       flesh
2   203c      c    1.00000     1     1                   wholefish
3   204a      a    0.99999     8     8               flesh_formula
4   204a      b    0.99999    10    10     fleshdelip_formuladelip
5   204a      c    0.99999    16    16           wholefish_formula
6   204a      d    0.99999    18    18 wholefishdelip_formuladelip

I have tried ungroup() as well as several variants of mutate and unite, without success. Any suggestions would be greatly appreciated.

Best answer

Here is a data.table solution. I assume you want the mean() of Proportion, since the proportions within these groups may not be additive.

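library(data.table)
setDT(df)  # convert df to a data.table by reference

# one row per Label/Code: join Type values with "_", average Proportion, sum N and C
df[, .(Type = paste(Type, collapse = "_"),
       Proportion = mean(Proportion), N = sum(N), C = sum(C)),
   by = .(Label, Code)][order(Label)]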

   Label Code                        Type Proportion  N  C
1:  203c    c                   wholefish   1.000000  1  1
2:  203c    a                       flesh   1.000000  2  2
3:  204a    a               flesh_formula   0.499995  8  8
4:  204a    b     fleshdelip_formuladelip   0.499995 10 10
5:  204a    c           formula_wholefish   0.499995 16 16
6:  204a    d formuladelip_wholefishdelip   0.499995 18 18
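The answer averages Proportion, whereas the question's desired output sums it (about 0.99999 per 204a group) as a sanity check. If summing is what you want, here is a minimal sketch of the same data.table call with sum() instead of mean(). It rebuilds the question's data so it runs on its own; the names `dt` and `res` are just for illustration, and ordering by both Label and Code is an added choice:

```r
library(data.table)

# The question's toy data, built directly as a data.table
dt <- data.table(
  Label = c("203c", "203c", rep("204a", 8)),
  Type = c("wholefish", "flesh", "flesh", "fleshdelip", "formula", "formuladelip",
           "formula", "formuladelip", "wholefish", "wholefishdelip"),
  Proportion = c(1, 1, 0.67714, 0.67714, 0.32285, 0.32285,
                 0.32285, 0.32285, 0.67714, 0.67714),
  N = 1:10,
  C = 1:10,
  Code = c("c", "a", "a", "b", "a", "b", "c", "d", "c", "d")
)

# One row per Label/Code: join Type with "_", sum the numeric columns,
# then sort the result by Label and Code
res <- dt[, .(Type = paste(Type, collapse = "_"),
              Proportion = sum(Proportion),
              N = sum(N),
              C = sum(C)),
          by = .(Label, Code)][order(Label, Code)]

print(res)
```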

I'm not sure this is the cleanest dplyr solution, but it works:
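library(dplyr)

df %>% group_by(Label, Code) %>%
  # within each Label/Code group, collapse all Type values into one string
  mutate(Type = paste(Type, collapse = "_")) %>%
  # regroup so the combined Type is kept as a grouping column
  group_by(Label, Type, Code) %>%
  summarise(N = sum(N), C = sum(C), Proportion = mean(Proportion))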

Note that the key here is regrouping after creating the combined Type column.
   Label                        Type   Code     N     C Proportion
  <fctr>                       <chr> <fctr> <int> <int>      <dbl>
1   203c                       flesh      a     2     2   1.000000
2   203c                   wholefish      c     1     1   1.000000
3   204a               flesh_formula      a     8     8   0.499995
4   204a     fleshdelip_formuladelip      b    10    10   0.499995
5   204a           formula_wholefish      c    16    16   0.499995
6   204a formuladelip_wholefishdelip      d    18    18   0.499995
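Regrouping is one way to do it. Another, sketched here under the assumption of dplyr >= 1.0.0 (for `across()` and `.groups`), is to build the combined Type inside the same summarise() call, so Label and Code stay the only grouping columns. Unlike the answer, this sums Proportion as the question does:

```r
library(dplyr)

# The question's toy data, rebuilt so the snippet runs on its own
df <- data.frame(
  Label = c("203c", "203c", rep("204a", 8)),
  Type = c("wholefish", "flesh", "flesh", "fleshdelip", "formula", "formuladelip",
           "formula", "formuladelip", "wholefish", "wholefishdelip"),
  Proportion = c(1, 1, 0.67714, 0.67714, 0.32285, 0.32285,
                 0.32285, 0.32285, 0.67714, 0.67714),
  N = 1:10,
  C = 1:10,
  Code = c("c", "a", "a", "b", "a", "b", "c", "d", "c", "d")
)

total <- df %>%
  group_by(Label, Code) %>%
  summarise(
    # collapse the conflicting Type values into one string per group
    Type = paste(Type, collapse = "_"),
    # sum the numeric columns, as in the question's summarise_if() step
    across(c(Proportion, N, C), sum),
    .groups = "drop"
  )

print(total)
```

The pieces of Type are joined in row order, so group 204a/c gives formula_wholefish rather than the wholefish_formula shown in the question; using `paste(sort(Type), collapse = "_")` would make the order independent of row order.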

For more on "r - Applying group_by and summarise(sum) but keeping columns with non-relevant conflicting data?", see the similar question on Stack Overflow: https://stackoverflow.com/questions/46553514/
