gpt4 book ai didi

r - ddply 总结比例计数

转载 作者:行者123 更新时间:2023-12-04 18:09:30 25 4
gpt4 key购买 nike

我在使用 plyr 包中的 ddply 函数时遇到了一些问题。我试图用每个组内的计数和比例来总结以下数据。这是我的数据:

    structure(list(X5employf = structure(c(1L, 3L, 1L, 1L, 1L, 3L, 
1L, 1L, 1L, 3L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 1L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 1L, 1L, 3L, 1L,
3L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L,
3L, 3L, 1L), .Label = c("increase", "decrease", "same"), class = "factor"),
X5employff = structure(c(2L, 6L, NA, 2L, 4L, 6L, 5L, 2L,
2L, 8L, 2L, 2L, 2L, 7L, 7L, 8L, 11L, 7L, 2L, 8L, 8L, 11L,
7L, 6L, 2L, 5L, 2L, 8L, 7L, 7L, 7L, 8L, 6L, 7L, 5L, 5L, 7L,
2L, 6L, 7L, 2L, 2L, 2L, 2L, 2L, 5L, 5L, 5L, 2L, 5L, 2L, 2L,
2L, 5L, 12L, 2L, 2L, 2L, 2L, 5L, 5L, 5L, 5L, 2L, 5L, 2L,
13L, 9L, 9L, 9L, 7L, 8L, 5L), .Label = c("", "1", "1 and 8",
"2", "3", "4", "5", "6", "6 and 7", "6 and 7 ", "7", "8",
"1 and 8"), class = "factor")), .Names = c("X5employf", "X5employff"
), row.names = c(NA, 73L), class = "data.frame")

这是我使用 ddply 的调用:
ddply(kano_final, .(X5employf, X5employff), summarise, n=length(X5employff), prop=(n/sum(n))*100)

这给了我 X5employff 的每个实例的计数正确,但似乎比例是在每一行中计算的,而不是在因子的每个水平内计算 X5employf如下:
   X5employf X5employff  n prop
1 increase 1 26 100
2 increase 2 1 100
3 increase 3 15 100
4 increase 1 and 8 1 100
5 increase <NA> 1 100
6 decrease 4 1 100
7 decrease 5 5 100
8 decrease 6 2 100
9 decrease 7 1 100
10 decrease 8 1 100
11 same 4 4 100
12 same 5 6 100
13 same 6 5 100
14 same 6 and 7 3 100
15 same 7 1 100

当手动计算每个组内的比例时,我得到了这个:
   X5employf X5employff  n prop
1 increase 1 26 59.09
2 increase 2 1 2.27
3 increase 3 15 34.09
4 increase 1 and 8 1 2.27
5 increase <NA> 1 2.27
6 decrease 4 1 10.00
7 decrease 5 5 50.00
8 decrease 6 2 20.00
9 decrease 7 1 10.00
10 decrease 8 1 10.00
11 same 4 4 21.05
12 same 5 6 31.57
13 same 6 5 26.31
14 same 6 and 7 3 15.78
15 same 7 1 5.26

如您所见,因子 X5employf 的每个水平中的比例总和等于 100。

我知道这可能非常简单,但尽管阅读了各种类似的帖子,但我似乎无法理解它。任何人都可以帮助解决这个问题以及我对汇总功能如何工作的理解?!

非常感谢

马蒂

最佳答案

你不能一次做到ddply调用是因为传递给每个 summarize call 是组变量特定组合的数据子集。在此最低级别,您无权访问该中间级别 sum(n) .相反,分两步完成:

kano_final <- ddply(kano_final, .(X5employf), transform,
sum.n = length(X5employf))

ddply(kano_final, .(X5employf, X5employff), summarise,
n = length(X5employff), prop = n / sum.n[1] * 100)

编辑 :使用单个 ddply调用和使用 table正如你暗示的那样:
ddply(kano_final, .(X5employf), summarise,
n = Filter(function(x) x > 0, table(X5employff, useNA = "ifany")),
prop = 100* prop.table(n),
X5employff = names(n))

关于r - ddply 总结比例计数,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/18057081/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com