
r - Dplyr solution for differences in row values based on two factor levels in different columns

Reposted. Author: 行者123. Updated: 2023-12-04 23:35:20

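I am trying to use dplyr to compute the difference between the values of two rows, based on factor levels, in a large data frame. Concretely, I want the voting distance between every pair of groups for each party within each country. For the data below, I would like to end up with a data frame whose rows give the difference in vote values for each pair of groups, within each party level, within each country level. The lag function does not seem to work for my data because the number of factor levels varies by country, meaning the total number of groups and parties differs from country to country. Below is a small example of the setup.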

df1 <- data.frame(id = c(1:12),
country = c("a","a","a","a","a","a","b","b","b","b","b","b"),
group = c("x","y","z","x","y","z","x","y","z","x","y","z"),
party = c("d","d","d","e","e","e","d","d","d","e","e","e"),
vote = c(.15,.02,.7, .5, .6, .22,.47,.33,.09,.83,.77,.66))

This is what I would like the final product to look like.
df2 <- data.frame(id= c(1:12),
country = c("a","a","a","b","b","b","a","a","a","b","b","b"),
group1 = c("x","x","y","x","x","y","x","x","y","x","x","y"),
group2 = c("y","z","z","y","z","z","y","z","z","y","z","z"),
party = c("d","d","d","d","d","d","e","e","e","e","e","e"),
dist = c(.13,-.55,-.68,.14,.38,.24,-.1,.28,.38,.06,.17,.11))


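I have previously tried dcast; even when I fill in the columns I want, the rows do not line up and I get NA or 0 where there should be values. The lag function does not work in my case because the number of parties and groups is unique to each country rather than fixed. Whenever I try a different lag interval, in some cases the values end up being compared across parties within a country instead of across groups.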
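To illustrate the limitation described above, here is a minimal sketch. It assumes the toy `df1` from the question (rebuilt here so the block runs on its own) and a lag of one row; the name `lagged` is only illustrative. Even with the data grouped by country and party, `lag()` compares each row only with the row directly before it, so the x versus z pair never appears.

```r
library(dplyr)

# Same toy data as df1 in the question, repeated so this block runs on its own.
df1 <- data.frame(id = 1:12,
                  country = rep(c("a", "b"), each = 6),
                  group = rep(c("x", "y", "z"), times = 4),
                  party = rep(rep(c("d", "e"), each = 3), times = 2),
                  vote = c(.15, .02, .7, .5, .6, .22, .47, .33, .09, .83, .77, .66))

# lag() only looks one row back inside each country/party block.
lagged <- df1 %>%
  group_by(country, party) %>%
  mutate(dist = dplyr::lag(vote) - vote) %>%
  ungroup()

# Each block of three groups yields x - y and y - z (plus one NA), but never x - z.
lagged
```

A fixed lag interval can recover some of the missing pairs, but only for a fixed number of groups per block, which is exactly what the real data does not have.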

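I have found solutions outside dplyr, but to streamline the code for presentation I would like to know whether there is a way to do this in dplyr. Besides, the code I have is very long and clumsy, and it uses six or seven packages for this problem alone.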

Thanks

Best answer

We can use combn to create the differences

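library(dplyr)

# Within each country/party block, subtract the second vote from the first
# for every pair of rows. mutate() needs one value per row, so this fits the
# sample only because 3 groups give choose(3, 2) = 3 pairs.
df1 %>%
  group_by(country, party) %>%
  mutate(dist = combn(vote, 2, FUN = function(x) x[1] - x[2]))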
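The answer's `mutate()` call does not label which two groups each difference belongs to, and it relies on the number of pairs matching the number of rows. One possible variation, offered as a sketch and not as the answerer's method, uses `reframe()` (available in dplyr 1.1.0 and later) so that each country/party block can return any number of rows. It assumes every block contains at least two groups; the name `pair_dist` is hypothetical, and `df1` is rebuilt from the question so the block runs on its own.

```r
library(dplyr)

# Same toy data as df1 in the question, repeated so this block runs on its own.
df1 <- data.frame(id = 1:12,
                  country = rep(c("a", "b"), each = 6),
                  group = rep(c("x", "y", "z"), times = 4),
                  party = rep(rep(c("d", "e"), each = 3), times = 2),
                  vote = c(.15, .02, .7, .5, .6, .22, .47, .33, .09, .83, .77, .66))

# reframe() lets each country/party block return one row per pair of groups.
# Assumption: every block has at least two groups, otherwise combn() fails.
pair_dist <- df1 %>%
  group_by(country, party) %>%
  reframe(
    group1 = combn(as.character(group), 2)[1, ],  # first member of each pair
    group2 = combn(as.character(group), 2)[2, ],  # second member of each pair
    dist   = combn(vote, 2, FUN = function(x) x[1] - x[2])
  )

pair_dist
```

Because `combn()` enumerates pairs in the same order for `group` and `vote`, the labels in `group1` and `group2` line up with the differences in `dist`. The result is sorted by country and then party, so the row order differs from the `df2` layout in the question.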

Regarding "r - Dplyr solution for differences in row values based on two factor levels in different columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59458452/
