
r - dplyr idiom for "select A, B, max(C) from D group by C"

Reposted. Author: 行者123. Updated: 2023-12-02 07:25:11

I'm looking for the dplyr idiom for a SQL GROUP BY query that returns more than one result column. For example:

library(dplyr)
library(sqldf)

df <- data.frame(
  fuel = rep(c("Coal", "Gas"), each = 3),
  year = rep(c(1998, 1999, 2000), 2),
  percent = c(20, 30, 40, 80, 70, 60))

sqldf("select fuel, year, max(percent) from df group by fuel")

fuel year max(percent)
1 Coal 2000 40
2 Gas 1998 80

sqldf returns the year in which each fuel reached its maximum percent (ignoring ties). What is the best way to do this with dplyr? Simply doing:

group_by(df,fuel) %>% summarise(max(percent))

gives:

  fuel max(percent)
1 Coal 40
2 Gas 80

and there seems to be no place to add the extra result column. I can get there indirectly with mutate:

group_by(df,fuel) %>% mutate(maxp=max(percent)) %>% 
filter(percent==maxp) %>% select(-percent)

Is this the best/only way?

Best answer

Some more options

Using distinct (this is similar to slice(which.max(percent)), but it avoids the grouped operation, so it may be more efficient)

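df %>% 
arrange(desc(percent)) %>%
distinct(fuel, .keep_all = TRUE) # .keep_all = TRUE keeps year and percent; without it, current dplyr returns only fuel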

# fuel year percent
# 1 Gas 1998 80
# 2 Coal 2000 40
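
For comparison, the slice(which.max(percent)) variant mentioned above can be sketched as follows. This is a minimal sketch that rebuilds the question's df so it runs on its own; res_slice is just an illustrative name. Note that which.max returns the position of the first maximum, so ties are dropped.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  fuel = rep(c("Coal", "Gas"), each = 3),
  year = rep(c(1998, 1999, 2000), 2),
  percent = c(20, 30, 40, 80, 70, 60)
)

# One row per fuel: the first row at which percent reaches its group maximum
res_slice <- df %>%
  group_by(fuel) %>%
  slice(which.max(percent)) %>%
  ungroup()

res_slice
```

Unlike the arrange + distinct version, the rows come back in group order (Coal, then Gas) rather than sorted by percent.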

Or using filter (this will select all rows that have the maximum value, including ties)

df %>% 
group_by(fuel) %>%
filter(percent == max(percent))
# Source: local data frame [2 x 3]
# Groups: fuel [2]
#
# fuel year percent
# (fctr) (dbl) (dbl)
# 1 Coal 2000 40
# 2 Gas 1998 80
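
The difference between "all rows with the maximum" and "one row per group" only shows up when there are ties, which the question's data does not contain. The sketch below uses a hypothetical variant of the data, df_tie, in which Coal reaches 40 in both 1999 and 2000; the names df_tie, all_max and first_max are illustrative, not from the original post.

```r
library(dplyr)

# Hypothetical data with a tie: Coal is at 40 in both 1999 and 2000
df_tie <- data.frame(
  fuel = rep(c("Coal", "Gas"), each = 3),
  year = rep(c(1998, 1999, 2000), 2),
  percent = c(20, 40, 40, 80, 70, 60)
)

# filter keeps every row that equals the group maximum
all_max <- df_tie %>%
  group_by(fuel) %>%
  filter(percent == max(percent)) %>%
  ungroup()

# slice(which.max(...)) keeps only the first such row per group
first_max <- df_tie %>%
  group_by(fuel) %>%
  slice(which.max(percent)) %>%
  ungroup()

nrow(all_max)   # 3: both tied Coal rows plus one Gas row
nrow(first_max) # 2: one row per fuel
```

Which behaviour is wanted depends on the use case; the sqldf query in the question behaves like the one-row-per-group version.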

Or using top_n (gives results similar to filter(percent == max(percent)))

df %>% 
group_by(fuel) %>%
top_n(n = 1, percent) # If percent is always the last column, you can just do top_n(n = 1)

# Source: local data frame [2 x 3]
# Groups: fuel [2]
#
# fuel year percent
# (fctr) (dbl) (dbl)
# 1 Coal 2000 40
# 2 Gas 1998 80
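
In dplyr 1.0.0 and later, top_n is superseded by slice_max. Assuming such a version is installed, a roughly equivalent sketch looks like this (res_max is an illustrative name). With the default with_ties = TRUE it keeps tied rows, as top_n does; with_ties = FALSE returns exactly one row per group.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  fuel = rep(c("Coal", "Gas"), each = 3),
  year = rep(c(1998, 1999, 2000), 2),
  percent = c(20, 30, 40, 80, 70, 60)
)

# Keep the single row with the largest percent in each fuel group
res_max <- df %>%
  group_by(fuel) %>%
  slice_max(percent, n = 1, with_ties = FALSE) %>%
  ungroup()

res_max
```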

Or using summarise + left_join (gives results similar to the two above)

df %>% 
group_by(fuel) %>%
summarise(percent = max(percent)) %>%
left_join(., df)

# Joining by: c("fuel", "percent")
# Source: local data frame [2 x 3]
#
# fuel percent year
# (fctr) (dbl) (dbl)
# 1 Coal 40 2000
# 2 Gas 80 1998

Source: this question and answer, r - dplyr idiom for "select A, B, max(C) from D group by C", come from Stack Overflow: https://stackoverflow.com/questions/33344592/
