
r - SparkR: group by multiple columns using an input vector

Reposted. Author: 行者123. Updated: 2023-12-04 16:06:26

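I am using SparkR 2.1.0 for data manipulation.

I would like to group by multiple columns programmatically. I know I can group by multiple columns if I list them individually, or if I reference them by position in a vector, but I would like to be able to pass the list of columns as a vector, so that the function automatically adjusts to however many columns I pass in.

Dummy data: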

library(SparkR)
library(magrittr)  # provides the %>% pipe used below
sparkR.session()   # start (or attach to) a Spark session

cpny <- c("Fakeco1", "Fakeco2", "Fakeco3", "Fakeco4", "Fakeco5", "Fakeco6")
state <- c("CA", "NY", "WA", "CA", "CA", "NY")
public <- c("Y", "Y", "N", "N", "N", "N")
color <- c("White", "Red", "Green", "Green", "Green", "Red")
revs <- c(400, 200, 900, 500, 200, 120)
df <- data.frame(cpny, state, public, color, revs)
# Convert to SparkR dataframe
df_s <- as.DataFrame(df)

This works:

df_grouped <- df_s %>%
  groupBy('state', 'public') %>%
  summarize(sum_Revs = sum(df_s$revs))

This also works:

group_vars <- c('state', 'public')

df_grouped <- df_s %>%
  groupBy(group_vars[[1]], group_vars[[2]]) %>%
  summarize(sum_Revs = sum(df_s$revs))

This does not work:

group_vars <- c('state', 'public')

df_grouped <- df_s %>%
  groupBy(group_vars) %>%
  summarize(sum_Revs = sum(df_s$revs))

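Any solutions or alternative ideas?

Best answer

You can use do.call() (https://stat.ethz.ch/R-manual/R-devel/library/base/html/do.call.html) and put your data frame and your columns into a list, so that each column name is passed to groupBy as a separate argument. The following worked for me: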

cpny <- c("Fakeco1", "Fakeco2", "Fakeco3", "Fakeco4", "Fakeco5", "Fakeco6")
state <- c("CA", "NY", "WA", "CA", "CA", "NY")
public <- c("Y", "Y", "N", "N", "N", "N")
color <- c("White", "Red", "Green", "Green", "Green", "Red")
revs <- c(400, 200, 900, 500, 200, 120)
df <- data.frame(cpny, state, public, color, revs)
# Convert to SparkR dataframe
df_s <- as.DataFrame(df)

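library(magrittr)  # provides the %>% pipe used below

group_vars <- c('state', 'public')

# Build the argument list for groupBy: the DataFrame first,
# then one list element per grouping column.
function_params <- list(df_s)
# seq_along() visits every index; range() would return only c(min, max)
# and skip columns when there are more than two.
for (i in seq_along(group_vars)) {
  function_params[[i + 1]] <- group_vars[i]
}

summarized <- do.call(SparkR::groupBy, function_params) %>%
  SparkR::summarize(sum_Revs = sum(df_s$revs))
SparkR::head(summarized)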
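
The argument list can also be built without an explicit loop. The sketch below is an illustration rather than part of the original answer: `build_group_args` and `group_by_vec` are hypothetical helper names, and the SparkR call is wrapped in a function so that the list-building step can be checked in plain R without a Spark session.

```r
# Hypothetical helper: put the data object first, then one list element
# per column name, which is the shape do.call() expects.
build_group_args <- function(data, cols) {
  c(list(data), as.list(cols))
}

# Hypothetical wrapper around SparkR::groupBy. The body is only evaluated
# when the function is called, so defining it needs no running Spark session.
group_by_vec <- function(sdf, cols) {
  do.call(SparkR::groupBy, build_group_args(sdf, cols))
}

# Plain-R check of the argument list, using a string as a stand-in
# for the SparkDataFrame.
arg_list <- build_group_args("df_placeholder", c("state", "public"))
str(arg_list)

# With an active SparkR session and df_s from above (not run here):
# grouped <- group_by_vec(df_s, c("state", "public"))
# SparkR::head(SparkR::summarize(grouped, sum_Revs = sum(df_s$revs)))
```

Because the grouping columns arrive as an ordinary character vector, the same wrapper should handle one column or many without any change to the calling code.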

Regarding "r - SparkR: group by multiple columns using an input vector", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48570026/
