r - Summarize wtd.quantile by group

Reposted. Author: 行者123. Updated: 2023-12-04 10:12:52

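I want to use Hmisc::wtd.quantile to create a new df from a data frame with many repeated dates. I group by date, aggregate by date with summarize(), and try to apply wtd.quantile() (with weights) to each date. This is also a fairly large dataset. Here is some sample code: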

# sample data
# grouping_var = dt_time
require(Hmisc)
require(plyr)
require(dplyr)
df <- data.frame(type = sample(letters[1:2], 10e6, replace = TRUE),
                 score = sample(500:899, 10e6, replace = TRUE),
                 dt_time = sample(seq(as.Date('2010/01/01'),
                                      as.Date('2018/01/01'),
                                      by = "day"), 10e6, replace = TRUE),
                 weight = sample(1.0:2.0, 10e6, replace = TRUE))
# my attempt:
ptiles <- df %>%
  group_by(dt_time) %>%
  plyr::ddply(~dt_time, dplyr::summarize,
              ptile10 = Hmisc::wtd.quantile(., .$score, weights = .$weight,
                                            probs = .1, na.rm = TRUE),
              ptile50 = Hmisc::wtd.quantile(., .$score, weights = .$weight,
                                            probs = .5, na.rm = TRUE),
              ptile90 = Hmisc::wtd.quantile(., .$score, weights = .$weight,
                                            probs = .9, na.rm = TRUE))

# desired df,
# where each new variable would be created using the
# wtd.quantile function:
desired_ptiles <- data.frame(dt_time = seq(as.Date('2010/01/01'),
                                           as.Date('2010/01/06'),
                                           by = "day"),
                             # only 6 because lol 10e6
                             ptile10 = sample(500:899, 6, replace = TRUE),
                             ptile50 = sample(500:899, 6, replace = TRUE),
                             ptile90 = sample(500:899, 6, replace = TRUE))

So far my attempts have produced this error:
Error in summarise_impl(.data, dots) :
Evaluation error: 'arg' must be NULL or a character vector.

Using formula notation (`~dt_time`) in the ddply() call, as in the attempt above, gives the same error:
Error in summarise_impl(.data, dots) :
Evaluation error: 'arg' must be NULL or a character vector.

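Am I approaching this the wrong way? I've seen approaches that use split(), but that seems cumbersome. Is there a data.table approach that lets wtd.quantile() be summarized this way?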

Thanks!

Best answer

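You don't need ddply when you use group_by, because group_by has already split the data into groups. Also, once the data is grouped, you don't need to pass the data (`.` or `.$`) inside summarize; just refer to the columns by name.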
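The specific error message most likely comes from R's argument matching rather than from dplyr itself. Assuming Hmisc's documented signature `wtd.quantile(x, weights, probs, type, normwt, na.rm)`, the question's call names `weights`, `probs` and `na.rm`, so the two unnamed arguments fill the remaining formals in order: `.` becomes `x` and `.$score` becomes `type`, which is validated with `match.arg()`. The sketch below reproduces only that matching with a hypothetical stand-in, `fake_wtd_quantile()`; it does not compute any quantiles.

```r
# Hypothetical stand-in with the same argument order as Hmisc::wtd.quantile();
# it only reproduces argument matching and the match.arg() check on `type`.
fake_wtd_quantile <- function(x, weights = NULL, probs = c(0, .25, .5, .75, 1),
                              type = c("quantile", "(i-1)/(n-1)", "i/(n+1)", "i/n"),
                              normwt = FALSE, na.rm = TRUE) {
  type <- match.arg(type)  # errors if `type` is neither NULL nor character
  type
}

d <- data.frame(score = c(600, 700), weight = c(1, 2))

# Shaped like the question's call: weights, probs and na.rm are named, so the
# unnamed arguments fill x and then type, in order (d -> x, d$score -> type).
msg <- tryCatch(
  fake_wtd_quantile(d, d$score, weights = d$weight, probs = 0.1, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
print(msg)  # "'arg' must be NULL or a character vector"

# Shaped like the answer's call: the column is x and everything else is named.
print(fake_wtd_quantile(d$score, weights = d$weight, probs = 0.1))  # "quantile"
```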

This works:

# load dplyr after Hmisc so Hmisc::summarize() does not mask dplyr's summarize()
library(Hmisc)
library(dplyr)

ptiles <- df %>%
  group_by(dt_time) %>%
  summarize(ptile10 = wtd.quantile(score, weights = weight,
                                   probs = .1, na.rm = TRUE),
            ptile50 = wtd.quantile(score, weights = weight,
                                   probs = .5, na.rm = TRUE),
            ptile90 = wtd.quantile(score, weights = weight,
                                   probs = .9, na.rm = TRUE))

> ptiles
# A tibble: 2,923 x 4
      dt_time ptile10 ptile50 ptile90
       <date>   <dbl>   <dbl>   <dbl>
 1 2010-01-01   539.0     697   859.0
 2 2010-01-02   538.0     704   861.7
 3 2010-01-03   541.0     706   862.0
 4 2010-01-04   541.0     702   859.0
 5 2010-01-05   540.0     706   860.0
 6 2010-01-06   537.0     695   859.0
 7 2010-01-07   539.0     696   859.0
 8 2010-01-08   536.0     700   857.0
 9 2010-01-09   538.0     694   861.0
10 2010-01-10   538.4     701   859.0
# ... with 2,913 more rows
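The question also asked about data.table. A minimal sketch, assuming the data.table and Hmisc packages are installed and using a small made-up data set in place of the 10e6-row one: `keyby = dt_time` plays the role of `group_by()`, and calling `wtd.quantile()` once per date with `probs = c(0.1, 0.5, 0.9)` should avoid repeating the per-group work three times. `setNames()` gives the three columns the same names as in the answer.

```r
library(data.table)
library(Hmisc)

# Small made-up stand-in for the question's data (the original used 10e6 rows)
set.seed(42)
n <- 3000
df <- data.frame(
  type    = sample(letters[1:2], n, replace = TRUE),
  score   = sample(500:899, n, replace = TRUE),
  dt_time = sample(seq(as.Date("2010-01-01"), as.Date("2010-01-06"), by = "day"),
                   n, replace = TRUE),
  weight  = sample(1:2, n, replace = TRUE)
)

dt <- as.data.table(df)

# One wtd.quantile() call per date returns all three percentiles;
# as.list() turns the length-3 result into three columns, sorted by date.
ptiles_dt <- dt[, as.list(setNames(
  wtd.quantile(score, weights = weight, probs = c(0.1, 0.5, 0.9), na.rm = TRUE),
  c("ptile10", "ptile50", "ptile90")
)), keyby = dt_time]

print(ptiles_dt)
```

Computing all three probabilities in one call should give the same values as three separate calls, since each probability is handled element-wise.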

This question about r - Summarize wtd.quantile by group was found on Stack Overflow: https://stackoverflow.com/questions/52418838/
