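I have a data manipulation and exclusion problem that I can't work out how to solve. My data is in tidy format, with one observation per row. Here is a reprex of my dataset: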
quarter <- c("Q4", "Q3", "Q2","Q1", "Q3", "Q2", "Q1","Q4", "Q2", "Q1", "Q4", "Q3", "Q2", "Q1","Q4", "Q3", "Q1")
year <- c("2020", "2020","2020","2020","2019","2019","2019", "2020", "2020","2020","2019","2019","2019","2019", "2020", "2020","2020")
country <- c("Brazil","Brazil","Brazil","Brazil","Brazil","Brazil","Brazil","Brazil","Brazil","Brazil", "Brazil","Brazil","Brazil","Brazil","France","France","France")
indicator <- c("Testing","Testing", "Testing","Testing","Testing","Testing","Testing","TestingPos","TestingPos","TestingPos","TestingPos","TestingPos","TestingPos","TestingPos", "Testing","Testing","Testing")
value <- sample(1:10, 17, replace = TRUE)  # random draw; no seed is set, so values differ between runs
quarterlydf <- data.frame(quarter, year, country, indicator, value)
quarter year country indicator value
1 Q4 2020 Brazil Testing 9
2 Q3 2020 Brazil Testing 3
3 Q2 2020 Brazil Testing 2
4 Q1 2020 Brazil Testing 7
5 Q3 2019 Brazil Testing 1
6 Q2 2019 Brazil Testing 5
7 Q1 2019 Brazil Testing 6
8 Q4 2020 Brazil TestingPos 4
9 Q2 2020 Brazil TestingPos 4
10 Q1 2020 Brazil TestingPos 3
11 Q4 2019 Brazil TestingPos 7
12 Q3 2019 Brazil TestingPos 2
13 Q2 2019 Brazil TestingPos 8
14 Q1 2019 Brazil TestingPos 1
15 Q4 2020 France Testing 1
16 Q3 2020 France Testing 1
17 Q1 2020 France Testing 8
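For each combination of country and indicator, I need to find the most recent period of four consecutive quarters. For those four quarters (e.g. Q3 2019, Q4 2019, Q1 2020, Q2 2020), I need to create one row in a new data frame (here, annualdf) containing the country, the start and end quarter/year, the indicator, and the sum and mean of the values for the included quarters.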
start end country indicator sum mean
1 Q1_2020 Q4_2020 Brazil Testing 21 5.25
2 Q3_2019 Q2_2020 Brazil TestingPos 16 4
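I won't go through everything I've tried, but it got ugly: I reassigned a sequential ID to every possible quarter/year combination, used pivot_wider() to create one column per ID, concatenated those columns into a single result, and then ran an odd set of str_detect() searches to find and assign values. Long story short, I think my whole approach is clumsy and inelegant.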
Best answer
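The syntax is a bit long (I'll try to shorten it), but this works. The only assumption is that no year is missing entirely; if one were, the year field would also have to be filled in before the call to complete(). Otherwise this works: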
library(dplyr)
library(tidyr)

quarterlydf %>%
  arrange(desc(year), desc(quarter)) %>%
  group_by(country, indicator, year) %>%
  # fill in any missing quarter of each year as a row with value = NA
  complete(quarter = rev(c("Q1", "Q2", "Q3", "Q4"))) %>%
  group_by(country, indicator) %>%
  arrange(desc(year), desc(quarter), .by_group = TRUE) %>%
  # keep non-NA rows that sit in a run of at least 4 consecutive quarters
  filter(with(rle(is.na(value)), rep(lengths, lengths)) >= 4, !is.na(value)) %>%
  slice_head(n = 4) %>%
  summarise(start = paste0(last(year), "_", last(quarter)),
            end = paste0(first(year), "_", first(quarter)),
            sum = sum(value),
            mean = mean(value))
# A tibble: 2 x 6
# Groups: country [1]
country indicator start end sum mean
<chr> <chr> <chr> <chr> <int> <dbl>
1 Brazil Testing 2020_Q1 2020_Q4 18 4.5
2 Brazil TestingPos 2019_Q3 2020_Q2 16 4
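The step that does most of the work in the pipeline is the rle() filter, and it is easier to follow on a bare vector. The sketch below uses a made-up `value` vector (not taken from the question) ordered newest to oldest, where NA stands for a quarter that complete() filled in:

```r
# Hypothetical values, newest quarter first; NA marks a quarter with no data
value <- c(4, 1, NA, 8, 2, 6, 3, 5)

# Run-length encode the NA / non-NA pattern: here the runs have lengths 2, 1, 5
runs <- rle(is.na(value))

# Expand each run length back to one entry per element,
# so every row knows how long the run it belongs to is
run_len <- rep(runs$lengths, runs$lengths)

# Keep rows in runs of at least 4 that are not themselves NA
# (the second condition drops long runs of missing quarters)
keep <- run_len >= 4 & !is.na(value)

# The four most recent values from the first qualifying run
head(value[keep], 4)
```

The two leading values are discarded because their run is only two quarters long, so the result comes from the five-quarter run that follows the gap.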
The same pipeline also works in chronological order:
quarterlydf %>%
  arrange(year, quarter) %>%
  group_by(country, indicator, year) %>%
  complete(quarter = c("Q1", "Q2", "Q3", "Q4")) %>%
  group_by(country, indicator) %>%
  filter(with(rle(is.na(value)), rep(lengths, lengths)) >= 4, !is.na(value)) %>%
  slice_tail(n = 4) %>%
  summarise(start = paste0(first(year), "_", first(quarter)),
            end = paste0(last(year), "_", last(quarter)),
            sum = sum(value),
            mean = mean(value))
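Both versions rely on the assumption that no year is missing entirely. One possible way around that, offered only as a sketch and not as part of the original answer, is to skip complete() and turn each year/quarter pair into a running integer index, so that any gap (a missing quarter or a whole missing year) shows up as a jump larger than 1. The function name `latest_four` and the columns `idx` and `run` are names invented for this illustration, and `value` is fixed to the numbers printed in the question so the result is reproducible:

```r
library(dplyr)

# The question's data, with `value` set to the printed values instead of sample()
quarterlydf <- data.frame(
  quarter = c("Q4", "Q3", "Q2", "Q1", "Q3", "Q2", "Q1", "Q4", "Q2", "Q1",
              "Q4", "Q3", "Q2", "Q1", "Q4", "Q3", "Q1"),
  year = c(rep("2020", 4), rep("2019", 3), rep("2020", 3), rep("2019", 4), rep("2020", 3)),
  country = c(rep("Brazil", 14), rep("France", 3)),
  indicator = c(rep("Testing", 7), rep("TestingPos", 7), rep("Testing", 3)),
  value = c(9, 3, 2, 7, 1, 5, 6, 4, 4, 3, 7, 2, 8, 1, 1, 1, 8)
)

# Hypothetical helper: latest run of >= 4 consecutive quarters per country/indicator
latest_four <- function(df) {
  df %>%
    # consecutive quarters get consecutive integers, across year boundaries too
    mutate(idx = as.integer(as.character(year)) * 4L +
                 as.integer(substr(as.character(quarter), 2, 2))) %>%
    group_by(country, indicator) %>%
    arrange(idx, .by_group = TRUE) %>%
    # start a new run whenever the index does not advance by exactly 1
    mutate(run = cumsum(c(TRUE, diff(idx) != 1L))) %>%
    group_by(country, indicator, run) %>%
    filter(n() >= 4) %>%
    group_by(country, indicator) %>%
    # rows are still in chronological order, so the tail is the latest run
    slice_tail(n = 4) %>%
    summarise(start = paste0(first(year), "_", first(quarter)),
              end = paste0(last(year), "_", last(quarter)),
              sum = sum(value),
              mean = mean(value),
              .groups = "drop")
}

latest_four(quarterlydf)
```

With the printed values this should give the two rows the question asks for: Brazil/Testing from 2020_Q1 to 2020_Q4 (sum 21, mean 5.25) and Brazil/TestingPos from 2019_Q3 to 2020_Q2 (sum 16, mean 4). France drops out because it has no run of four quarters.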
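About r - How to aggregate messy quarterly data in R using the Tidyverse, searching for the first set of four consecutive quarters: a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/66987891/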