
r - Find n% of records within a variable in a data frame

Reposted. Author: 行者123. Updated: 2023-12-03 12:10:32

I have data in a data frame: the first column is a date and the second column is an individual weight. Here is a sample of the data:

df <- data.frame(
  date = c("2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01",
           "2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01",
           "2019-01-01", "2019-01-01", "2019-01-02", "2019-01-02", "2019-01-02",
           "2019-01-02", "2019-01-02", "2019-01-02", "2019-01-02",
           "2019-01-02", "2019-01-02", "2019-01-02"),
  weight = c(2174.8, 2174.8, 2174.8, 8896.53, 8896.53, 2133.51, 2133.51,
             2892.32, 2892.32, 2892.32, 2892.32, 5287.78, 5287.78, 6674.03,
             6674.03, 6674.03, 6674.03, 6674.03, 5535.11, 5535.11)
)

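I want to first run simple summary statistics for each date, and then find the number of records whose weight falls within a given range, where the categories are defined as percentages of the total weight range. Finally, the record count for each category should be stored in its own column: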

Lowest 10%
10-20%
20-40%
40-60%
60-80%
80-90%
90-100%

The logic: the cut-off for X% is MinWeight + (MaxWeight - MinWeight) * X%
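
As a small worked example of that formula, the cut-offs for the 2019-01-01 records could be computed as below. This is only a sketch: the names `pct` and `cutoffs` are made up for illustration, and the percentage boundaries are the ones listed in the categories above.

```r
# Weights recorded on 2019-01-01 in the sample data
weight <- c(2174.8, 2174.8, 2174.8, 8896.53, 8896.53,
            2133.51, 2133.51, 2892.32, 2892.32, 2892.32)

# Category boundaries, as percentages of the weight range (illustrative name)
pct <- c(0, 10, 20, 40, 60, 80, 90, 100)

# MinWeight + (MaxWeight - MinWeight) * X%
cutoffs <- min(weight) + (max(weight) - min(weight)) * pct / 100

# One cut-off per boundary: the first is the minimum weight,
# the last is (up to rounding) the maximum weight
print(cutoffs)
```

A record then belongs to a category when its weight lies between two neighbouring cut-offs.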

This is my expected result (I am only showing two of the percentage-range columns):

df %>%
  group_by(date) %>%
  summarise(mean(weight), min(weight), max(weight))

date       `mean(weight)` `min(weight)` `max(weight)` `Lowest 10%` `10-20%`
2019-01-01          3726.         2134.         8897. num records. num records.

Best answer

Check this solution:

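library(tidyverse)
library(wrapr)  # provides the dot-arrow pipe %.>%

df %>%
  group_by(date) %>%
  mutate(
    rn = row_number(),  # unique row id so spread() keeps one row per record
    # Rescale each weight to 0-100% of that date's weight range
    temp = weight - min(weight),
    temp = (temp / max(temp)) * 100,
    # Assign each record to a 10%-wide bin
    temp = cut(temp, seq(0, 100, 10), include.lowest = TRUE),
    # Turn interval labels such as "(10,20]" into "10-20%"
    temp = str_remove(temp, '\\(|\\[') %>%
      str_replace(',', '-') %>%
      str_replace('\\]', '%'),
    one = 1
  ) %>%
  # One indicator column per bin: 1 if the record falls in it, otherwise 0
  spread(temp, one, fill = 0) %.>%
  # Join the per-date summary statistics with the per-date bin counts
  left_join(
    summarise(.,
      `mean(weight)` = mean(weight),
      `min(weight)` = min(weight),
      `max(weight)` = max(weight)
    ),
    summarise_at(., vars(matches('\\d+-\\d+.')), sum),
    by = "date"
  )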

Output:

  date       `mean(weight)` `min(weight)` `max(weight)` `0-10%` `10-20%` `60-70%` `90-100%`
  <fct>               <dbl>         <dbl>         <dbl>   <dbl>    <dbl>    <dbl>     <dbl>
1 2019-01-01          3726.         2134.         8897.       5        3        0         2
2 2019-01-02          5791.         2892.         6674.       1        0        4         5
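
The solution above bins in even 10% steps, while the question lists uneven categories (10%, 20%, 40%, 60%, 80%, 90%, 100%). A possible variation, not part of the original answer, is to pass those boundaries to `cut()` directly and count with `dplyr::count()` and `tidyr::pivot_wider()`. The names `breaks`, `labels`, `binned`, `stats`, `counts` and `result` are illustrative, and the sketch assumes every date has at least two distinct weights (otherwise the range is zero and the rescaling yields `NaN`).

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  date = rep(c("2019-01-01", "2019-01-02"), each = 10),
  weight = c(2174.8, 2174.8, 2174.8, 8896.53, 8896.53, 2133.51, 2133.51,
             2892.32, 2892.32, 2892.32, 2892.32, 5287.78, 5287.78, 6674.03,
             6674.03, 6674.03, 6674.03, 6674.03, 5535.11, 5535.11)
)

# Uneven category boundaries from the question, in percent of the range
breaks <- c(0, 10, 20, 40, 60, 80, 90, 100)
labels <- c("Lowest 10%", "10-20%", "20-40%", "40-60%",
            "60-80%", "80-90%", "90-100%")

# Rescale each weight to 0-100% of its date's range, then assign a category
binned <- df %>%
  group_by(date) %>%
  mutate(
    pct = (weight - min(weight)) / (max(weight) - min(weight)) * 100,
    bin = cut(pct, breaks = breaks, labels = labels, include.lowest = TRUE)
  ) %>%
  ungroup()

# Per-date summary statistics
stats <- binned %>%
  group_by(date) %>%
  summarise(
    mean_weight = mean(weight),
    min_weight = min(weight),
    max_weight = max(weight)
  )

# Records per date and category; .drop = FALSE keeps empty categories as 0
counts <- binned %>%
  count(date, bin, .drop = FALSE) %>%
  pivot_wider(names_from = bin, values_from = n)

result <- left_join(stats, counts, by = "date")
print(result)
```

On the sample data this should place 5, 3 and 2 records of 2019-01-01 in `Lowest 10%`, `10-20%` and `90-100%`, and 1, 4 and 5 records of 2019-01-02 in `Lowest 10%`, `60-80%` and `90-100%`, with every other category at 0.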

Regarding "r - Find n% of records within a variable in a data frame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54722588/
