
r - How to use the cut function on a string variable?

Reposted. Author: 行者123. Updated: 2023-12-01 12:02:20

I have been working with a dataset like this:

library(tibble)  # provides tribble()

df <- tribble(
  ~id, ~price, ~day,
  "1", 10, '3',
  "1", 5, '1',
  "2", 7, '4',
  "2", 6, '2',
  "2", 3, '4',
  "3", 4, '1',
  "4", 5, '1',
  "4", 6, '1',
  "5", 1, '2',
  "5", 9, '3',
)

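However, the real data has almost 50 unique values in day. For the analysis, I want to see the median price for each day and each id. This is the desired data (the values are not correct):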

df <- tribble(
~id, ~day_1, ~day_2, ~day_3, ~day_4,
"1", 1, 1, 1, 1,
"2", 1, 1, 1, 1,
"3", 1, 1, 1, 1,
"4", 1, 1, 1, 1,
"5", 1, 1, 1, 1,
)

To do this, I tried to code it as shown below. However, I could not cut the day variable because it is a string variable.

df %>% 
mutate(date_day = cut(day)) %>%
select(-day) %>%
pivot_wider(names_from = date_day, values_from = median(price)) %>%
adorn_percentages()

Is there any way to do this? Thanks!

Best answer

In data.table, we can use dcast and specify fun.aggregate to get the median of the 'price' values:

library(data.table)
dcast(setDT(df), id ~ paste0('day_', day), value.var = 'price', fun.aggregate = median)
# id day_1 day_2 day_3 day_4
#1: 1 5.0 NA 10 NA
#2: 2 NA 6 NA 5
#3: 3 4.0 NA NA NA
#4: 4 5.5 NA NA NA
#5: 5 NA 1 9 NA

With pivot_wider, the values_fn argument offers an option similar to the one in dcast, so we can use it directly:

library(tidyr)
library(stringr)
df %>%
  pivot_wider(id_cols = id, names_from = day, values_from = price,
              values_fn = list(price = median),
              names_repair = ~ c('id', str_c('day', .[-1])))
# A tibble: 5 x 5
# id day3 day1 day4 day2
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 1 10 5 NA NA
#2 2 NA NA 5 6
#3 3 NA 4 NA NA
#4 4 NA 5.5 NA NA
#5 5 9 NA NA 1

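With the pivot functions, the new columns are ordered by the order in which the values first appear in the data. To get a different order, the data must be reordered before pivoting.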
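The reordering step mentioned above could look like the following sketch. It is an illustration rather than part of the original answer, and it rests on two assumptions: that day always holds digits stored as text (so as.integer() is safe and avoids "10" sorting before "2"), and that sorting the rows by id afterwards is acceptable. It also uses the names_prefix argument of pivot_wider in place of names_repair.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Same example data as in the question
df <- tribble(
  ~id, ~price, ~day,
  "1", 10, "3",
  "1", 5,  "1",
  "2", 7,  "4",
  "2", 6,  "2",
  "2", 3,  "4",
  "3", 4,  "1",
  "4", 5,  "1",
  "4", 6,  "1",
  "5", 1,  "2",
  "5", 9,  "3"
)

wide <- df %>%
  # Sort numerically so the day columns are created in ascending order
  arrange(as.integer(day)) %>%
  pivot_wider(id_cols = id, names_from = day, values_from = price,
              values_fn = list(price = median),
              names_prefix = "day_") %>%
  # The first arrange() also shuffled the rows, so restore the id order
  arrange(id)

names(wide)
# "id" "day_1" "day_2" "day_3" "day_4"
```

Sorting on the integer value matters for the real data with close to 50 days, because a plain character sort would place day 10 ahead of day 2.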

Alternatively, instead of names_repair, rename the columns with rename_at after pivot_wider:

df %>%
  pivot_wider(id_cols = id, names_from = day, values_from = price,
              values_fn = list(price = median)) %>%
  rename_at(-1, ~ str_c('day_', .))
# A tibble: 5 x 5
# id day_3 day_1 day_4 day_2
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 1 10 5 NA NA
#2 2 NA NA 5 6
#3 3 NA 4 NA NA
#4 4 NA 5.5 NA NA
#5 5 9 NA NA 1
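As for the title question: cut() only works on numeric input, so calling it on a character column stops with an error. None of the solutions above need cut(), because each day already becomes its own column. If the days really did need to be grouped into ranges, one possible sketch is to convert the strings first. The break points and labels below are made up for illustration and do not come from the question.

```r
# day stored as text, as in the question
day <- c("3", "1", "4", "2", "4")

# cut(day) would fail here because cut() requires a numeric x,
# so convert the strings to integers first
day_num <- as.integer(day)

# Hypothetical bins: days 1-2 and days 3-4 (intervals are (0, 2] and (2, 4])
bins <- cut(day_num, breaks = c(0, 2, 4), labels = c("days_1_2", "days_3_4"))

as.character(bins)
# "days_3_4" "days_1_2" "days_3_4" "days_1_2" "days_3_4"
```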

Regarding "r - How to use the cut function on a string variable?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60662724/
