r - Extracting comma-separated values from an R data frame

Repost. Author: 行者123. Updated: 2023-12-04 15:32:09

One column of the following data frame contains comma-separated values:

Data frame 1:

Id    date                 price  batch  resp
uv-1  2020-01-10 15:13:16  1000   Q      ES,RT,AL
uv-2  2020-01-11 17:13:16  5000   W      ES,AL
uv-3  2020-01-12 18:13:16  2000   E      ES,RT
uv-4  2020-01-13 12:13:16  3000   R      ES,RT
uv-5  2020-01-14 13:13:16  1600   T      RT,AL
uv-6  2020-01-15 13:13:16  1600   T      ES,AL
uv-7  2020-01-17 11:13:16  1300   Y      ES,RT,AL

I need to extract counts of the resp values by month, as shown below.

                    Jan-20
batch     ES      RT      AL      Total
Q         1       1       1       1
%         100%    100%    100%    14.29%
W         1       0       1       1
%         100%    0.00%   100%    14.29%
E         1       1       0       1
%         100%    100%    0.00%   14.29%
R         1       1       0       1
%         100%    100%    0.00%   14.29%
T         1       1       2       2
%         50%     50%     100%    28.57%
Y         1       1       1       1
%         100%    100%    100%    14.29%
Total     6       5       5       7
Total(%)  85.71%  71.43%  71.43%  100%

Best answer

Using dplyr, tidyr and lubridate, we can build two summary data frames, one per batch and one for the totals, and then combine them with bind_rows.

library(lubridate)
library(dplyr)
library(tidyr)
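
Before building the summaries, it may help to see what the split-and-unnest step does on its own. The following is a minimal sketch on a hypothetical two-row data frame (the name toy and its contents are illustrative, not the question's data): strsplit turns resp into a list column, and unnest expands it to one row per value.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-row example, not the question's full data
toy <- data.frame(
  batch = c("Q", "W"),
  resp  = c("ES,RT,AL", "ES,AL"),
  stringsAsFactors = FALSE
)

long <- toy %>%
  mutate(resp = strsplit(resp, ",")) %>%  # resp becomes a list column
  unnest(resp)                            # one row per (batch, resp) pair

long  # 5 rows: Q appears three times, W twice
```

tidyr::separate_rows(toy, resp, sep = ",") should give the same long shape in a single call, if a one-step form is preferred.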

Now create the two data frames. The first groups by month and batch, the second by month only:

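df_batch <- df %>%
  # Parse the timestamps and split resp into a list column of single values
  mutate(date = as.POSIXct(date), resp = strsplit(resp, ",")) %>%
  unnest(resp) %>%                                # one row per (Id, resp) pair
  group_by(month = month(date), batch) %>%
  count(resp) %>%                                 # n = occurrences per month, batch, resp
  # max(n) equals the batch's row count only if some value occurs in every
  # row of the batch, as it does in the sample data
  mutate(Total = max(n), p = 100 * n / Total) %>%
  pivot_wider(names_from = resp, values_from = c(n, p),
              values_fill = list(n = 0, p = 0)) %>%
  ungroup() %>%
  mutate(p_Total = 100 * Total / sum(Total)) %>%  # batch share of all rows
  select(month, batch, starts_with("n"), Total, starts_with("p"))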

df_totals <- df %>%
  mutate(date = as.POSIXct(date), resp = strsplit(resp, ",")) %>%
  group_by(month = month(date)) %>%
  mutate(Total = n()) %>%                # rows per month, counted before unnesting
  unnest(resp) %>%
  count(Total, resp) %>%                 # n = occurrences per month and resp
  mutate(p = 100 * n / Total) %>%
  pivot_wider(names_from = resp, values_from = c(n, p)) %>%
  mutate(batch = "Total", p_Total = 100)

bind_rows(df_batch, df_totals)

# A tibble: 7 x 10
  month batch  n_ES  n_RT  n_AL Total  p_ES  p_RT  p_AL p_Total
  <dbl> <chr> <int> <int> <int> <int> <dbl> <dbl> <dbl>   <dbl>
1     1 E         1     1     0     1 100   100     0      14.3
2     1 Q         1     1     1     1 100   100   100      14.3
3     1 R         1     1     0     1 100   100     0      14.3
4     1 T         1     1     2     2  50    50   100      28.6
5     1 W         1     0     1     1 100     0   100      14.3
6     1 Y         1     1     1     1 100   100   100      14.3
7     1 Total     6     5     5     7  85.7  71.4  71.4   100

It is not in exactly the format you presented, but the results are the same, and it should work for multiple months.
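
One caveat on the multi-month claim: a bare month number carries no year, so January 2020 and January 2021 would land in the same group. A minimal sketch of a year-aware grouping key, using base R format() on hypothetical timestamps (the dates below are illustrative and not from the question):

```r
# Hypothetical timestamps spanning two years (not from the question's data)
dates <- as.POSIXct(
  c("2020-01-10 15:13:16", "2021-01-11 17:13:16", "2020-02-12 18:13:16"),
  tz = "UTC"
)

# Month number alone: both Januaries collapse into 1
month_only <- as.integer(format(dates, "%m"))

# A year-month key keeps them apart and sorts chronologically as text
year_month <- format(dates, "%Y-%m")

month_only  # 1 1 2
year_month  # "2020-01" "2021-01" "2020-02"
```

Under that assumption, the grouping calls in the pipelines could use month = format(date, "%Y-%m") instead of month(date). A label like Jan-20 could come from format(date, "%b-%y"), though %b depends on the locale.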


Data:

df <- structure(list(Id = c("uv-1", "uv-2", "uv-3", "uv-4", "uv-5",
"uv-6", "uv-7"), date = c("2020-01-10 15:13:16", "2020-01-11 17:13:16",
"2020-01-12 18:13:16", "2020-01-13 12:13:16", "2020-01-14 13:13:16",
"2020-01-15 13:13:16", "2020-01-17 11:13:16"), price = c(1000L,
5000L, 2000L, 3000L, 1600L, 1600L, 1300L), batch = c("Q", "W",
"E", "R", "T", "T", "Y"), resp = c("ES,RT,AL", "ES,AL", "ES,RT",
"ES,RT", "RT,AL", "ES,AL", "ES,RT,AL")), class = "data.frame", row.names = c(NA,
-7L))
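
As a quick sanity check on the Total row, the overall counts can be reproduced in base R without any packages. This is only a sketch: it copies the resp column from the data above and assumes each value appears at most once per row, so that a count is also the number of rows containing that value.

```r
# resp column copied from the data above
resp <- c("ES,RT,AL", "ES,AL", "ES,RT", "ES,RT", "RT,AL", "ES,AL", "ES,RT,AL")

# Split every row, flatten, and tabulate occurrences of each value
counts <- table(unlist(strsplit(resp, ",")))

# Share of rows containing each value, in percent
pct <- round(100 * counts / length(resp), 2)

counts  # AL 5, ES 6, RT 5
pct     # AL 71.43, ES 85.71, RT 71.43
```

These match the Total and Total(%) rows requested in the question.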

Regarding "r - Extracting comma-separated values from an R data frame", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61053561/
