r - Data transformation in R - dummies

Reposted. Author: 行者123. Updated: 2023-12-04 14:56:07

I want to work with 4 national football teams (England, Belgium, Germany and France) and n dates.

Date        Matches
16.03 England X Brazil
16.03 Belgium X Argentina
16.03 Chile X Japan
16.03 Uruguay X Germany
16.03 Italy x France
17.03 South Korea X India
17.03 Germany X France
17.03 Poland X Belgium
17.03 Colombia X Russia
18.03 South Africa X Mexico
18.03 China X Japon
18.03 Brazil X Venezuela
... ...

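The desired data frame, with dummy variables. When dummy = 1, the team played a match on that date. When dummy = 0, the team did not play (on that day). Important: there is only one row per date.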

Date   Dummy_england   Dummy_belgium   Dummy_germany   Dummy_France
16.03  1               1               1               1
17.03  0               1               1               1
18.03  0               0               0               0

Thanks a lot!!

Best answer

We can use a tidyverse approach.

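  1. Extract the selected team from each row of 'Matches' with str_extract
  2. Keep only the rows that matched, i.e. use filter to drop the NA rows
  3. After selecting the columns of interest, reshape from long to 'wide' with pivot_wider; specify values_fn as length, and values_fill as 0 to change the default NA to 0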
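library(dplyr)
library(tidyr)
library(stringr)

df1 %>%
  # str_extract() returns only the first selected team found in each row (NA if none)
  mutate(team = str_c('Dummy_', str_extract(Matches,
    regex('England|Belgium|Germany|France', ignore_case = TRUE)))) %>%
  # drop the rows in which none of the four teams was found
  filter(complete.cases(team)) %>%
  select(-Matches) %>%
  # one column per team; count the rows per Date and fill missing combinations with 0
  pivot_wider(names_from = team, values_from = team,
    values_fn = length, values_fill = 0)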

-output

# A tibble: 2 x 5
   Date Dummy_England Dummy_Belgium Dummy_Germany Dummy_France
  <dbl>         <int>         <int>         <int>        <int>
1  16.0             1             1             1            1
2  17.0             0             1             1            0

If we want to keep the 'Date' values that have no matching team, use complete

df1 %>%
  mutate(team = str_c('Dummy_', str_extract(Matches,
    regex('England|Belgium|Germany|France', ignore_case = TRUE)))) %>%
  filter(complete.cases(team)) %>%
  select(-Matches) %>%
  pivot_wider(names_from = team, values_from = team,
    values_fn = length, values_fill = 0) %>%
  # add back the dates that were filtered out, with every dummy set to 0
  complete(Date = unique(df1$Date), fill = list(Dummy_England = 0,
    Dummy_Belgium = 0, Dummy_Germany = 0, Dummy_France = 0))

-output

# A tibble: 3 x 5
   Date Dummy_England Dummy_Belgium Dummy_Germany Dummy_France
  <dbl>         <dbl>         <dbl>         <dbl>        <dbl>
1  16.0             1             1             1            1
2  17.0             0             1             1            0
3  18.0             0             0             0            0
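
One caveat: str_extract returns only the first match in each string, so a row such as "Germany X France" is counted for Germany only. That is why Dummy_France is 0 for 17.03 in the outputs, while the desired table in the question has 1 there. A possible variation, sketched here as an assumption and not part of the original answer, is to use str_extract_all together with unnest so that every selected team in a row is counted. The names teams, pattern and res are illustrative only, and df1 is rebuilt so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Same data as in the question, rebuilt here so the sketch is self-contained
df1 <- data.frame(
  Date = c(16.03, 16.03, 16.03, 16.03, 16.03, 17.03,
           17.03, 17.03, 17.03, 18.03, 18.03, 18.03),
  Matches = c("England X Brazil", "Belgium X Argentina", "Chile X Japan",
              "Uruguay X Germany", "Italy x France", "South Korea X India",
              "Germany X France", "Poland X Belgium", "Colombia X Russia",
              "South Africa X Mexico", "China X Japon", "Brazil X Venezuela")
)

# Illustrative helper objects (not in the original answer)
teams <- c("England", "Belgium", "Germany", "France")
pattern <- regex(str_c(teams, collapse = "|"), ignore_case = TRUE)

res <- df1 %>%
  # list column holding ALL selected teams found in each row
  mutate(team = str_extract_all(Matches, pattern)) %>%
  # one row per (Date, team); rows with no selected team are dropped
  unnest(team) %>%
  mutate(team = str_c("Dummy_", team)) %>%
  # a team counts once per date, however many rows mention it
  distinct(Date, team) %>%
  mutate(played = 1L) %>%
  pivot_wider(names_from = team, values_from = played, values_fill = 0L) %>%
  # restore the dates on which none of the four teams played
  complete(Date = unique(df1$Date),
           fill = list(Dummy_England = 0L, Dummy_Belgium = 0L,
                       Dummy_Germany = 0L, Dummy_France = 0L))

res
```

With the sample data this should give one row per date, with Dummy_France set to 1 on 17.03, as in the desired table. Using distinct() keeps the values at 0/1 even if a team appears in more than one row on the same date.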

data

df1 <- structure(list(Date = c(16.03, 16.03, 16.03, 16.03, 16.03, 17.03, 
17.03, 17.03, 17.03, 18.03, 18.03, 18.03), Matches = c("England X Brazil",
"Belgium X Argentina", "Chile X Japan", "Uruguay X Germany",
"Italy x France", "South Korea X India", "Germany X France",
"Poland X Belgium", "Colombia X Russia", "South Africa X Mexico",
"China X Japon", "Brazil X Venezuela")), class = "data.frame", row.names = c(NA,
-12L))

Regarding r - Data transformation in R - dummies, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/67952582/
