
r - How to transform a tibble from a table of dates when x occurred into a table of dates with categorical data for x

Reposted. Author: 行者123. Updated: 2023-12-02 02:42:59

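So I have a dataset showing the year each country joined the World Trade Organization (WTO) and its predecessor, the General Agreement on Tariffs and Trade (GATT). An important point is that the WTO was established in 1995 as an extension of GATT (established in 1947), and some GATT members (e.g. Angola below) did not join the WTO immediately in 1995 but waited until 1996 or later, depending on the country. Some countries were never GATT members but joined the WTO after it was founded (e.g. Afghanistan below).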

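I want to take my data in the format of the first tibble below and reshape it into a list of all years for each country, with a categorical variable showing whether the country is a GATT member, a WTO member, or neither yet. My actual dataset is much larger than this example, with dates from 1948 to 2017 and many more countries, so doing this by hand would be painful.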

For this example, restricting the dates to 1992 through 1996 and looking only at the first 6 countries, I basically want to go from this:

library(dplyr)  # provides as_tibble() and %>%
library(tidyr)  # provides gather()

df <- data.frame(Country = c("Afghanistan", "Albania", "Angola", "Antigua and Barbuda", "Argentina", "Armenia"),
                 Year_joined_WTO = c(2016, 2000, 1996, 1995, 1995, 2003),
                 Year_joined_GATT = c(NA, NA, 1994, 1987, 1967, NA))
df <- as_tibble(df)

> df
# A tibble: 6 x 3
  Country             Year_joined_WTO Year_joined_GATT
  <fct>                         <dbl>            <dbl>
1 Afghanistan                    2016               NA
2 Albania                        2000               NA
3 Angola                         1996             1994
4 Antigua and Barbuda            1995             1987
5 Argentina                      1995             1967
6 Armenia                        2003               NA

to this:

df_intended <- data.frame(Country = c("Afghanistan", "Afghanistan","Afghanistan","Afghanistan","Afghanistan", "Albania", "Albania","Albania","Albania","Albania","Angola", "Angola","Angola","Angola","Angola","Antigua and Barbuda","Antigua and Barbuda","Antigua and Barbuda","Antigua and Barbuda","Antigua and Barbuda", "Argentina", "Argentina","Argentina","Argentina","Argentina","Armenia","Armenia","Armenia","Armenia","Armenia"), 
Year = c(1992, 1993, 1994, 1995, 1996, 1992, 1993, 1994, 1995, 1996,1992, 1993, 1994, 1995, 1996,1992, 1993, 1994, 1995, 1996,1992, 1993, 1994, 1995, 1996,1992, 1993, 1994, 1995, 1996),
Member_WTO_GATT = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "GATT", "GATT", "WTO", "GATT","GATT","GATT", "WTO", "WTO", "GATT","GATT","GATT", "WTO", "WTO", NA, NA, NA, NA, NA))
df_intended <- as_tibble(df_intended)

print(df_intended, n = 30)

# A tibble: 30 x 3
   Country              Year Member_WTO_GATT
   <fct>               <dbl> <fct>
 1 Afghanistan          1992 NA
 2 Afghanistan          1993 NA
 3 Afghanistan          1994 NA
 4 Afghanistan          1995 NA
 5 Afghanistan          1996 NA
 6 Albania              1992 NA
 7 Albania              1993 NA
 8 Albania              1994 NA
 9 Albania              1995 NA
10 Albania              1996 NA
11 Angola               1992 NA
12 Angola               1993 NA
13 Angola               1994 GATT
14 Angola               1995 GATT
15 Angola               1996 WTO
16 Antigua and Barbuda  1992 GATT
17 Antigua and Barbuda  1993 GATT
18 Antigua and Barbuda  1994 GATT
19 Antigua and Barbuda  1995 WTO
20 Antigua and Barbuda  1996 WTO
21 Argentina            1992 GATT
22 Argentina            1993 GATT
23 Argentina            1994 GATT
24 Argentina            1995 WTO
25 Argentina            1996 WTO
26 Armenia              1992 NA
27 Armenia              1993 NA
28 Armenia              1994 NA
29 Armenia              1995 NA
30 Armenia              1996 NA

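I have tried gathering the years into a single column, but my problem is how to get one row per country per year, with each year showing the membership status the country held from its joining year onward.

My feeble attempt: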

df2 <- df %>% 
group_by(Country) %>%
gather(Year_joined_WTO, Year_joined_GATT, key = member_wto_gatt, value = Year)

> df2
# A tibble: 12 x 3
# Groups: Country [6]
   Country             member_wto_gatt   Year
   <fct>               <chr>            <dbl>
 1 Afghanistan         Year_joined_WTO   2016
 2 Albania             Year_joined_WTO   2000
 3 Angola              Year_joined_WTO   1996
 4 Antigua and Barbuda Year_joined_WTO   1995
 5 Argentina           Year_joined_WTO   1995
 6 Armenia             Year_joined_WTO   2003
 7 Afghanistan         Year_joined_GATT    NA
 8 Albania             Year_joined_GATT    NA
 9 Angola              Year_joined_GATT  1994
10 Antigua and Barbuda Year_joined_GATT  1987
11 Argentina           Year_joined_GATT  1967
12 Armenia             Year_joined_GATT    NA

I have also tried some joins and merges against a list of all the dates I want (e.g.

years <- data.frame(Year = c(1992:1996))
years <- as_tibble(years)

> df3 <- right_join(df2, years)
Joining, by = "Year"
Warning message:
Factor `Country` contains implicit NA, consider using `forcats::fct_explicit_na`

> df3
# A tibble: 6 x 3
# Groups: Country [7]
  Country             member_wto_gatt   Year
  <fct>               <chr>            <dbl>
1 NA                  NA                1992
2 NA                  NA                1993
3 Angola              Year_joined_GATT  1994
4 Antigua and Barbuda Year_joined_WTO   1995
5 Argentina           Year_joined_WTO   1995
6 Angola              Year_joined_WTO   1996

) but they were entirely unsuccessful, and I could not find any similar example of how to do this. Any help would be greatly appreciated.

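Best answer

You can try gather, complete, and fill: gather the data into long format, use sub so that the key column holds only "WTO" or "GATT", group_by Country, complete the missing years, and then fill the NA values with the most recent non-NA value.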

library(dplyr)
library(tidyr)

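df %>%
  # long format: one row per country and join event
  gather(key, Value, -Country) %>%
  # strip the prefix so key is just "WTO" or "GATT"
  mutate(key = sub("Year_joined_", "", key)) %>%
  group_by(Country) %>%
  # add a row for every year in the window that is missing for this country
  complete(Value = seq(1992, 1996)) %>%
  # carry the last non-NA membership label downward within each country
  fill(key)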

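For your real data, you can use seq(min(Value, na.rm = TRUE), max(Value, na.rm = TRUE)) instead of hard-coding the years (na.rm = TRUE is needed because Value is NA for countries that never joined GATT), or, if you already know which years each country should have, you can use those numbers directly.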
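One caveat with a fixed window such as 1992 to 1996: a country that joined GATT before the window starts (Argentina in 1967, for instance) has no in-window row to fill from. As an alternative, here is a minimal sketch that skips the fill step and compares each year against the two join years directly. It is not the answerer's method; it assumes tidyr 1.0.0 or later for expand_grid(), and the names `membership` and `years` are introduced here purely for illustration.

```r
library(dplyr)
library(tidyr)

# Example data from the question, rebuilt as a tibble with character countries
df <- tibble(
  Country = c("Afghanistan", "Albania", "Angola",
              "Antigua and Barbuda", "Argentina", "Armenia"),
  Year_joined_WTO  = c(2016, 2000, 1996, 1995, 1995, 2003),
  Year_joined_GATT = c(NA, NA, 1994, 1987, 1967, NA)
)

# Window of years to report (hypothetical name; widen it for the real data)
years <- 1992:1996

membership <- expand_grid(df, Year = years) %>%  # one row per country and year
  mutate(Member_WTO_GATT = case_when(
    # WTO membership takes precedence once the WTO join year is reached
    !is.na(Year_joined_WTO)  & Year >= Year_joined_WTO  ~ "WTO",
    # otherwise GATT, if the country had joined GATT by that year
    !is.na(Year_joined_GATT) & Year >= Year_joined_GATT ~ "GATT",
    # neither yet
    TRUE ~ NA_character_
  )) %>%
  select(Country, Year, Member_WTO_GATT)

print(membership, n = 30)
```

For the full dataset, setting `years <- 1948:2017` should be the only change needed, provided the two join-year columns keep the same names.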

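Regarding "r - How to transform a tibble from a table of dates when x occurred into a table of dates with categorical data for x", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58041033/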
