
r - Create a new variable in a data frame based on a condition in one column, pulling from another column? (dplyr)

Reposted. Author: 行者123. Last updated: 2023-12-05 00:48:03

I have the following data frame:

    library(dplyr)

    # Rebuilt from the original dput() output. That output stored its grouping
    # attributes in a pre-0.8.0 dplyr format, which current dplyr versions may
    # reject, so the grouping by country is recreated with group_by() instead.
    df <- tibble(
      country = c("Ghana", "Eritrea", "Ethiopia", "Ethiopia", "Congo - Kinshasa",
                  "Ethiopia", "Ethiopia", "Ghana", "Botswana", "Nigeria"),
      CommodRank = c(1L, 2L, 3L, 1L, 3L, 1L, 1L, 1L, 1L, 1L),
      topCommodInCountry = c(TRUE, FALSE, FALSE, TRUE, FALSE,
                             TRUE, TRUE, TRUE, TRUE, TRUE),
      Main_Commod = c("Gold", "Copper", "Nickel", "Gold", "Gold",
                      "Gold", "Gold", "Gold", "Diamonds", "Iron Ore")
    ) %>%
      group_by(country)

    df

       country          CommodRank topCommodInCountry Main_Commod
     1 Ghana                     1 TRUE               Gold
     2 Eritrea                   2 FALSE              Copper
     3 Ethiopia                  3 FALSE              Nickel
     4 Ethiopia                  1 TRUE               Gold
     5 Congo - Kinshasa          3 FALSE              Gold
     6 Ethiopia                  1 TRUE               Gold
     7 Ethiopia                  1 TRUE               Gold
     8 Ghana                     1 TRUE               Gold
     9 Botswana                  1 TRUE               Diamonds
    10 Nigeria                   1 TRUE               Iron Ore

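I am trying to add another column that shows the top commodity (the one with the top CommodRank) for each country in this dataset, but I am not sure how. I can fill 'topcommod' with 'Main_Commod' where CommodRank == 1, but I want to copy that same value to the rows where CommodRank != 1. Looking at the output below, rows 3 and 4 for Ethiopia should both have the value 'Gold'.

    df %>% mutate(topcommod = ifelse(CommodRank == 1, Main_Commod, 'unknown'))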


       country          CommodRank topCommodInCountry Main_Commod topcommod
     1 Ghana                     1 TRUE               Gold        Gold
     2 Eritrea                   2 FALSE              Copper      unknown
     3 Ethiopia                  3 FALSE              Nickel      unknown
     4 Ethiopia                  1 TRUE               Gold        Gold
     5 Congo - Kinshasa          3 FALSE              Gold        unknown
     6 Ethiopia                  1 TRUE               Gold        Gold
     7 Ethiopia                  1 TRUE               Gold        Gold
     8 Ghana                     1 TRUE               Gold        Gold
     9 Botswana                  1 TRUE               Diamonds    Diamonds
    10 Nigeria                   1 TRUE               Iron Ore    Iron Ore

Ideally I am looking for a dplyr solution that I can add to an existing long chain of piped (%>%) function calls, but any solution would help.

Best answer

IIUC, there are several ways to do this, for example:

    df %>%
      mutate(topCom = if (!any(topCommodInCountry)) "unknown"
                      else Main_Commod[which.max(topCommodInCountry)])

    # A tibble: 10 x 5
    # Groups:   country [6]
       country          CommodRank topCommodInCountry Main_Commod topCom
       <chr>                 <int> <lgl>              <chr>       <chr>
     1 Ghana                     1 TRUE               Gold        Gold
     2 Eritrea                   2 FALSE              Copper      unknown
     3 Ethiopia                  3 FALSE              Nickel      Gold
     4 Ethiopia                  1 TRUE               Gold        Gold
     5 Congo - Kinshasa          3 FALSE              Gold        unknown
     6 Ethiopia                  1 TRUE               Gold        Gold
     7 Ethiopia                  1 TRUE               Gold        Gold
     8 Ghana                     1 TRUE               Gold        Gold
     9 Botswana                  1 TRUE               Diamonds    Diamonds
    10 Nigeria                   1 TRUE               Iron Ore    Iron Ore
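
The answer works because `df` is already grouped by `country`, so the `if`/`else` inside `mutate()` is evaluated once per country rather than once per row or once for the whole table. The following is a minimal self-contained sketch of that idea, assuming a current dplyr version. The data is rebuilt by hand from the question, and the name `result` is introduced here only for illustration.

```r
library(dplyr)

# Example data from the question, retyped by hand (assumed to match the original values).
df <- tibble(
  country = c("Ghana", "Eritrea", "Ethiopia", "Ethiopia", "Congo - Kinshasa",
              "Ethiopia", "Ethiopia", "Ghana", "Botswana", "Nigeria"),
  CommodRank = c(1L, 2L, 3L, 1L, 3L, 1L, 1L, 1L, 1L, 1L),
  topCommodInCountry = c(TRUE, FALSE, FALSE, TRUE, FALSE,
                         TRUE, TRUE, TRUE, TRUE, TRUE),
  Main_Commod = c("Gold", "Copper", "Nickel", "Gold", "Gold",
                  "Gold", "Gold", "Gold", "Diamonds", "Iron Ore")
)

result <- df %>%
  # Explicit grouping: without it, any() would look at the whole table at once.
  group_by(country) %>%
  mutate(
    # which.max() on a logical vector returns the position of the first TRUE,
    # so this picks the first top-ranked commodity within each country.
    # The single value is recycled to every row of the group.
    topCom = if (!any(topCommodInCountry)) "unknown"
             else Main_Commod[which.max(topCommodInCountry)]
  ) %>%
  ungroup()

result
```

The trailing `ungroup()` is optional; it is included here so that later steps in a longer pipeline do not inherit the grouping by accident.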

Regarding the OP's question in the comments about how to handle ties between multiple top commodities, you could do the following:

    df %>%
      mutate(topCom = if (!any(topCommodInCountry)) "unknown"
                      else paste(unique(Main_Commod[topCommodInCountry]), collapse = "/"))

If a country has more than one unique top commodity, they are pasted together into a single string, separated by /.
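
The data in the question contains no ties, so the effect of the tie-handling version is not visible there. As a rough illustration, the sketch below applies the same expression to a small made-up table in which one country has two distinct top commodities. The data, and the names `ties` and `ties_result`, are hypothetical and not part of the original question.

```r
library(dplyr)

# Hypothetical data (not from the question): Ghana has two distinct
# top commodities, Eritrea has none flagged as top.
ties <- tibble(
  country = c("Ghana", "Ghana", "Ghana", "Eritrea"),
  topCommodInCountry = c(TRUE, TRUE, TRUE, FALSE),
  Main_Commod = c("Gold", "Bauxite", "Gold", "Copper")
)

ties_result <- ties %>%
  group_by(country) %>%
  mutate(
    # Keep only the commodities flagged as top, drop duplicates with unique()
    # (which preserves first-appearance order), and join them with "/".
    topCom = if (!any(topCommodInCountry)) "unknown"
             else paste(unique(Main_Commod[topCommodInCountry]), collapse = "/")
  ) %>%
  ungroup()

ties_result$topCom
```

Here every Ghana row gets the combined string "Gold/Bauxite", while the Eritrea row falls back to "unknown".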

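Regarding "r - Create a new variable in a data frame based on a condition in one column, pulling from another column? (dplyr)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50374423/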
