
r - Merge 2 data frames based on 2 columns with different column names

Reposted. Author: 行者123. Updated: 2023-12-04 17:11:16

I have 2 very large datasets that look like this:

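merge_data <- data.frame(ID = c(1,2,3,4,5,6,7,8,9,10),
                         position = c("yes","no","yes","no","yes",
                                      "no","yes","no","yes","yes"),
                         school = c("a","b","a","a","c","b","c","d","d","e"),
                         year1 = c(2000,2000,2000,2001,2001,2000,
                                   2003,2005,2008,2009))
# year1 is not visible as a variable inside the data.frame() call,
# so year2 is derived after the data frame exists
merge_data$year2 <- merge_data$year1 - 1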


merge_data

   ID position school year1 year2
1   1      yes      a  2000  1999
2   2       no      b  2000  1999
3   3      yes      a  2000  1999
4   4       no      a  2001  2000
5   5      yes      c  2001  2000
6   6       no      b  2000  1999
7   7      yes      c  2003  2002
8   8       no      d  2005  2004
9   9      yes      d  2008  2007
10 10      yes      e  2009  2008



merge_data_2 <- data.frame(year = c(1999,1999,2000,2000,2000,2001,2003,
                                    2012,2009,2009,2008,2002,2009,2005,
                                    2001,2000,2002,2000,2008,2005),
                           amount = c(100,200,300,400,500,600,700,800,900,
                                      1000,1100,1200,1300,1400,1500,1600,
                                      1700,1800,1900,2000),
                           ID = c(1,1,2,2,2,3,3,3,5,6,8,9,10,13,15,17,19,20,21,7))


merge_data_2

   year amount ID
1  1999    100  1
2  1999    200  1
3  2000    300  2
4  2000    400  2
5  2000    500  2
6  2001    600  3
7  2003    700  3
8  2012    800  3
9  2009    900  5
10 2009   1000  6
11 2008   1100  8
12 2002   1200  9
13 2009   1300 10
14 2005   1400 13
15 2001   1500 15
16 2000   1600 17
17 2002   1700 19
18 2000   1800 20
19 2008   1900 21
20 2005   2000  7

And what I want is:

ID position school year1 year2 amount
 1      yes      a  2000  1999    300
 2       no      b  2000  1999   1200
10      yes      e  2009  2008   1300

For ID=1 we have amount=300, because merge_data_2 has 2 rows with ID=1 whose year equals year1 or year2 for ID=1 in merge_data (100 + 200 = 300).

So basically I want to perform a merge based on ID and year.
Two conditions:
  • ID in merge_data matches ID in merge_data_2
  • Either year1 or year2 in merge_data also matches year in merge_data_2.

Then merge based on the sum of amount for each ID.
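
The two conditions above can be written down fairly literally in base R. The following is only a sketch under the assumption that the data looks like the example; the names candidates, matched and result are illustrative and not from the original post. It joins on ID alone, keeps the rows where year equals year1 or year2 (using the vectorised | operator), and then sums amount per ID.

```r
# Example data from the question.
merge_data <- data.frame(
  ID = c(1,2,3,4,5,6,7,8,9,10),
  position = c("yes","no","yes","no","yes","no","yes","no","yes","yes"),
  school = c("a","b","a","a","c","b","c","d","d","e"),
  year1 = c(2000,2000,2000,2001,2001,2000,2003,2005,2008,2009)
)
merge_data$year2 <- merge_data$year1 - 1

merge_data_2 <- data.frame(
  year = c(1999,1999,2000,2000,2000,2001,2003,2012,2009,2009,
           2008,2002,2009,2005,2001,2000,2002,2000,2008,2005),
  amount = seq(100, 2000, by = 100),
  ID = c(1,1,2,2,2,3,3,3,5,6,8,9,10,13,15,17,19,20,21,7)
)

# Condition 1: join on ID only (every pairing of rows that share an ID).
candidates <- merge(merge_data, merge_data_2, by = "ID")

# Condition 2: keep pairings whose year equals year1 or year2.
# Note the element-wise | rather than the scalar ||.
matched <- candidates[candidates$year == candidates$year1 |
                      candidates$year == candidates$year2, ]

# Sum the amounts per ID, carrying the descriptive columns along.
result <- aggregate(amount ~ ID + position + school + year1 + year2,
                    data = matched, FUN = sum)
result <- result[order(result$ID), ]
print(result)
```

Joining on ID alone creates one row for every pairing that shares an ID before the filter runs, so on very large inputs this intermediate table may be considerably bigger than either source.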

I think the code would look something like this:
    merge_data_final <- merge(merge_data, merge_data_2, 
    merge_data$ID == merge_data_2$ID && (merge_data$year1 ||
    merge_data$year2 == merge_data_2$year))

Then somehow aggregate the amount by ID.

Obviously I know the code is wrong, and I have been looking at the plyr and reshape libraries, but I am finding them hard to grasp.

Any help would be great! Thank you all!

Best answer

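As mentioned above, I think there are some discrepancies between your example input and output data. Here is the basic approach; you were on the right track with reshape2. You can simply melt() your data into long format, so that you join on a single column rather than the either/or condition you were attempting before.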

    library(reshape2)
    #melt into long format
    merge_data_m <- melt(merge_data, measure.vars = c("year1", "year2"))
    #merge together, specifying the joining columns
    merge(merge_data_m, merge_data_2, by.x = c("ID", "value"), by.y = c("ID", "year"))
    #-----
      ID value position school variable amount
    1  1  1999      yes      a    year2    100
    2  1  1999      yes      a    year2    200
    3  2  2000       no      b    year1    500
    4  2  2000       no      b    year1    300
    5  2  2000       no      b    year1    400
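
The merge above still has one row per matched amount, while the question asks for one row per ID with the amounts summed. A possible way to finish is sketched below, assuming the example data from the question. The names long, merged, totals and merge_data_final are illustrative, and the long format is built with rbind() as a stand-in for melt() only so that the sketch runs without reshape2.

```r
# Example data from the question.
merge_data <- data.frame(
  ID = c(1,2,3,4,5,6,7,8,9,10),
  position = c("yes","no","yes","no","yes","no","yes","no","yes","yes"),
  school = c("a","b","a","a","c","b","c","d","d","e"),
  year1 = c(2000,2000,2000,2001,2001,2000,2003,2005,2008,2009)
)
merge_data$year2 <- merge_data$year1 - 1

merge_data_2 <- data.frame(
  year = c(1999,1999,2000,2000,2000,2001,2003,2012,2009,2009,
           2008,2002,2009,2005,2001,2000,2002,2000,2008,2005),
  amount = seq(100, 2000, by = 100),
  ID = c(1,1,2,2,2,3,3,3,5,6,8,9,10,13,15,17,19,20,21,7)
)

# Long format: one row per (ID, candidate year), as melt() would give.
long <- rbind(
  data.frame(ID = merge_data$ID, value = merge_data$year1),
  data.frame(ID = merge_data$ID, value = merge_data$year2)
)

# Same join as in the answer: ID must match and value must equal year.
merged <- merge(long, merge_data_2,
                by.x = c("ID", "value"), by.y = c("ID", "year"))

# Sum the matched amounts per ID.
totals <- aggregate(amount ~ ID, data = merged, FUN = sum)

# Attach the totals to the original rows; IDs with no match are dropped.
merge_data_final <- merge(merge_data, totals, by = "ID")
print(merge_data_final)
```

Passing all.x = TRUE to the last merge() would instead keep every ID from merge_data, with NA as the amount where nothing matched.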

About r - Merge 2 data frames based on 2 columns with different column names: we found a similar question on Stack Overflow: https://stackoverflow.com/questions/12062035/
