gpt4 book ai didi

r - 查找两个表之间的差异

转载 作者:行者123 更新时间:2023-12-03 22:16:34 25 4
gpt4 key购买 nike

我在 SAS/SQL 背景下使用 R,并尝试编写代码来获取两个表,比较它们,并提供差异列表。此代码将重复用于许多不同的表集,因此我需要避免硬编码。

我正在与 Identifying specific differences between two data sets in R 一起工作,但它并没有让我一路走到那里。

示例数据,使用姓氏/名字的组合(这是唯一的)作为键——

Dataset One --

Last_Name First_Name Street_Address ZIP VisitCount
Doe John 1234 Main St 12345 20
Doe Jane 4321 Tower St 54321 10
Don Bob 771 North Ave 23232 5
Smith Mike 732 South Blvd. 77777 3

Dataset Two --

Last_Name First_Name Street_Address ZIP VisitCount
Doe John 1234 Main St 12345 20
Doe Jane 4111 Tower St 32132 17
Donn Bob 771 North Ave 11111 5

Desired Output --

LastName FirstName VarName TableOne TableTwo
Doe Jane StreetAddress 4321 Tower St 4111 Tower St
Doe Jane Zip 23232 32132
Doe Jane VisitCount 5 17

请注意,此输出会忽略我在两个表中没有相同 ID 的记录(例如,因为 Bob 的姓氏在一个表中是“Don”,而在另一个表中是“Donn”,我们完全忽略了该记录)。

我已经通过在两个数据集上应用melt 函数,然后比较它们来探索这样做,但我正在使用的大小数据表明这不切实际。在 SAS 中,我使用 Proc Compare 进行此类工作,但我还没有在 R 中找到完全等效的。

最佳答案

这是基于 data.table 的解决方案:

library(data.table)

# Convert into data.table, melt
setDT(d1)
d1 <- d1[, list(VarName = names(.SD), TableOne = unlist(.SD, use.names = F)),by=c('Last_Name','First_Name')]

setDT(d2)
d2 <- d2[, list(VarName = names(.SD), TableTwo = unlist(.SD, use.names = F)),by=c('Last_Name','First_Name')]

# Set keys for merging
setkey(d1,Last_Name,First_Name,VarName)

# Merge, remove duplicates
d1[d2,nomatch=0][TableOne!=TableTwo]

# Last_Name First_Name VarName TableOne TableTwo
# 1: Doe Jane Street_Address 4321 Tower St 4111 Tower St
# 2: Doe Jane ZIP 54321 32132
# 3: Doe Jane VisitCount 10 17

其中输入数据集是:
# Input Data Sets
d1 <- structure(list(Last_Name = c("Doe", "Doe", "Don", "Smith"), First_Name = c("John",
"Jane", "Bob", "Mike"), Street_Address = c("1234 Main St", "4321 Tower St",
"771 North Ave", "732 South Blvd."), ZIP = c(12345L, 54321L,
23232L, 77777L), VisitCount = c(20L, 10L, 5L, 3L)), .Names = c("Last_Name",
"First_Name", "Street_Address", "ZIP", "VisitCount"), class = "data.frame", row.names = c(NA, -4L))

d2 <- structure(list(Last_Name = c("Doe", "Doe", "Donn"), First_Name = c("John",
"Jane", "Bob"), Street_Address = c("1234 Main St", "4111 Tower St",
"771 North Ave"), ZIP = c(12345L, 32132L, 11111L), VisitCount = c(20L,
17L, 5L)), .Names = c("Last_Name", "First_Name", "Street_Address",
"ZIP", "VisitCount"), class = "data.frame", row.names = c(NA, -3L))

关于r - 查找两个表之间的差异,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/28056805/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com