r - Comparing a group of items across several data frames, row by row

Reposted. Author: 行者123. Updated: 2023-12-02 03:28:08

I have several data frames that I want to compare. Let me start by showing two example data sets:

data1:

> dput(data1)
structure(list(cluster = c(1, 1, 2, 3, 3, 4, 5, 6, 6, 6, 6, 6,
6, 6, 7, 8, 9, 10, 11, 11, 11, 11, 12, 12, 12, 13, 13, 13, 13,
14, 15, 15), description = c("BTB", "BTB", "CVA", "BAS", "TRK",
"EXT", "LRA", "CAW", "CAW", "CAW", "CAW", "CAW", "TTE", "TTE",
"MYU", "MTQ", "PLI", "KQA", "STG", "STG", "ATF", "ATF", "REW",
"REW", "REW", "KIR", "KIR", "ROR", "ROR", "FRQ", "QEQ", "QEQ"
)), .Names = c("cluster", "description"), row.names = c("Mazda RX4",
"Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive", "Hornet Sportabout",
"Valiant", "Duster 360", "Merc 240D", "Merc 230", "Merc 280",
"Merc 280C", "Merc 450SE", "Merc 450SL", "Merc 450SLC", "Cadillac Fleetwood",
"Lincoln Continental", "Chrysler Imperial", "Fiat 128", "Honda Civic",
"Toyota Corolla", "Toyota Corona", "Dodge Challenger", "AMC Javelin",
"Lotus Europa", "Ford Pantera L", "Ferrari Dino", "Maserati Bora",
"Volvo 142E", "Volvo 144", "Chrysler", "Ford 131", "Ford 144"
), class = "data.frame")

data2:
> dput(data2)
structure(list(cluster = c(3, 4, 5, 5, 5, 6, 6, 3, 3, 6, 7, 8,
9, 10, 11, 11, 11, 11, 12, 12, 12, 13, 14, 13, 11, 14, 15, 15,
1, 1, 2, 2), description = c("TRK", "EXT", "LRA", "CAW", "CAW",
"CAW", "CAW", "CAW", "TTE", "TTE", "MYU", "MTQ", "PLI", "KQA",
"STG", "STG", "ATF", "ATF", "REW", "REW", "REW", "KIR", "KIR",
"ROR", "ROR", "FRQ", "QEQ", "QEQ", "BTB", "BTB", "CVA", "BAS"
)), .Names = c("cluster", "description"), row.names = c("Hornet Sportabout",
"Valiant", "Duster 360", "Merc 240D", "Merc 230", "Merc 280",
"Merc 280C", "Merc 450SE", "Merc 450SL", "Merc 450SLC", "Cadillac Fleetwood",
"Lincoln Continental", "Chrysler Imperial", "Fiat 128", "Honda Civic",
"Toyota Corolla", "Toyota Corona", "Dodge Challenger", "AMC Javelin",
"Lotus Europa", "Ford Pantera L", "Ferrari Dino", "Maserati Bora",
"Volvo 142E", "Volvo 144", "Chrysler", "Ford 131", "Ford 144",
"Mazda RX4", "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive"), class = "data.frame")

So both data sets contain the same row.names and descriptions, but in a different order.
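Because the same cars appear in both tables, the rows of one data frame can be put in the order of the other by indexing with the row names. A minimal sketch, using two hypothetical three-row excerpts `a` and `b` taken from data1 and data2:

```r
# Hypothetical excerpts of data1 and data2: same cars, different row order
a <- data.frame(cluster = c(1, 1, 2),
                row.names = c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710"))
b <- data.frame(cluster = c(2, 1, 1),
                row.names = c("Datsun 710", "Mazda RX4", "Mazda RX4 Wag"))

# Both tables contain the same set of cars...
stopifnot(setequal(row.names(a), row.names(b)))

# ...so b can be reordered to match a by indexing with a's row names
bAligned <- b[row.names(a), , drop = FALSE]
print(bAligned)
```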
I want to compare the cars that are found in the same cluster. Let's take "Merc 240D" as an example:

In data1 it belongs to cluster == 6, together with:
            cluster description
Merc 240D         6         CAW
Merc 230          6         CAW
Merc 280          6         CAW
Merc 280C         6         CAW
Merc 450SE        6         CAW
Merc 450SL        6         TTE
Merc 450SLC       6         TTE

Now let's move on to the second data set, data2. This time "Merc 240D" belongs to cluster 5, together with:
           cluster description
Duster 360       5         LRA
Merc 240D        5         CAW
Merc 230         5         CAW

This time there are only three cars in the cluster, and only one of them, "Merc 230", is grouped with "Merc 240D" in both data sets.
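The comparison for a single car can be sketched as follows. `partnersOf()` is a hypothetical helper (not from the original post), and `d1`/`d2` are excerpts of data1 and data2 that contain only the cluster of "Merc 240D" in each table:

```r
# Hypothetical helper: cars that share a cluster with `car` in `df`
# (car names are the row names, as in data1/data2)
partnersOf <- function(df, car) {
  sameCluster <- row.names(df)[df$cluster == df[car, "cluster"]]
  setdiff(sameCluster, car)
}

# Excerpts: cluster 6 of data1 and cluster 5 of data2
d1 <- data.frame(cluster = rep(6, 7),
                 row.names = c("Merc 240D", "Merc 230", "Merc 280", "Merc 280C",
                               "Merc 450SE", "Merc 450SL", "Merc 450SLC"))
d2 <- data.frame(cluster = rep(5, 3),
                 row.names = c("Duster 360", "Merc 240D", "Merc 230"))

p1 <- partnersOf(d1, "Merc 240D")
p2 <- partnersOf(d2, "Merc 240D")
common <- intersect(p1, p2)  # partners found in both tables
print(common)
```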

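I want to do this analysis for every row (car) in my data sets: find which cluster it belongs to and which cars it shares that cluster with, then compare that with the other data sets.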

The problem is that I have about 20 data sets to compare this way. I believe a loop is necessary!

As output, I would like a table like this (just an example):
          nr_partners  name of partners     Description  Descr_partners
Merc 240D 3            Merc1, Merc2, Merc3  CAW          CAW, TTE, TTE

Is something like this possible? Thanks in advance for your help!
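A minimal sketch of how a table in that layout could be built for every car of one data frame, assuming (as in data1/data2) that car names are the row names and are unique. `partnerTable()` and `toy` (the first five rows of data1) are hypothetical names:

```r
# Hypothetical example input: the first five rows of data1
toy <- data.frame(
  cluster = c(1, 1, 2, 3, 3),
  description = c("BTB", "BTB", "CVA", "BAS", "TRK"),
  row.names = c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710",
                "Hornet 4 Drive", "Hornet Sportabout"),
  stringsAsFactors = FALSE
)

# One output row per car: number of cars sharing its cluster, their names,
# its own description and the descriptions of those partners
partnerTable <- function(df) {
  cars <- row.names(df)
  rows <- lapply(cars, function(car) {
    mates <- setdiff(cars[df$cluster == df[car, "cluster"]], car)
    data.frame(
      nr_partners      = length(mates),
      name_of_partners = paste(mates, collapse = ", "),
      Description      = df[car, "description"],
      Descr_partners   = paste(df[mates, "description"], collapse = ", "),
      row.names        = car,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

print(partnerTable(toy))
```

Because the function takes one data frame and returns one table, it can be applied to all ~20 data sets with `lapply()` instead of an explicit loop, e.g. `lapply(list(data1, data2), partnerTable)`.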

Best answer

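If you only want to return the example output table for each table, you can use aggregate and merge. How to handle the model names afterwards can be adapted to your other information: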

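# first make a column aggregating all the partners for each cluster
pasteAlphabetical <- function(vectNames) {
  paste(sort(vectNames), collapse = ",")
}
byCluster <- aggregate(row.names(data1), by = list(cluster = data1$cluster),
                       FUN = pasteAlphabetical)
names(byCluster)[2] <- "partners"

# then you can attribute this to each row
# (merge() drops the row names, so keep the car names in a column first,
# and store the result separately so data1 keeps its row names)
data1Partners <- data1
data1Partners$names <- row.names(data1)
data1Partners <- merge(data1Partners, byCluster, by = "cluster")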
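If the goal is only to attach the partner list to each row, `ave()` is a possible alternative to `aggregate()` + `merge()` (a swapped-in technique, not the one used in this answer): it applies the function within each cluster and returns a vector aligned with the original rows, so the car names in the row names are kept. `df` is a hypothetical three-row excerpt of data1:

```r
pasteAlphabetical <- function(vectNames) paste(sort(vectNames), collapse = ",")

# Hypothetical excerpt: the first three rows of data1
df <- data.frame(
  cluster = c(1, 1, 2),
  description = c("BTB", "BTB", "CVA"),
  row.names = c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710"),
  stringsAsFactors = FALSE
)

# ave() splits the car names by cluster, applies pasteAlphabetical to each
# group and recycles the result over that group's rows, in the original order
df$partners <- ave(row.names(df), df$cluster, FUN = pasteAlphabetical)
print(df)
```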

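However, if you want to see which models are in the same cluster across several tables, you need to merge the cluster assignments of all tables (by car name) and then aggregate the models that are always in the same cluster: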
# get the clusters in each table for each car
SummarizeClusters <- function(datas) {
  for (id in seq_along(datas)) {
    d <- datas[[id]][, c("cluster", "description"), drop = FALSE]
    # suffix the columns with the table index (cluster1, cluster2, ...)
    # so repeated merges never produce clashing column names
    names(d) <- paste0(names(d), id)
    d$names <- row.names(datas[[id]])
    datas[[id]] <- d
  }
  summaryDat <- datas[[1]]
  for (iData in seq_along(datas)[-1]) {
    summaryDat <- merge(summaryDat, datas[[iData]], by = "names", all = TRUE)
  }
  summaryDat
}
datas <- list(data1, data2)
sumDat <- SummarizeClusters(datas)

clusterCols <- grep("^cluster", names(sumDat), value = TRUE) # get cluster column names

# and then aggregate models that have clusters in common
alwaysSameClusters <- aggregate(sumDat$names,
                                by = sumDat[, clusterCols, drop = FALSE],
                                FUN = pasteAlphabetical)

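This gives you the list of models that are always associated in the same cluster: each row of alwaysSameClusters is one combination of cluster IDs (one per table), together with the cars that have exactly that combination.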
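To get closer to the per-car comparison in the question (which partners of a car are also its partners in every other table), a sketch along the same lines is to intersect the partner sets across the list of data frames. `sharedPartners()` is hypothetical and assumes every car appears in every data frame; `t1`/`t2` are four-row excerpts of data1 and data2:

```r
# Hypothetical four-row excerpts of data1 and data2 (cluster column only)
cars <- c("Merc 240D", "Merc 230", "Merc 280", "Duster 360")
t1 <- data.frame(cluster = c(6, 6, 6, 5), row.names = cars)
t2 <- data.frame(cluster = c(5, 5, 6, 5), row.names = cars)

# For each car, keep only the partners it shares a cluster with in every
# data frame of the list (assumes every car is present in every data frame)
sharedPartners <- function(datas) {
  carNames <- row.names(datas[[1]])
  sapply(carNames, function(car) {
    perTable <- lapply(datas, function(df) {
      setdiff(row.names(df)[df$cluster == df[car, "cluster"]], car)
    })
    paste(sort(Reduce(intersect, perTable)), collapse = ", ")
  })
}

print(sharedPartners(list(t1, t2)))
```

With ~20 data sets, the same call works on a longer list, since `Reduce(intersect, ...)` folds over all of them.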

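I'm not sure exactly what you want to do, but this should give you the principle to follow, and it also works for a large number of data sets.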

Regarding "r - Comparing a group of items across several data frames, row by row", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29168113/
