
r - Merging data frames based on multiple nearest matches in R

Reposted. Author: 行者123. Updated: 2023-12-05 01:24:19

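I have a large data frame (df1; reduced and simplified here for the example) and a smaller data frame (df2). My goal is, for each variable a:d, to return the single row of df1 whose value best matches the value of that variable in df2, so that I end up with the value of julian (the index) and the closest matches for a-d in df1.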

For the sample code below, the resulting julian values would be something like 9, 20, 10, 24.

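I have tried various approaches with dplyr, such as inner_join, but have not found a way to do an approximate/nearest match. I did see a data.table approach to a similar problem, roll = "nearest", but it seems to require setting a key on a single column, and I don't quite see how to adapt it to my problem.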

df1=structure(list(julian = 1:25, a = c(1.23651000172524, 3.53768533577433, 
4.67886484270661, 5.58620230847292, 6.50334496193416, 8.39024634778858,
8.07046072833857, 10.3224760103505, 12.0294821656441, 13.0051126737986,
13.6182967988946, 16.0093802756876, 16.3629833376647, 16.4032251800014,
20.7793611866438, 22.8896881514255, 24.9047753769525, 27.693270690425,
28.1354667109412, 28.7461982083997, 29.4226852088927, 30.0674285282579,
30.5385160713231, 30.2693063298828, 31.2137034020167), b = c(0.0818659903121299,
3.16872244667597, 5.18716715995415, 8.01357800633546, 9.03929283020061,
9.75879226945875, 10.6840909493883, 11.993517678357, 13.5471390532808,
14.5735221374674, 14.5629644871486, 14.3256165384972, 16.1055271965235,
16.0846958235405, 16.2822070785324, 17.5707074083182, 18.0211458028877,
18.2487504080849, 18.2748191242643, 20.2487076102867, 22.4499741093445,
22.7261732975307, 23.3057791855779, 24.205539951076, 26.8265110564906
), c = c(1.53338066501425, 1.79175662364937, 3.87114113188336,
7.25916181355392, 7.27167564643793, 9.45150932098826, 9.79130847937593,
8.74357825723914, 7.65948191832803, 8.29615547570692, 10.0896110687571,
10.622117084693, 12.8381085732359, 12.2479308117301, 14.1459672328209,
15.1971928224078, 16.2405218836198, 17.6742310372874, 18.2592891390671,
18.5239348003001, 17.8195001436016, 18.2938114977865, 19.3566862831869,
20.2531494246964, 21.2021100407786), d = c(1.06048990933208,
1.19478517029592, 2.01700007570706, 2.89122927429884, 2.84927053502588,
1.99042435941095, 2.84730950732035, 5.24044843997076, 6.78411332171822,
6.94262185334297, 8.46835997983591, 9.00847980399115, 10.0903648270813,
12.7882885080826, 13.5944586952418, 13.8267656264867, 14.7989016465317,
15.7181393355348, 17.3822790373397, 17.8017108178726, 19.8442415003192,
21.1743994279903, 20.9854326674201, 22.0177140097714, 24.0917743668801
)), class = "data.frame", row.names = c(NA, -25L))

df2=structure(list(a = 12.35, b = 19.63, c = 8.3, d = 22.5), class = "data.frame", row.names = c(NA,
-1L))

Best answer

Without knowing how you want the result formatted, you can do this with the data.table rolling join and roll="nearest" that you mentioned.

In this case, I melt both datasets to long format so that the matching can be done in a single join.

library(data.table)
setDT(df1)
setDT(df2)

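# Melt both tables to long format, find the nearest value per variable with a
# rolling join, then pull the matching rows of df1 by their julian index.
df1[
  match(
    melt(df1, id.vars="julian")[
      melt(df2, measure.vars=names(df2)),
      on=c("variable","value"), roll="nearest"]$julian,
    julian),
]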
# julian a b c d
#1: 9 12.02948 13.54714 7.659482 6.784113
#2: 20 28.74620 20.24871 18.523935 17.801711
#3: 10 13.00511 14.57352 8.296155 6.942622
#4: 24 30.26931 24.20554 20.253149 22.017714
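
To make the intermediate step of the answer above easier to follow, here is a minimal sketch of the same long-format rolling join on a much smaller table. The names `ref`, `target`, `ref_long`, `target_long`, and `nearest`, and the toy values, are assumptions made up for illustration; they are not part of the original question or answer.

```r
library(data.table)

# Hypothetical toy data, standing in for df1 (with an index) and df2 (one row)
ref <- data.table(id = 1:4,
                  a = c(1, 3, 5, 7),
                  b = c(10, 20, 30, 40))
target <- data.table(a = 5.4, b = 14)

# Long format: one row per (id, variable, value)
ref_long <- melt(ref, id.vars = "id")

# Long format: one row per (variable, value) to look up
target_long <- melt(target, measure.vars = names(target))

# Rolling join: exact match on `variable`, nearest match on `value`
# (roll applies to the last column listed in `on`)
nearest <- ref_long[target_long, on = c("variable", "value"), roll = "nearest"]
print(nearest)

# The matched index per variable, in the column order of `target`
nearest$id
```

In the joined result, `value` holds the looked-up value from `target_long`, while `id` identifies the row of `ref` whose value was closest for that variable.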

If you would rather have a separate table for each join, you can do something like this:

lapply(names(df2), \(var)  df1[df2, on=var, roll="nearest", .SD, .SDcols=names(df1)] )
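
As a cross-check that does not rely on data.table, the nearest match per column can also be sketched in base R with `which.min(abs(...))`. This is an alternative technique, not the approach from the answer; the names `ref`, `target`, `nearest_id`, and `matched` and the toy values are hypothetical. Note that `which.min` returns the first position on ties, which may differ from how roll="nearest" resolves them.

```r
# Hypothetical toy data, standing in for df1 (with an index) and df2 (one row)
ref <- data.frame(id = 1:4,
                  a = c(1, 3, 5, 7),
                  b = c(10, 20, 30, 40))
target <- data.frame(a = 5.4, b = 14)

# For each column of `target`, find the id of the row in `ref` whose value
# in the same column has the smallest absolute difference
nearest_id <- vapply(
  names(target),
  function(v) ref$id[which.min(abs(ref[[v]] - target[[v]]))],
  integer(1)
)
print(nearest_id)

# Pull the full matching rows, one per column of `target`
matched <- ref[match(nearest_id, ref$id), ]
print(matched)
```

This scans every row once per column, so it is simple but may be slower than a rolling join on large tables.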

Regarding r - Merging data frames based on multiple nearest matches in R, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/71460945/
