r - dplyr : how-to programmatically full_join dataframes contained in a list of lists?

Reposted by: 行者123 · Last updated: 2023-12-03 03:36:41

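Context and data structure

I am sharing a simplified version of my large dataset. It fully respects the structure of the original, but contains fewer list elements, data frames, variables, and observations.

Following the top-voted answer to the question How to make a great R reproducible example?, I am sharing the dataset as the output of dput(query1). Copy and paste the following code block into the R console to get an object you can use right away: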

    query1 <- structure(list(plu = structure(list(year = structure(list(id = 1:3,
    station = 100:102, pluMean = c(0.509068994778059, 1.92866478959912,
    1.09517453602154), pluMax = c(0.0146962179957886, 0.802984389130343,
    2.48170762478472)), .Names = c("id", "station", "pluMean",
"pluMax"), row.names = c(NA, -3L), class = "data.frame"), month = structure(list(
    id = 1:3, station = 100:102, pluMean = c(0.66493845927034,
    -1.3559338786041, 0.195600637750077), pluMax = c(0.503424623872161,
    0.234402501255681, -0.440264545434053)), .Names = c("id",
"station", "pluMean", "pluMax"), row.names = c(NA, -3L), class = "data.frame"),
    week = structure(list(id = 1:3, station = 100:102, pluMean = c(-0.608295829330578,
    -1.10256919591373, 1.74984007126193), pluMax = c(0.969668266601551,
    0.924426323739882, 3.47460867665884)), .Names = c("id", "station",
    "pluMean", "pluMax"), row.names = c(NA, -3L), class = "data.frame")), .Names = c("year",
"month", "week")), tsa = structure(list(year = structure(list(
    id = 1:3, station = 100:102, tsaMean = c(-1.49060721773042,
    -0.684735418997484, 0.0586655881113975), tsaMax = c(0.25739838787582,
    0.957634817758648, 1.37198023881125)), .Names = c("id", "station",
"tsaMean", "tsaMax"), row.names = c(NA, -3L), class = "data.frame"),
    month = structure(list(id = 1:3, station = 100:102, tsaMean = c(-0.684668662999479,
    -1.28087846387974, -0.600175481941456), tsaMax = c(0.962916941685075,
    0.530773351897188, -0.217143593955998)), .Names = c("id",
    "station", "tsaMean", "tsaMax"), row.names = c(NA, -3L), class = "data.frame"),
    week = structure(list(id = 1:3, station = 100:102, tsaMean = c(0.376481732842365,
    0.370435880636005, -0.105354927593471), tsaMax = c(1.93833635147645,
    0.81176751708868, 0.744932493064975)), .Names = c("id", "station",
    "tsaMean", "tsaMax"), row.names = c(NA, -3L), class = "data.frame")), .Names = c("year",
"month", "week"))), .Names = c("plu", "tsa"))

After running it, str(query1) shows the structure of the sample dataset:

    > str(query1)
List of 2
 $ plu:List of 3
  ..$ year :'data.frame':   3 obs. of  4 variables:
  .. ..$ id     : int [1:3] 1 2 3
  .. ..$ station: int [1:3] 100 101 102
  .. ..$ pluMean: num [1:3] 0.509 1.929 1.095
  .. ..$ pluMax : num [1:3] 0.0147 0.803 2.4817
  ..$ month:'data.frame':   3 obs. of  4 variables:
  .. ..$ id     : int [1:3] 1 2 3
  .. ..$ station: int [1:3] 100 101 102
  .. ..$ pluMean: num [1:3] 0.665 -1.356 0.196
  .. ..$ pluMax : num [1:3] 0.503 0.234 -0.44
  ..$ week :'data.frame':   3 obs. of  4 variables:
  .. ..$ id     : int [1:3] 1 2 3
  .. ..$ station: int [1:3] 100 101 102
  .. ..$ pluMean: num [1:3] -0.608 -1.103 1.75
  .. ..$ pluMax : num [1:3] 0.97 0.924 3.475
 $ tsa:List of 3
  ..$ year :'data.frame':   3 obs. of  4 variables:
  .. ..$ id     : int [1:3] 1 2 3
  .. ..$ station: int [1:3] 100 101 102
  .. ..$ tsaMean: num [1:3] -1.4906 -0.6847 0.0587
  .. ..$ tsaMax : num [1:3] 0.257 0.958 1.372
  ..$ month:'data.frame':   3 obs. of  4 variables:
  .. ..$ id     : int [1:3] 1 2 3
  .. ..$ station: int [1:3] 100 101 102
  .. ..$ tsaMean: num [1:3] -0.685 -1.281 -0.6
  .. ..$ tsaMax : num [1:3] 0.963 0.531 -0.217
  ..$ week :'data.frame':   3 obs. of  4 variables:
  .. ..$ id     : int [1:3] 1 2 3
  .. ..$ station: int [1:3] 100 101 102
  .. ..$ tsaMean: num [1:3] 0.376 0.37 -0.105
  .. ..$ tsaMax : num [1:3] 1.938 0.812 0.745

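So how does it read? I have one big list (query1) made of 2 parameter elements (plu and tsa). Each of these 2 parameter elements is a list of 3 elements (year, month, and week), and each of those 3 elements is a timeInterval data frame with the same 4 variable columns (id, station, mean, max) and exactly the same number of observations (3).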
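The three levels described above can also be inspected programmatically rather than by reading the str() output. The following is only a sketch: it rebuilds a toy stand-in for query1 with made-up values (the helper make_df and the values are assumptions for illustration, not part of the real dataset) and then walks the levels with base R.

```r
# Toy stand-in for query1: made-up values, same shape as the real object.
make_df <- function(prefix) {
  df <- data.frame(id = 1:3, station = 100:102)
  df[[paste0(prefix, "Mean")]] <- c(0.1, 0.2, 0.3)
  df[[paste0(prefix, "Max")]]  <- c(1.1, 1.2, 1.3)
  df
}
intervals <- c("year", "month", "week")
query1 <- lapply(c(plu = "plu", tsa = "tsa"), function(p) {
  setNames(lapply(intervals, function(i) make_df(p)), intervals)
})

# Level 1: the parameter elements.
param_names <- names(query1)

# Level 2: the time intervals available under each parameter.
interval_names <- lapply(query1, names)

# Level 3: number of observations per data frame, as a matrix with
# one row per interval and one column per parameter.
n_obs <- sapply(query1, function(p) sapply(p, nrow))

# Column names of one individual data frame.
names(query1$tsa$week)
```

With the real query1 in the workspace, the toy construction at the top can simply be skipped.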

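What I want to achieve

I want to programmatically full_join, by id and station, all the timeInterval data frames that share the same name (year, month, and week). This means I should end up with a new list (query1Changed) containing 3 data frames (year, month, week), each with 6 columns (id, station, pluMean, pluMax, tsaMean, tsaMax) and 3 observations. Schematically, I need to arrange the data as follows: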

full_join by station and id:

df query1$plu$year with df query1$tsa$year

df query1$plu$month with df query1$tsa$month

df query1$plu$week with df query1$tsa$week

Or, expressed in another notation:

df query1[[1]][[1]] with df query1[[2]][[1]]

df query1[[1]][[2]] with df query1[[2]][[2]]

df query1[[1]][[3]] with df query1[[2]][[3]]

And expressed programmatically (n being the total number of elements in the big list):

df query1[[i]][[1]] with df query1[[i+1]][[1]] ... with df query1[[n]][[1]]

df query1[[i]][[2]] with df query1[[i+1]][[2]] ... with df query1[[n]][[2]]

df query1[[i]][[3]] with df query1[[i+1]][[3]] ... with df query1[[n]][[3]]

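I need to do this programmatically because in my real project I may encounter other big lists with more than 2 parameter elements, and with more than 4 variable columns in each timeInterval data frame.

What will always hold in my analysis is that all parameter elements of any such big list have the same number of timeInterval data frames with the same names, and that every timeInterval data frame has the same number of observations and shares 2 columns (id and station) with exactly the same names and values.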
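Since the join relies on these invariants, it can be worth checking them before joining. Below is a minimal sketch under stated assumptions: check_invariants is a hypothetical helper (not from any package), the toy data uses made-up values, and the key comparison uses identical(), which is strict about types (an integer id will not match a double id).

```r
# Toy stand-in for query1 (made-up values, same shape as the real object).
make_df <- function(prefix) {
  df <- data.frame(id = 1:3, station = 100:102)
  df[[paste0(prefix, "Mean")]] <- c(0.1, 0.2, 0.3)
  df[[paste0(prefix, "Max")]]  <- c(1.1, 1.2, 1.3)
  df
}
intervals <- c("year", "month", "week")
query1 <- lapply(c(plu = "plu", tsa = "tsa"), function(p) {
  setNames(lapply(intervals, function(i) make_df(p)), intervals)
})

# Hypothetical helper: TRUE when every parameter element has the same
# interval names and every data frame carries identical key columns.
check_invariants <- function(big_list, keys = c("id", "station")) {
  interval_names <- lapply(big_list, names)
  same_intervals <- all(vapply(interval_names, identical, logical(1),
                               interval_names[[1]]))

  ref <- big_list[[1]][[1]]  # reference data frame for the key columns
  keys_match <- function(df) {
    all(keys %in% names(df)) &&
      all(vapply(keys, function(k) identical(df[[k]], ref[[k]]), logical(1)))
  }
  same_keys <- all(vapply(big_list, function(p) {
    all(vapply(p, keys_match, logical(1)))
  }, logical(1)))

  same_intervals && same_keys
}

ok <- check_invariants(query1)
```

If the check fails on real data, a positional approach such as bind_cols would silently misalign rows, whereas a keyed join would still match them correctly.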

What I have managed to do

Running the following code:

> query1Changed <- do.call(function(...) mapply(bind_cols, ..., SIMPLIFY=F), args = query1)

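arranges the data as expected. However, it is not a tidy solution, because we end up with duplicated columns (id and station, repeated as id1 and station1):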

> str(query1Changed)
List of 3
 $ year :'data.frame':  3 obs. of  8 variables:
  ..$ id      : int [1:3] 1 2 3
  ..$ station : int [1:3] 100 101 102
  ..$ pluMean : num [1:3] 0.509 1.929 1.095
  ..$ pluMax  : num [1:3] 0.0147 0.803 2.4817
  ..$ id1     : int [1:3] 1 2 3
  ..$ station1: int [1:3] 100 101 102
  ..$ tsaMean : num [1:3] -1.4906 -0.6847 0.0587
  ..$ tsaMax  : num [1:3] 0.257 0.958 1.372
 $ month:'data.frame':  3 obs. of  8 variables:
  ..$ id      : int [1:3] 1 2 3
  ..$ station : int [1:3] 100 101 102
  ..$ pluMean : num [1:3] 0.665 -1.356 0.196
  ..$ pluMax  : num [1:3] 0.503 0.234 -0.44
  ..$ id1     : int [1:3] 1 2 3
  ..$ station1: int [1:3] 100 101 102
  ..$ tsaMean : num [1:3] -0.685 -1.281 -0.6
  ..$ tsaMax  : num [1:3] 0.963 0.531 -0.217
 $ week :'data.frame':  3 obs. of  8 variables:
  ..$ id      : int [1:3] 1 2 3
  ..$ station : int [1:3] 100 101 102
  ..$ pluMean : num [1:3] -0.608 -1.103 1.75
  ..$ pluMax  : num [1:3] 0.97 0.924 3.475
  ..$ id1     : int [1:3] 1 2 3
  ..$ station1: int [1:3] 100 101 102
  ..$ tsaMean : num [1:3] 0.376 0.37 -0.105
  ..$ tsaMax  : num [1:3] 1.938 0.812 0.745

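We could add a second step to "clean" the data, but that would not be the most efficient solution, so I would rather not use this workaround.

Next, I tried to do the same with dplyr's full_join, without success. Running the following code: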

> query1Changed <- do.call(function(...) mapply(full_join(..., by = c("station", "id")), ..., SIMPLIFY=F), args = query1)

returns the following error:

Error in UseMethod("full_join") :
  no applicable method for 'full_join' applied to an object of class "list"

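So, how should I write my full_join expression so that it operates on the data frames?

Or is there another way to perform this data transformation efficiently?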
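A likely reading of the error: in the failing call, full_join(..., by = ...) is evaluated once, with the two lists themselves as its arguments, before mapply ever runs, whereas mapply expects the function itself plus any fixed arguments in MoreArgs. The sketch below illustrates that calling pattern with base R's merge as a stand-in for full_join (an assumption made to keep the example dependency-free; all = TRUE gives full-join behaviour). The toy data uses made-up values.

```r
# Toy stand-in for query1 (made-up values, same shape as the real object).
make_df <- function(prefix) {
  df <- data.frame(id = 1:3, station = 100:102)
  df[[paste0(prefix, "Mean")]] <- c(0.1, 0.2, 0.3)
  df[[paste0(prefix, "Max")]]  <- c(1.1, 1.2, 1.3)
  df
}
intervals <- c("year", "month", "week")
query1 <- lapply(c(plu = "plu", tsa = "tsa"), function(p) {
  setNames(lapply(intervals, function(i) make_df(p)), intervals)
})

by_keys <- c("id", "station")

# Pass the function itself as FUN; arguments that stay fixed for every
# pair (the join keys, all = TRUE) go in MoreArgs.
joined <- mapply(merge, query1$plu, query1$tsa,
                 MoreArgs = list(by = by_keys, all = TRUE),
                 SIMPLIFY = FALSE)

names(joined)       # one joined data frame per time interval
names(joined$year)  # key columns appear once, followed by the value columns
```

This handles exactly two parameter lists; it does not by itself generalise to n of them.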

What I found on the web that might help

I have found related questions, but I still cannot work out how to adapt their solutions to my problem.

On Stack Overflow:
- Merging a data frame from a list of data frames [duplicate]
- Simultaneously merge multiple data.frames in a list
- Joining list of data.frames from map() call
- Combining elements of list of lists by index

On blogs:
- Joining a List of Data Frames with purrr::reduce()

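Any help would be greatly appreciated. I hope I have described my problem clearly.
I only started programming in R 2 months ago, so please be indulgent if the solution is obvious ;)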

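Best answer

First of all, thank you for the very thorough description of the problem and of what the solution needs to do.

To start, I will use purrr::map2 to create a function that takes two lists of data frames and joins them in parallel. That is, it joins the first data frame of plu with the first of tsa, and so on up to the last of plu with the last of tsa, and returns the results as a list.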

> join_each = function(x, y) map2(x, y, full_join)
> join_each(query1$plu, query1$tsa)
Joining, by = c("id", "station")
Joining, by = c("id", "station")
Joining, by = c("id", "station")
$year
  id station  pluMean     pluMax     tsaMean    tsaMax
1  1     100 0.509069 0.01469622 -1.49060722 0.2573984
2  2     101 1.928665 0.80298439 -0.68473542 0.9576348
3  3     102 1.095175 2.48170762  0.05866559 1.3719802

$month
  id station    pluMean     pluMax    tsaMean     tsaMax
1  1     100  0.6649385  0.5034246 -0.6846687  0.9629169
2  2     101 -1.3559339  0.2344025 -1.2808785  0.5307734
3  3     102  0.1956006 -0.4402645 -0.6001755 -0.2171436

$week
  id station    pluMean    pluMax    tsaMean    tsaMax
1  1     100 -0.6082958 0.9696683  0.3764817 1.9383364
2  2     101 -1.1025692 0.9244263  0.3704359 0.8117675
3  3     102  1.7498401 3.4746087 -0.1053549 0.7449325

OK, this works when there are only two of them, but you want it to work when there are n lists of data frames. For that you need purrr::reduce:

> reduce(query1, join_each)
Joining, by = c("id", "station")
Joining, by = c("id", "station")
Joining, by = c("id", "station")
$year
  id station  pluMean     pluMax     tsaMean    tsaMax
1  1     100 0.509069 0.01469622 -1.49060722 0.2573984
2  2     101 1.928665 0.80298439 -0.68473542 0.9576348
3  3     102 1.095175 2.48170762  0.05866559 1.3719802

$month
  id station    pluMean     pluMax    tsaMean     tsaMax
1  1     100  0.6649385  0.5034246 -0.6846687  0.9629169
2  2     101 -1.3559339  0.2344025 -1.2808785  0.5307734
3  3     102  0.1956006 -0.4402645 -0.6001755 -0.2171436

$week
  id station    pluMean    pluMax    tsaMean    tsaMax
1  1     100 -0.6082958 0.9696683  0.3764817 1.9383364
2  2     101 -1.1025692 0.9244263  0.3704359 0.8117675
3  3     102  1.7498401 3.4746087 -0.1053549 0.7449325

It computes join_each(query1[[1]], query1[[2]]) %>% join_each(query1[[3]]) ... %>% join_each(query1[[n]]).
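To make that expansion concrete, here is a sketch of the same fold with three parameter lists, written in base R (Reduce, Map, and merge stand in for reduce, map2, and full_join; that substitution is mine, not part of the answer). The third parameter, hra, and all values are made up for illustration.

```r
# Toy big list with three parameter elements; "hra" is hypothetical.
make_df <- function(prefix) {
  df <- data.frame(id = 1:3, station = 100:102)
  df[[paste0(prefix, "Mean")]] <- c(0.1, 0.2, 0.3)
  df[[paste0(prefix, "Max")]]  <- c(1.1, 1.2, 1.3)
  df
}
intervals <- c("year", "month", "week")
query3 <- lapply(c(plu = "plu", tsa = "tsa", hra = "hra"), function(p) {
  setNames(lapply(intervals, function(i) make_df(p)), intervals)
})

# Base R counterpart of join_each: pairwise full join of two lists.
join_each <- function(x, y) {
  Map(function(a, b) merge(a, b, by = c("id", "station"), all = TRUE), x, y)
}

# The fold: join_each is applied left to right across the big list.
folded <- Reduce(join_each, query3)

# The same computation written out by hand for n = 3.
manual <- join_each(join_each(query3[[1]], query3[[2]]), query3[[3]])

identical(folded, manual)
names(folded$year)
```

Each step consumes one more parameter list, so the accumulated data frames gain two value columns per step while id and station stay unique.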

Update: the following one-liner does the same thing: reduce(query1, map2, full_join). It is less readable, though.
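The one-liner works because reduce forwards its extra arguments to the function it folds with, so each step becomes map2(accumulated, next, full_join). A variant that states the join keys explicitly, which also avoids the "Joining, by = ..." messages, might look like the sketch below. It assumes dplyr and purrr are installed; the toy data uses made-up values, and by_keys is just a local variable name.

```r
# Toy stand-in for query1 (made-up values, same shape as the real object).
make_df <- function(prefix) {
  df <- data.frame(id = 1:3, station = 100:102)
  df[[paste0(prefix, "Mean")]] <- c(0.1, 0.2, 0.3)
  df[[paste0(prefix, "Max")]]  <- c(1.1, 1.2, 1.3)
  df
}
intervals <- c("year", "month", "week")
query1 <- lapply(c(plu = "plu", tsa = "tsa"), function(p) {
  setNames(lapply(intervals, function(i) make_df(p)), intervals)
})

by_keys <- c("id", "station")

# Fold over the parameter lists; at each step, join matching intervals
# pairwise on the explicit keys.
query1Changed <- purrr::reduce(query1, function(x, y) {
  purrr::map2(x, y, function(a, b) dplyr::full_join(a, b, by = by_keys))
})

names(query1Changed)
names(query1Changed$year)
```

Spelling out by also guards against accidentally joining on value columns if two parameters ever happen to share a column name.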

Regarding r - dplyr : how-to programmatically full_join dataframes contained in a list of lists?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45963678/
