
python - How to join/merge a list of dataframes with a common key in PySpark?

Reposted · Author: 太空宇宙 · Updated: 2023-11-03 13:32:33

df1
uid1 var1
0 John 3
1 Paul 4
2 George 5
df2
uid1 var2
0 John 23
1 Paul 44
2 George 52
df3
uid1 var3
0 John 31
1 Paul 45
2 George 53
df_lst=[df1,df2,df3]

How do I merge/join the 3 dataframes in the list on the common key uid1?

Edit: expected output

   df1
uid1 var1 var2 var3
0 John 3 23 31
1 Paul 4 44 45
2 George 5 52 53

Best answer

You can join a list of dataframes by folding over it with reduce. Below is a simple example:

import spark.implicits._

// Build three sample DataFrames that share the keys "id" and "uid1"
val df1 = spark.sparkContext.parallelize(Seq(
  (0, "John", 3),
  (1, "Paul", 4),
  (2, "George", 5)
)).toDF("id", "uid1", "var1")

val df2 = spark.sparkContext.parallelize(Seq(
  (0, "John", 23),
  (1, "Paul", 44),
  (2, "George", 52)
)).toDF("id", "uid1", "var2")

val df3 = spark.sparkContext.parallelize(Seq(
  (0, "John", 31),
  (1, "Paul", 45),
  (2, "George", 53)
)).toDF("id", "uid1", "var3")

// Fold over the list, joining each successive DataFrame on the common keys
val dfList = List(df1, df2, df3)

dfList.reduce((a, b) => a.join(b, Seq("id", "uid1")))

Output:

+---+------+----+----+----+
| id| uid1|var1|var2|var3|
+---+------+----+----+----+
| 1| Paul| 4| 44| 45|
| 2|George| 5| 52| 53|
| 0| John| 3| 23| 31|
+---+------+----+----+----+

Hope this helps!

Regarding "python - How to join/merge a list of dataframes with a common key in PySpark?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44516409/
