
dataframe - PySpark: Subtracting/Difference pyspark dataframes based on all columns

Reposted. Author: 行者123. Updated: 2023-12-05 01:33:36

I have two PySpark dataframes, shown below:

df1

id  city     country    region  continent
1   chicago  USA        NA      NA
2   houston  USA        NA      NA
3   Sydney   Australia  AU      AU
4   London   UK         EU      EU

df2

id  city     country    region  continent
1   chicago  USA        NA      NA
2   houston  USA        NA      NA
3   Paris    France     EU      EU
5   London   UK         EU      EU

I want to find the rows that are present in df2 but not in df1, comparing all column values. So df2 - df1 should produce the following df_result:

df_result

id  city     country    region  continent
3   Paris    France     EU      EU
5   London   UK         EU      EU

How can I do this in PySpark? Thanks in advance.

Best answer

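You can use a left_anti join, which keeps only the rows of df2 that have no match in df1. Here rows are matched on id, city and country: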

df2.join(df1, on=["id", "city", "country"], how="left_anti").show()

+---+------+-------+------+---------+
| id|  city|country|region|continent|
+---+------+-------+------+---------+
|  3| Paris| France|    EU|       EU|
|  5|London|     UK|    EU|       EU|
+---+------+-------+------+---------+

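If no column contains nulls, you can join on all columns (an equality join never matches a null with another null, so rows containing nulls would not be removed):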

df2.join(df1, on=df2.schema.names, how="left_anti").show()

For a similar question about "dataframe - PySpark: Subtracting/Difference pyspark dataframes based on all columns", see Stack Overflow: https://stackoverflow.com/questions/64687700/
