
scala - Union does not remove duplicate rows in a Spark DataFrame

Reposted. Author: 行者123. Updated: 2023-12-04 00:30:14

I have two DataFrames that look like this:

+--------------------+--------+-----------+-------------+
|UniqueFundamentalSet|Taxonomy|FFAction|!||DataPartition|
+--------------------+--------+-----------+-------------+
|192730241374        |1       |I|!|       |Japan        |
|192730241374        |2       |I|!|       |Japan        |
|192730241373        |1       |I|!|       |Japan        |
|192730241373        |2       |I|!|       |Japan        |
+--------------------+--------+-----------+-------------+

+--------------------+--------+-----------+-------------+
|UniqueFundamentalSet|Taxonomy|FFAction|!||DataPartition|
+--------------------+--------+-----------+-------------+
|192730241374        |1       |I|!|       |Japan        |
|192730241374        |2       |I|!|       |Japan        |
|192730391384        |1       |I|!|       |Japan        |
|192730391384        |2       |I|!|       |Japan        |
|192730241373        |1       |I|!|       |Japan        |
|192730241373        |2       |I|!|       |Japan        |
+--------------------+--------+-----------+-------------+

When I perform a union of the two DataFrames above, I get duplicate rows.
This is my output:
+--------------------+--------+-----------+-------------+
|UniqueFundamentalSet|Taxonomy|FFAction|!||DataPartition|
+--------------------+--------+-----------+-------------+
|192730241374        |1       |I|!|       |Japan        |
|192730241374        |2       |I|!|       |Japan        |
|192730241373        |1       |I|!|       |Japan        |
|192730241373        |2       |I|!|       |Japan        |
|192730241374        |1       |I|!|       |Japan        |
|192730241374        |2       |I|!|       |Japan        |
|192730391384        |1       |I|!|       |Japan        |
|192730391384        |2       |I|!|       |Japan        |
|192730241373        |1       |I|!|       |Japan        |
|192730241373        |2       |I|!|       |Japan        |
+--------------------+--------+-----------+-------------+

val dfToSave = dfMainOutput.union(insertdf)

I was under the impression that union removes duplicate rows while unionAll keeps them.
Instead, I have to call distinct after union.
Can someone explain this?

Best answer

Your impression is wrong. As stated in the official documentation:

Returns a new Dataset containing union of rows in this Dataset and another Dataset.

This is equivalent to UNION ALL in SQL. To do a SQL-style set union (that does deduplication of elements), use this function followed by a distinct.
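As a minimal sketch of what the documentation describes, the following rebuilds the two DataFrames from the question and compares a plain union with union followed by distinct. The object name `UnionDistinctSketch`, the helper `sqlStyleUnion`, and the local `SparkSession` settings are assumptions made for illustration, and only two of the four columns are reproduced to keep the example short.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UnionDistinctSketch {
  // Hypothetical helper: union keeps duplicates (UNION ALL semantics),
  // so distinct() is applied afterwards to get a SQL-style UNION.
  def sqlStyleUnion(a: DataFrame, b: DataFrame): DataFrame =
    a.union(b).distinct()

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("union-distinct-sketch")
      .getOrCreate()
    import spark.implicits._

    // Stand-ins for dfMainOutput and insertdf from the question.
    val dfMainOutput = Seq(
      (192730241374L, 1), (192730241374L, 2),
      (192730241373L, 1), (192730241373L, 2)
    ).toDF("UniqueFundamentalSet", "Taxonomy")

    val insertdf = Seq(
      (192730241374L, 1), (192730241374L, 2),
      (192730391384L, 1), (192730391384L, 2),
      (192730241373L, 1), (192730241373L, 2)
    ).toDF("UniqueFundamentalSet", "Taxonomy")

    // Plain union: all 4 + 6 rows are kept, duplicates included.
    println(dfMainOutput.union(insertdf).count()) // 10

    // union followed by distinct: only the 6 unique rows remain.
    println(sqlStyleUnion(dfMainOutput, insertdf).count()) // 6

    spark.stop()
  }
}
```

Applied to the question's code, this would amount to writing `dfMainOutput.union(insertdf).distinct()` when building `dfToSave`.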

Also as standard in SQL, this function resolves columns by position (not by name).
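To illustrate position-based resolution, here is a small sketch in which two DataFrames hold the same data with their columns in a different order. It contrasts `union` with `unionByName`, which is available on `Dataset` from Spark 2.3 onward. The object name `UnionByNameSketch` and the sample rows are assumptions for illustration and do not come from the question.

```scala
import org.apache.spark.sql.SparkSession

object UnionByNameSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("union-by-name-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same logical row, but the columns are declared in opposite order.
    val left  = Seq(("Japan", "I")).toDF("DataPartition", "FFAction")
    val right = Seq(("I", "Japan")).toDF("FFAction", "DataPartition")

    // union matches columns by position, so right's FFAction value
    // lands under DataPartition and the two rows look different.
    println(left.union(right).distinct().count()) // 2

    // unionByName matches columns by name, so both rows are identical
    // and distinct() collapses them into one.
    println(left.unionByName(right).distinct().count()) // 1

    spark.stop()
  }
}
```

When the two inputs are not guaranteed to share a column order, deduplicating after a positional union may silently leave rows that only differ because their values were misaligned.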

Regarding "scala - Union does not remove duplicate rows in a Spark DataFrame", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52494653/
