
apache-spark - Spark DataFrame: explode list column

Reposted · Author: 行者123 · Updated: 2023-12-01 10:21:25

I have output from a Spark Aggregator, which is a List[Character]:

case class Character(name: String, secondName: String, faculty: String)
val charColumn = HPAggregator.toColumn
val resultDF = someDF.select(charColumn)

So my dataframe looks like this:
+-----------------------------------------------+
| value |
+-----------------------------------------------+
|[[harry, potter, gryffindor],[ron, weasley ... |
+-----------------------------------------------+

Now I want to convert it into:
+----------------------------------+
| name | second_name | faculty |
+----------------------------------+
| harry | potter | gryffindor |
| ron | weasley | gryffindor |

How can I do this properly?

Best Answer

This can be done with the explode DataFrame function, then selecting the array elements as columns.

Here is an example:

>>> df = spark.createDataFrame([[[['a','b','c'], ['d','e','f'], ['g','h','i']]]],["col1"])
>>> df.show(20, False)
+---------------------------------------------------------------------+
|col1 |
+---------------------------------------------------------------------+
|[WrappedArray(a, b, c), WrappedArray(d, e, f), WrappedArray(g, h, i)]|
+---------------------------------------------------------------------+

>>> from pyspark.sql.functions import explode
>>> out_df = df.withColumn("col2", explode(df.col1)).drop('col1')
>>>
>>> out_df.show()
+---------+
| col2|
+---------+
|[a, b, c]|
|[d, e, f]|
|[g, h, i]|
+---------+

>>> out_df.select(out_df.col2[0].alias('c1'), out_df.col2[1].alias('c2'), out_df.col2[2].alias('c3')).show()
+---+---+---+
| c1| c2| c3|
+---+---+---+
| a| b| c|
| d| e| f|
| g| h| i|
+---+---+---+


For apache-spark - Spark DataFrame: explode list column, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51609740/
