
scala - Spark SQL : How to append new row to dataframe table (from another table)


I am using Spark SQL with DataFrames. I have an input DataFrame and I would like to append (or insert) its rows into a larger DataFrame that has more columns. How would I do that?

If this were SQL, I would use INSERT INTO OUTPUT SELECT ... FROM INPUT, but I don't know how to do that with Spark SQL.

Concretely:

var input = sqlContext.createDataFrame(Seq(
  (10L, "Joe Doe", 34),
  (11L, "Jane Doe", 31),
  (12L, "Alice Jones", 25)
)).toDF("id", "name", "age")

var output = sqlContext.createDataFrame(Seq(
  (0L, "Jack Smith", 41, "yes", 1459204800L),
  (1L, "Jane Jones", 22, "no", 1459294200L),
  (2L, "Alice Smith", 31, "", 1459595700L)
)).toDF("id", "name", "age", "init", "ts")


scala> input.show()
+---+-----------+---+
| id|       name|age|
+---+-----------+---+
| 10|    Joe Doe| 34|
| 11|   Jane Doe| 31|
| 12|Alice Jones| 25|
+---+-----------+---+

scala> input.printSchema()
root
|-- id: long (nullable = false)
|-- name: string (nullable = true)
|-- age: integer (nullable = false)


scala> output.show()
+---+-----------+---+----+----------+
| id|       name|age|init|        ts|
+---+-----------+---+----+----------+
|  0| Jack Smith| 41| yes|1459204800|
|  1| Jane Jones| 22|  no|1459294200|
|  2|Alice Smith| 31|    |1459595700|
+---+-----------+---+----+----------+

scala> output.printSchema()
root
|-- id: long (nullable = false)
|-- name: string (nullable = true)
|-- age: integer (nullable = false)
|-- init: string (nullable = true)
|-- ts: long (nullable = false)

I would like to append all the rows of input to the end of output. At the same time, I would like to set the init column of output to an empty string '' and the ts column to the current timestamp, e.g. 1461883875L.

Any help would be appreciated.

Best Answer

Spark DataFrames are immutable, so appending / inserting rows is not possible. Instead, you can just add the missing columns and use UNION ALL:

import org.apache.spark.sql.functions.{lit, current_timestamp}
output.unionAll(input.select($"*", lit(""), current_timestamp.cast("long")))
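For context, here is a slightly more explicit sketch of the same approach. It assumes the input and output DataFrames defined above and that sqlContext.implicits._ is in scope for the $"..." column syntax (in spark-shell it already is). unionAll matches columns by position, so the extra columns are selected in the order of output's schema:

import org.apache.spark.sql.functions.{lit, current_timestamp}
import sqlContext.implicits._  // for the $"..." column syntax (already available in spark-shell)

val appended = output.unionAll(
  input.select(
    $"id",
    $"name",
    $"age",
    lit("").as("init"),                        // fill init with an empty string
    current_timestamp().cast("long").as("ts")  // cast the current timestamp to epoch seconds
  )
)

appended.show()

Note that unionAll is positional rather than name-based; in Spark 2.x it is deprecated in favour of union, which has the same semantics.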

Regarding scala - Spark SQL: How to append new row to dataframe table (from another table), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36926856/
