
apache-spark - Adding a column to a Spark DataFrame from a subquery

Reposted. Author: 行者123. Updated: 2023-12-04 09:34:43

Using SQL syntax, I can add a new column with a subquery like this:

import spark.sqlContext.implicits._

List(
  ("a", "1", "2"),
  ("b", "1", "3"),
  ("c", "1", "4"),
  ("d", "1", "5")
).toDF("name", "start", "end")
  .createOrReplaceTempView("base")

List(
  ("a", "1", "2"),
  ("b", "2", "3"),
  ("c", "3", "4"),
  ("d", "4", "5"),
  ("f", "5", "6")
).toDF("name", "number", "_count")
  .createOrReplaceTempView("col")


spark.sql(
  """
    |select a.name,
    |  (select max(_count) from col b where b.number = a.end) -
    |  (select max(_count) from col b where b.number = a.start) as result
    |from base a
    |""".stripMargin)
  .show(false)
How can I do this with the DataFrame API?

Best answer

I found the syntax:

import org.apache.spark.sql.functions.expr
import spark.sqlContext.implicits._

val b = List(
  ("a", "1", "2"),
  ("b", "1", "3"),
  ("c", "1", "4"),
  ("d", "1", "5")
).toDF("name", "start", "end")

List(
  ("a", "1", "2"),
  ("b", "2", "3"),
  ("c", "3", "4"),
  ("d", "4", "5"),
  ("f", "5", "6")
).toDF("name", "number", "_count")
  .createOrReplaceTempView("ref_table")

b.withColumn(
  "result",
  expr("(select max(_count) from ref_table r where r.number = end) - " +
       "(select max(_count) from ref_table r where r.number = start)")
).show(false)
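If your Spark version does not accept correlated scalar subqueries inside `expr`, the same result can be computed with plain joins. A minimal sketch, assuming the same `b` DataFrame and reference data as above (the `maxCounts`, `endCounts`, and `startCounts` names are illustrative; string columns are cast implicitly for the subtraction, as in the SQL version):

```scala
import org.apache.spark.sql.functions.{col, max}
import spark.sqlContext.implicits._

val ref = List(
  ("a", "1", "2"),
  ("b", "2", "3"),
  ("c", "3", "4"),
  ("d", "4", "5"),
  ("f", "5", "6")
).toDF("name", "number", "_count")

// Pre-aggregate once: max(_count) per number, mirroring the scalar subquery.
val maxCounts = ref.groupBy("number").agg(max("_count").as("max_count"))

// Rename so each copy joins on a different key column of b.
val endCounts   = maxCounts.withColumnRenamed("number", "end")
                           .withColumnRenamed("max_count", "end_count")
val startCounts = maxCounts.withColumnRenamed("number", "start")
                           .withColumnRenamed("max_count", "start_count")

// Left joins keep rows of b with no match, like the SQL scalar subqueries do.
b.join(endCounts, Seq("end"), "left")
  .join(startCounts, Seq("start"), "left")
  .withColumn("result", col("end_count") - col("start_count"))
  .select("name", "result")
  .show(false)
```

This is also roughly what Spark's optimizer does internally when it rewrites scalar subqueries as aggregations plus left outer joins.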

For this question (apache-spark - adding a column to a Spark DataFrame from a subquery), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/62642965/
