
scala - Spark SQL : Select with arithmetic on column values and type casting?

Reposted — Author: 行者123 — Updated: 2023-12-01 11:14:53

I'm using Spark SQL with DataFrames. Is there a way to do a select statement with some arithmetic, just as you can in SQL?

For example, I have the following table:

var data = Array((1, "foo", 30, 5), (2, "bar", 35, 3), (3, "foo", 25, 4))
var dataDf = sc.parallelize(data).toDF("id", "name", "value", "years")

dataDf.printSchema
// root
// |-- id: integer (nullable = false)
// |-- name: string (nullable = true)
// |-- value: integer (nullable = false)
// |-- years: integer (nullable = false)

dataDf.show()
// +---+----+-----+-----+
// | id|name|value|years|
// +---+----+-----+-----+
// | 1| foo| 30| 5|
// | 2| bar| 35| 3|
// | 3| foo| 25| 4|
// +---+----+-----+-----+

Now I want to do a SELECT statement that creates a new column by performing some arithmetic on existing columns. For example, I'd like to compute the ratio value/years. I need to cast value (or years) to a double first. I tried this statement, but it doesn't parse:
dataDf.
select(dataDf("name"), (dataDf("value").toDouble/dataDf("years")).as("ratio")).
show()

<console>:35: error: value toDouble is not a member of org.apache.spark.sql.Column
select(dataDf("name"), (dataDf("value").toDouble/dataDf("years")).as("ratio")).

I saw a similar question in "How to change column types in Spark SQL's DataFrame?", but that's not quite what I'm looking for.

Best Answer

The correct way to change a Column's type is to use the cast method. It accepts either a descriptive string:

dataDf("value").cast("double") / dataDf("years")

or a DataType:
import org.apache.spark.sql.types.DoubleType

dataDf("value").cast(DoubleType) / dataDf("years")
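Putting it together with the question's sample data, the cast expression can be used directly inside select. A minimal sketch, assuming Spark 2.x+ where a SparkSession is available (the original question uses the older sc/sqlContext style, but the cast call is the same):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.DoubleType

// Hypothetical local session for illustration.
val spark = SparkSession.builder.appName("cast-example").master("local[*]").getOrCreate()
import spark.implicits._

val data = Array((1, "foo", 30, 5), (2, "bar", 35, 3), (3, "foo", 25, 4))
val dataDf = spark.sparkContext.parallelize(data).toDF("id", "name", "value", "years")

// Cast value to double so the division produces a fractional ratio.
dataDf.
  select(dataDf("name"), (dataDf("value").cast(DoubleType) / dataDf("years")).as("ratio")).
  show()
// ratios: foo -> 6.0, bar -> 11.666666666666666, foo -> 6.25

// The same cast can also be written as a SQL expression string:
dataDf.selectExpr("name", "cast(value as double) / years as ratio").show()
```

The selectExpr variant is handy when the whole projection is easier to express as SQL text; both forms produce the same plan.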

Regarding "scala - Spark SQL: Select with arithmetic on column values and type casting?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35712175/
