
scala - Selecting multiple elements with group by in spark.sql

Reposted. Author: 行者123 Updated: 2023-12-02 04:02:01

Is there a way to select multiple columns when using group by in Spark SQL? The code I'm using:

val df = spark.read.json("//path")
df.createOrReplaceTempView("GETBYID")

Now doing the group by:

val sqlDF = spark.sql(
"SELECT count(customerId) FROM GETBYID group by customerId");

But when I try:

val sqlDF = spark.sql(
"SELECT count(customerId),customerId,userId FROM GETBYID group by customerId");

Spark gives the error:

org.apache.spark.sql.AnalysisException: expression 'getbyid.userId' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;

Is there any way to do this?

Best Answer

Yes, it's possible, and the error message you attached already describes the options. You can add userId to the group by:

val sqlDF = spark.sql("SELECT count(customerId),customerId,userId FROM GETBYID group by customerId, userId");

Or use first():

val sqlDF = spark.sql("SELECT count(customerId),customerId,first(userId) FROM GETBYID group by customerId");
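The same two approaches can be written with the DataFrame API instead of SQL. This is a sketch assuming the same customerId/userId schema as the question; the alias names cnt and the app name are illustrative, not from the original post.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, first}

val spark = SparkSession.builder().appName("GroupByExample").getOrCreate()
// "//path" is the placeholder path from the question
val df = spark.read.json("//path")

// Option 1: group by both columns, so userId needs no aggregate
val byBoth = df.groupBy("customerId", "userId")
  .agg(count("customerId").as("cnt"))

// Option 2: group by customerId only and take an arbitrary userId
// per group with first() (non-deterministic if a customer has
// several distinct userId values)
val byCustomer = df.groupBy("customerId")
  .agg(count("customerId").as("cnt"), first("userId").as("userId"))
```

Both produce the same results as the corresponding SQL queries against the GETBYID view.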

Regarding scala - selecting multiple elements with group by in spark.sql, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41421675/
