
SQL on Spark: How do I get all values of DISTINCT?

Reposted. Author: 行者123. Updated: 2023-12-04 13:54:23

So, suppose I have the following table:

Name | Color
------------------------------
John | Blue
Greg | Red
John | Yellow
Greg | Red
Greg | Blue

For each name, I want a table of its distinct colors: how many there are and what their values are. Something like this:
Name | Distinct | Values
--------------------------------------
John | 2 | Blue, Yellow
Greg | 2 | Red, Blue

Any ideas on how to do this?

Best answer

collect_list gives you a list without removing duplicates.
collect_set removes duplicates automatically.
So simply:

select
  Name,
  count(distinct Color) as `Distinct`, -- not a very good name: DISTINCT is a SQL keyword, hence the backticks
  collect_set(Color) as `Values`
from TblName
group by Name
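The desired output in the question shows the values as a single comma-separated string, while collect_set returns an array whose element order is not guaranteed. The following is a minimal sketch of one way to get a stable string, assuming Spark 2.0 or later (for SparkSession) and a local session. The object name DistinctColorsSql, the method distinctColors, and the aliases DistinctCount and ColorValues are hypothetical names chosen for illustration; sort_array and concat_ws are built-in Spark SQL functions.

```scala
import org.apache.spark.sql.SparkSession

object DistinctColorsSql {
  // Returns (Name, number of distinct colors, sorted comma-separated colors) per name.
  def distinctColors(spark: SparkSession): Array[(String, Long, String)] = {
    import spark.implicits._

    // Sample rows from the question, registered under the view name used in the query.
    Seq(
      ("John", "Blue"), ("Greg", "Red"), ("John", "Yellow"),
      ("Greg", "Red"), ("Greg", "Blue")
    ).toDF("Name", "Color").createOrReplaceTempView("TblName")

    spark.sql(
      """SELECT Name,
        |       count(DISTINCT Color) AS DistinctCount,
        |       concat_ws(', ', sort_array(collect_set(Color))) AS ColorValues
        |FROM TblName
        |GROUP BY Name
        |ORDER BY Name""".stripMargin
    ).as[(String, Long, String)].collect() // tuple columns are mapped by position
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is acceptable for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("distinct-colors-sql")
      .getOrCreate()
    distinctColors(spark).foreach(println)
    spark.stop()
  }
}
```

Because the array is sorted before joining, Greg's values come out as "Blue, Red" rather than in the "Red, Blue" order shown in the question; drop sort_array if the order does not matter.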

This has been available since Spark 1.6.0:

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
/**
* Aggregate function: returns a set of objects with duplicate elements eliminated.
*
* For now this is an alias for the collect_set Hive UDAF.
*
* @group agg_funcs
* @since 1.6.0
*/
def collect_set(columnName: String): Column = collect_set(Column(columnName))
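The same aggregation can also be written with the DataFrame API, calling the collect_set function quoted above directly. This is a minimal sketch, again assuming Spark 2.0 or later; the object name DistinctColorsApi, the method summarize, and the output column names are hypothetical. It places collect_list next to collect_set so the difference in duplicate handling is visible.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{collect_list, collect_set, countDistinct, sort_array}

object DistinctColorsApi {
  // Expects columns "Name" and "Color"; returns one row per name.
  def summarize(df: DataFrame): DataFrame =
    df.groupBy("Name").agg(
      countDistinct("Color").as("DistinctCount"),
      // collect_set drops duplicates; sort_array makes the element order stable.
      sort_array(collect_set("Color")).as("DistinctValues"),
      // collect_list keeps duplicates, so Greg's repeated "Red" appears twice.
      sort_array(collect_list("Color")).as("AllValues")
    ).orderBy("Name")

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is acceptable for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("distinct-colors-api")
      .getOrCreate()
    import spark.implicits._

    // Sample rows from the question.
    val df = Seq(
      ("John", "Blue"), ("Greg", "Red"), ("John", "Yellow"),
      ("Greg", "Red"), ("Greg", "Blue")
    ).toDF("Name", "Color")

    summarize(df).show(truncate = false)
    spark.stop()
  }
}
```

For Greg, DistinctValues holds Blue and Red while AllValues holds Blue, Red, and Red; both names have a DistinctCount of 2.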

Regarding "SQL on Spark: How do I get all values of DISTINCT?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36117235/
