
java - How to access a column from a subquery in the outer query using Spark SQL


In Spark SQL, I executed the following query:

select
    Retailer_country,
    max(sold)
from (
    select
        Retailer_country,
        count(*) as sold
    from ProductInfo
    where year = 2013 and Product_type = 'Watches'
    group by Retailer_country
)

against a dataset with the following columns:

Retailer_country
Order_method_type
Retailer_type
Product_line
Product_type
Product
Year
Quarter
Revenue
Quantity
Gross_margin

It throws the exception:

org.apache.spark.sql.AnalysisException: grouping expressions sequence is empty, and '__auto_generated_subquery_name.`Retailer_country`' is not an aggregate function. Wrap '(max(__auto_generated_subquery_name.`solds`) AS `max(solds)`)' in windowing function(s) or wrap '__auto_generated_subquery_name.`Retailer_country`' in first() (or first_value) if you don't care which value you get.;;
Aggregate [Retailer_country#10, max(solds#77L) AS max(solds)#80L]
+- SubqueryAlias `__auto_generated_subquery_name`
   +- Aggregate [Retailer_country#10], [Retailer_country#10, count(1) AS solds#77L]
      +- Filter ((cast(year#16 as int) = 2013) && (Product_type#14 = Watches))
         +- SubqueryAlias `productinfo`
            +- Relation[Retailer_country#10,Order_method_type#11,Retailer_type#12,Product_line#13,Product_type#14,Product#15,Year#16,Quarter#17,Revenue#18,Quantity#19,Gross_margin#20] csv

When I run the same query in an online compiler against a table with the same structure, it returns:

USA|4

Sample data:

USA|Fax|Outdoors Shop|Camping Equipment|Cooking Gear|TrailChef Deluxe Cook Set|2012|Q1 2012|59628.66|489|0.34754797
USA|Fax|Outdoors Shop|Camping Equipment|Cooking Gear|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
USA|Fax|Outdoors Shop|Camping Equipment|watch|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
USA|Fax|Outdoors Shop|Camping Equipment|watch|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
USA|Fax|Outdoors Shop|Camping Equipment|watch|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
UK|Fax|Outdoors Shop|Camping Equipment|watch|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
UK|Fax|Outdoors Shop|Camping Equipment|watch|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
USA|Fax|Outdoors Shop|Camping Equipment|watch|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
UK|Fax|Outdoors Shop|Camping Equipment|Cooking Gear|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
UK|Fax|Outdoors Shop|Camping Equipment|Cooking Gear|TrailChef Deluxe Cook Set|2012|Q1 2012|59628.66|489|0.34754797

Why do the results differ, and how can I display Retailer_country in Spark? Any help with this would be appreciated.

Best Answer

You have two aggregations: one for the subquery and another for the main query. Your query must be formatted as follows:

select
    Retailer_country,
    max(sold)
from (
    select
        Retailer_country,
        count(*) as sold
    from ProductInfo
    where year = 2013 and Product_type = 'Watches'
    group by Retailer_country
)
group by Retailer_country

However, looking more closely at your query, the subquery already returns one row per distinct Retailer_country because of its aggregation, so the outer max(sold) aggregation is unnecessary. Your final query therefore reduces to this:

select
    Retailer_country,
    count(*) as sold
from ProductInfo
where year = 2013 and Product_type = 'Watches'
group by Retailer_country
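
Since the question is tagged java, here is a minimal, hypothetical sketch of running that query from Java with a local SparkSession. The file name ProductInfo.csv, the class name, and the local[*] master are assumptions; the sample rows are pipe-delimited with no header, so the column names from the question are assigned explicitly.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class RetailerCounts {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("RetailerCounts")
                .master("local[*]")   // assumption: local run
                .getOrCreate();

        // Hypothetical path; the sample data has no header row,
        // so the column names from the question are supplied via toDF.
        Dataset<Row> df = spark.read()
                .option("delimiter", "|")
                .csv("ProductInfo.csv")
                .toDF("Retailer_country", "Order_method_type", "Retailer_type",
                        "Product_line", "Product_type", "Product", "Year",
                        "Quarter", "Revenue", "Quantity", "Gross_margin");
        df.createOrReplaceTempView("ProductInfo");

        // One row per country with its 2013 watch count.
        spark.sql("select Retailer_country, count(*) as sold "
                + "from ProductInfo "
                + "where year = 2013 and Product_type = 'Watches' "
                + "group by Retailer_country").show();

        spark.stop();
    }
}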

Edit: per the comments, to return only the country with the highest count:

select
    Retailer_country,
    count(*) as sold
from ProductInfo
where year = 2013 and Product_type = 'Watches'
group by Retailer_country
order by sold desc
limit 1
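
One caveat about limit 1: if two countries tie for the highest count, one of them is silently dropped. A hedged alternative is the window-function route the AnalysisException message itself hints at; rank() keeps every country tied for first place. This sketch reuses the ProductInfo temp view from the example above:

// rank() gives 1 to every row tied for the highest count, so ties survive,
// unlike order by ... limit 1, which keeps a single arbitrary winner.
spark.sql("select Retailer_country, sold from ("
        + "  select Retailer_country, count(*) as sold, "
        + "  rank() over (order by count(*) desc) as rnk "
        + "  from ProductInfo "
        + "  where year = 2013 and Product_type = 'Watches' "
        + "  group by Retailer_country"
        + ") where rnk = 1").show();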

Regarding java - How to access a column from a subquery in the outer query using Spark SQL, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/53681500/
