gpt4 book ai didi

sql - 多列和多行的 Hive/SQL 计数出现次数

转载 作者:可可西里 更新时间:2023-11-01 14:46:30 25 4
gpt4 key购买 nike

我正在寻找一种计算出现次数的智能方法。

这是一个例子:

 UserID     CityID    CountryID   TagID
100000 1 30 5
100001 1 30 6
100000 2 20 7
100000 2 40 8
100001 1 40 6
100002 1 40 5
100002 1 20 6

我想做什么:

我想按列计算每个用户值的出现次数。最后,我想要一个表格来显示有多少用户具有不同的特征。

结果应该看起来像这样 - 或多或少

Different_CityID    Different_CountryIDs   Different_TagIDs
1 3 2

解释:

  • Different_CityIDs:仅 UserID 100000 具有不同的 CityID
  • Different_CountryIDs:所有用户在其国家/地区都有不同的 ID
  • Different_TagIDs:UserID 100000 和 100002 都有不同的 TagID。用户 100001 只有“6”作为 TagID。

我为列的 COUNT 和 GROUP BY 苦苦挣扎,但最终没有成功。有智能解决方案吗?

非常感谢

最佳答案

select  count(case when pos=0 and count_distinct_ID>1 then 1 end) as different_cityid
,count(case when pos=1 and count_distinct_ID>1 then 1 end) as different_countryid
,count(case when pos=2 and count_distinct_ID>1 then 1 end) as different_tagid

from (select pe.pos
,count (distinct pe.ID) as count_distinct_ID
from mytable t
lateral view posexplode (array(CityID,CountryID,TagID)) pe as pos,ID

group by t.UserID
,pe.pos
) t
;

+------------------+---------------------+-----------------+
| different_cityid | different_countryid | different_tagid |
+------------------+---------------------+-----------------+
| 1 | 3 | 2 |
+------------------+---------------------+-----------------+

这是避免 count(distinct ...)

的另一种变体
select  count (case when pos=0 and not is_distinct_ID then 1 end)  as different_cityid
,count (case when pos=1 and not is_distinct_ID then 1 end) as different_countryid
,count (case when pos=2 and not is_distinct_ID then 1 end) as different_tagid

from (select pe.pos
,min(pe.ID)<=>max(pe.ID) as is_distinct_ID
from mytable t
lateral view posexplode (array(CityID,CountryID,TagID)) pe as pos,ID

group by t.UserID
,pe.pos
) t
;

...和另一种变体

select  count (case when not is_distinct_CityID    then 1 end)   as different_cityid
,count (case when not is_distinct_CountryID then 1 end) as different_countryid
,count (case when not is_distinct_TagID then 1 end) as different_tagid

from (select min (CityID) <=> max (CityID) as is_distinct_CityID
,min (CountryID) <=> max (CountryID) as is_distinct_CountryID
,min (TagID) <=> max (TagID) as is_distinct_TagID

from mytable

group by UserID
) t
;

关于sql - 多列和多行的 Hive/SQL 计数出现次数,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/43628806/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com