gpt4 book ai didi

sql - Apache Pig 中是否有等同于多个 COUNT(DISTINCT CASE WHEN ...) 语句的语句?

转载 作者:可可西里 更新时间:2023-11-01 14:22:34 29 4
gpt4 key购买 nike

我是 Apache Pig 的新手,正在尝试学习。 Apache Pig 中是否有等效于 SQL 的 COUNT(DISTINCT CASE WHEN ...)

例如,我正在尝试做这样的事情:

CREATE TABLE email_profile AS
SELECT user_id
, COUNT(DISTINCT CASE WHEN email_code = 'C' THEN message_id ELSE NULL END) AS clickthroughs
, COUNT(DISTINCT CASE WHEN email_code = 'O' THEN message_id ELSE NULL END) AS opened_messages
, COUNT(DISTINCT message_id) AS total_messages_received
FROM email_campaigns
GROUP BY user_id;

我不能使用 FILTER email_campaigns BY email_code = 'C',因为那样会减少其他情况。有没有办法在一个嵌套的 FOREACH block 中完成这一切?

谢谢!

编辑:

根据要求,示例数据。字段是 used_idemail_codemessage_id

user1@example.com    O     111
user1@example.com C 111
user2@example.com O 111
user1@example.com O 222
user2@example.com O 333

预期输出:

user1@example.com    2    1    2
user2@example.com 2 0 2

最佳答案

您可以在 nested FOREACH 中进行过滤在你之后GROUPused_id 上。有关详细信息,请参阅我的代码中的注释。

类似于:

-- Firstly we group so the FOREACH is applied per used_id
A = GROUP email_campaigns BY used_id ;
B = FOREACH A {
-- We need these three lines to accomplish the:
-- DISTINCT CASE WHEN email_code = 'C' THEN message_id ELSE NULL END
-- First, we get only cases where email_code == 'C'
click_filt = FILTER email_campaigns BY email_code == 'C' ;
-- Since we only want unique message_ids, we need to project it out
click_proj = FOREACH click_filt GENERATE message_id ;
-- Now we can find all unique message_ids for a given filter
click_dist = DISTINCT click_proj ;

opened_filt = FILTER email_campaigns BY email_code == 'O' ;
opened_proj = FOREACH opened_filt GENERATE message_id ;
opened_dist = DISTINCT opened_proj ;

total_proj = FOREACH email_campaigns GENERATE message_id ;
total_dist = DISTINCT total_proj ;
GENERATE group AS used_id, COUNT(click_dist) AS clickthroughs,
COUNT(opened_dist) AS opened_messages,
COUNT(total_dist) AS total_messages_received ;
}

B 的输出应该是:

(user1@example.com,1,2,2)
(user2@example.com,0,2,2)

如果您需要任何关于正在发生的事情的额外说明,请告诉我。

关于sql - Apache Pig 中是否有等同于多个 COUNT(DISTINCT CASE WHEN ...) 语句的语句?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/19432597/

29 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com