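I am using Pig 0.12 and I want to build a dynamic IN condition from the values of another relation.
In my Pig script I have a relation named "m_master". When I run DESCRIBE m_master,
it gives me the following: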
m_master: {m_id: chararray,m_name: chararray,in_dx: chararray,rolled_up_name: chararray,match_code: chararray,match0: chararray,flag_ind: chararray}
Now I want to perform an operation like
UPDATE M_Master SET flag_ind='SE' WHERE Rolled_Up_Name IN (SELECT DISTINCT Rolled_Up_Name FROM M_Master WHERE flag_ind='SE') AND flag_ind='Non SE'
which is how I would write it as an RDBMS query.
I have already generated the distinct rolled_up_name values from m_master into a relation called distinct_rollup_names:
m_master = FOREACH m_master GENERATE m_id, m_name, in_dx, rolled_up_name, match_code, match0,
(
(
flag_ind == 'Non SE' AND rolled_up_name IN (distinct_rollup_names)
) ? 'SE' : flag_ind
) as flag_ind;
How can I use the values of a generated relation in an IN condition? Any suggestions are welcome.
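Pig does not support an IN clause against a relation the way you expect. Instead, self-join m_master on the rolled_up_name column, then set the left side's flag_ind to 'SE' when it is 'Non SE' and the right side's flag_ind is 'SE'.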
--Original m_master
-- m_master: {m_id: chararray,m_name: chararray,in_dx: chararray,rolled_up_name: chararray,match_code: chararray,match0: chararray,flag_ind: chararray}
-- Clone m_master into m_master2
m_master2 = FOREACH m_master GENERATE m_id, m_name, in_dx, rolled_up_name, match_code, match0, flag_ind;
-- We are interested only in SE flag_ind (this works as inner query in your question)
m_master2 = filter m_master2 by flag_ind == 'SE';
-- Now join m_master and m_master2
m_master_self_joined = JOIN m_master BY rolled_up_name LEFT OUTER, m_master2 BY rolled_up_name;
-- Now pick fields from m_master
-- When there is a match with m_master2, set flag_ind to SE
m_master_self_joined2 = FOREACH m_master_self_joined
GENERATE
m_master::m_id,
m_master::m_name,
m_master::in_dx,
m_master::rolled_up_name,
m_master::match_code,
m_master::match0,
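-- Keep the original flag_ind unless it is 'Non SE' and an 'SE' row matched on the right side
((m_master2::rolled_up_name IS NOT NULL AND m_master::flag_ind == 'Non SE') ? 'SE' : m_master::flag_ind) AS flag_ind;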
-- It's possible to have duplicates (if rolled_up_name is not unique), so take uniques
m_master_self_joined3 = DISTINCT m_master_self_joined2;
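If you would rather reuse the distinct_rollup_names relation from your question, a variant along these lines might work. It is only a sketch: I am assuming distinct_rollup_names should hold the rolled_up_name values that have at least one 'SE' row (the inner query of your UPDATE), and the names se_rows, se_names, joined and m_master_updated are made up for illustration.

```pig
-- Assumption: distinct_rollup_names = rolled_up_name values with at least one 'SE' row
se_rows = FILTER m_master BY flag_ind == 'SE';
se_names = FOREACH se_rows GENERATE rolled_up_name;
distinct_rollup_names = DISTINCT se_names;

-- Left outer join keeps every m_master row; the right side is null when there is no match
joined = JOIN m_master BY rolled_up_name LEFT OUTER, distinct_rollup_names BY rolled_up_name;

-- Flip 'Non SE' to 'SE' only where a matching name exists; otherwise keep flag_ind as is
m_master_updated = FOREACH joined GENERATE
    m_master::m_id AS m_id,
    m_master::m_name AS m_name,
    m_master::in_dx AS in_dx,
    m_master::rolled_up_name AS rolled_up_name,
    m_master::match_code AS match_code,
    m_master::match0 AS match0,
    ((distinct_rollup_names::rolled_up_name IS NOT NULL AND m_master::flag_ind == 'Non SE')
        ? 'SE' : m_master::flag_ind) AS flag_ind;
```

Because the right side has at most one row per rolled_up_name, the join should not multiply rows, so a trailing DISTINCT should not be needed here. That could matter if m_master contains rows that are legitimately identical and you want to keep all of them.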
Hope this helps.