
apache-pig - pig : Get all tuples out of a grouped bag

Reposted. Author: 行者123. Updated: 2023-12-04 16:26:17

I'm using Pig to generate groups from tuples, like this:

a1, b1
a1, b2
a1, b3
...

->

a1, [b1, b2, b3]
...

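That part is easy and works. My problem is the next step: from each resulting group, I would like to generate all pairs of tuples in the group's bag: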
a1, [b1, b2, b3]

->

b1,b2
b1,b3
b2,b3

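If I could nest "foreach" and iterate first over each group and then over its bag, this would be easy.

I suspect I'm misunderstanding the concept, and I would appreciate an explanation.

Thanks.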

Accepted answer

It looks like you need the Cartesian product of the bag with itself. To get it, use FLATTEN(bag) twice in the same generate.

Code:

inpt = load '.../group.txt' using PigStorage(',') as (id, val);
grp = group inpt by (id);
id_grp = foreach grp generate group as id, inpt.val as value_bag;
result = foreach id_grp generate id, FLATTEN(value_bag) as v1, FLATTEN(value_bag) as v2;
dump result;

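Note that large bags produce a lot of rows, because the product grows with the square of the bag size. To limit that, you can apply TOP(...) before FLATTEN: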
inpt = load '....group.txt' using PigStorage(',') as (id, val);
grp = group inpt by (id);
id_grp = foreach grp generate group as id, inpt.val as values;
result = foreach id_grp {
    limited_bag = TOP(50, 0, values); -- all sorts of filtering could be done here
    generate id, FLATTEN(limited_bag) as v1, FLATTEN(limited_bag) as v2;
};
dump result;
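
TOP is only one way to shrink the bag inside the nested block. As an alternative sketch of the "filtering could be done here" idea, a nested ORDER BY followed by LIMIT also caps the bag before the product is taken. The file name group.txt, the aliases sorted and capped, and the cap of 50 are assumptions for illustration, not part of the original answer.

```pig
-- Hypothetical input file 'group.txt' with lines such as: a1,b1
inpt = load 'group.txt' using PigStorage(',') as (id:chararray, val:chararray);
grp = group inpt by id;
result = foreach grp {
    -- sort the group's tuples, then keep at most 50 of them
    sorted = order inpt by val;
    capped = limit sorted 50;
    -- two FLATTENs in one generate give the product of the capped bag with itself
    generate group as id, FLATTEN(capped.val) as v1, FLATTEN(capped.val) as v2;
};
dump result;
```

With a cap of 50, each group contributes at most 50 x 50 = 2500 rows.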

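For your specific output, you can filter before FLATTEN and then keep only the ordered pairs:
inpt = load '..../group.txt' using PigStorage(',') as (id, val);
grp = group inpt by (id);
id_grp = foreach grp generate group as id, inpt.val as values;
result = foreach id_grp {
    -- left-hand side of each pair: only b1 and b2
    firsts = filter values by val == 'b1' or val == 'b2';
    generate id, FLATTEN(firsts) as v1, FLATTEN(values) as v2;
};
-- v1 < v2 drops self-pairs and the mirrored (b2, b1), which v1 != v2 would keep
result = filter result by v1 < v2;
dump result;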
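
The snippet above names b1 and b2 explicitly. A more general sketch, assuming the values are strings and that each unordered pair should appear exactly once, flattens the bag twice and keeps only the rows where v1 < v2. The file name group.txt and the aliases pairs and unique_pairs are illustrative and do not come from the original answer.

```pig
-- Hypothetical input file 'group.txt' with lines such as: a1,b1
inpt = load 'group.txt' using PigStorage(',') as (id:chararray, val:chararray);
grp = group inpt by id;
-- Two FLATTENs in one generate yield the Cartesian product of the bag with itself
pairs = foreach grp generate group as id, FLATTEN(inpt.val) as v1, FLATTEN(inpt.val) as v2;
-- Keep each unordered pair once; this also drops self-pairs such as (b1, b1)
unique_pairs = filter pairs by v1 < v2;
dump unique_pairs;
```

For the sample group a1 with values b1, b2, b3, this should leave (a1,b1,b2), (a1,b1,b3) and (a1,b2,b3), in no guaranteed order.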

I hope this helps.

Cheers

The original question, "Pig: Get all tuples out of a grouped bag", is on Stack Overflow: https://stackoverflow.com/questions/11308050/
