
hadoop - Pig: filtering on an inner bag

Reposted. Author: 行者123. Updated: 2023-12-02 20:55:48

The data looks like this:

22678, {(112),(110),(2)}      
656565, {(110), (109)}
6676, {(2),(112)}

This is the data structure:
(id:chararray, event_list:{innertuple:(innerfield:chararray)})

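I want to keep only the rows whose event_list contains 2. My first idea was to flatten the data and then filter the rows that have 2, but for some reason flatten does not work on this dataset.

Can anyone help?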

Best answer

There may be a simpler way to do this, such as a bag lookup. Otherwise, one way to achieve it with basic Pig is:

data = load 'data.txt'  AS (id:chararray, event_list:bag{});

-- flatten bag, in order to transpose each element to a separate row.
flattened = foreach data generate id, flatten(event_list);

-- keep only those rows where the value is 2.
filtered = filter flattened by (int) $1 == 2;

-- keep only distinct ids.
dist = distinct (foreach filtered generate $0 as (id:chararray));

-- join distinct ids to original relation
jnd = join data by id, dist by id;

-- remove extra fields, keep original fields.
result = foreach jnd generate data::id, data::event_list;
dump result;

(22678,{(112),(110),(2)})
(6676,{(2),(112)})
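
The answer hints that a simpler route may exist. One possibility, offered only as a sketch, is a nested FOREACH that filters the bag in place and counts the matches, which avoids the flatten, distinct, and join steps. It assumes the file loads into the two fields using the schema from the question (the delimiter and loader are not specified in the original), and the relation names marked, matching, and result, as well as the field n_hits, are illustrative.

```pig
-- Assumption: data.txt parses into these two fields with the schema from the question.
data = load 'data.txt' as (id:chararray, event_list:bag{innertuple:(innerfield:chararray)});

-- For each row, filter the inner bag and count how many elements equal '2'.
marked = foreach data {
    hits = filter event_list by innerfield == '2';
    generate id, event_list, COUNT(hits) as n_hits;
};

-- Keep only the rows with at least one matching element.
matching = filter marked by n_hits > 0;

-- Drop the helper column so the output has the original two fields.
result = foreach matching generate id, event_list;
dump result;
```

Because innerfield is declared as chararray, the comparison is against the string '2' rather than the integer 2. The original event_list is passed through untouched, so no join back to the source relation is needed.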

Regarding "hadoop - Pig: filtering on an inner bag", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/44578486/
