
hadoop - Apache Pig - How to extract a set of records

Reposted. Author: 可可西里. Updated: 2023-11-01 16:11:37

I am new to Apache Pig, and I have the following data:

order=0012,1,23  
order=0013,2,34,0015,1,45
order=0011,1,456
...

I am trying to extract it into the following records:

0012,1,23
0013,2,34
0015,1,45
0011,1,456
...

Here is the code I tried:

a = LOAD 'a.txt' Using TextLoader() AS (line:chararray);  
b = FOREACH a GENERATE FLATTEN(REGEX_EXTRACT_ALL(line, 'order=((\\d+),(\\d+),(\\d+))+')) AS
(
order_item:chararray,
order_pid: chararray,
order_qty: chararray,
order_price: chararray
);

This does not work.
Another attempt, this time saving the result into a bag:

a = LOAD 'a.txt' Using TextLoader() AS (line:chararray);  
b = FOREACH a GENERATE FLATTEN(REGEX_EXTRACT_ALL(line, 'order=((\\d+),(\\d+),(\\d+))+')) AS
(
B: bag { T: tuple(
order_pid: chararray,
order_qty: chararray,
order_price: chararray
)}
);

That does not work either.

Best answer

Can you try this?

Input:

order=0012,1,23
order=0013,2,34,0015,1,45
order=0011,1,456

PigScript:

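-- Load each line as a single chararray field.
A = LOAD 'input' AS (line:chararray);
-- Strip the 'order=' prefix, then split the rest on commas into one flat tuple.
B = FOREACH A GENERATE FLATTEN(STRSPLIT(REGEX_EXTRACT(line,'order=(.*)',1),','));
-- Regroup the fields into 3-field tuples; positions absent on shorter lines are expected to be null.
C = FOREACH B GENERATE FLATTEN(TOBAG(TOTUPLE($0..$2),TOTUPLE($3..$5)));
-- Drop the tuples whose first field is null (lines that held only one record).
D = FILTER C BY $0 is not null;
DUMP D;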

Output:

(0012,1,23)
(0013,2,34)
(0015,1,45)
(0011,1,456)
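
The script above regroups fields by fixed positions ($0..$2 and $3..$5), so it covers at most two records per line. If a line can hold more, one possible generalization is to mark the record boundaries first and then split twice. The following is an untested sketch, not part of the original answer. It assumes that every line holds a whole number of three-field numeric records, that ';' never appears in the data, and that the Pig version in use supports the two-argument form of TOKENIZE. The relation and field names (body, rec) are illustrative only.

```
-- Load each line as a single chararray field.
A = LOAD 'input' AS (line:chararray);
-- Keep only the part after 'order='.
B = FOREACH A GENERATE REGEX_EXTRACT(line, 'order=(.*)', 1) AS body;
-- Replace every comma that is followed by a whole number of 3-field records
-- (up to the end of the string) with ';', i.e. the commas between records.
C = FOREACH B GENERATE REPLACE(body, ',(?=(?:\\d+,\\d+,\\d+,)*\\d+,\\d+,\\d+\\z)', ';') AS body;
-- One row per record: TOKENIZE returns a bag, which FLATTEN turns into rows.
D = FOREACH C GENERATE FLATTEN(TOKENIZE(body, ';')) AS rec;
-- Split each record into its three fields.
E = FOREACH D GENERATE FLATTEN(STRSPLIT(rec, ',', 3));
DUMP E;
```

The lookahead keeps the regrouping independent of how many records a line contains, at the cost of a harder-to-read pattern. For input known to hold at most two records per line, the positional version above is simpler.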

Regarding "hadoop - Apache Pig - How to extract a set of records", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/29769339/
