hadoop - pig : CONCAT a relation's output to another relation

Reposted by: 可可西里. Updated: 2023-11-01 16:42:37

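Apologies for the poorly worded question. I am new to Stack Overflow and to Pig, and I am trying to experiment on my own.

I have a scenario that processes a words.txt file and a data.txt file.

words.txt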

word1
word2
word3
word4

data.txt

{"created_at":"18:47:31,Sun Sep 30 2012","text":"RT @Joey7Barton: ..give a word1 about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...","user_id":450990391,"id":252479809098223616}

The output I need is:

(word1_epochtime){full record whose text attribute matches}

that is:

(word1_1234567890){"created_at":"18:47:31,Sun Sep 30 2012","text":"RT @Joey7Barton: ..give a word1 about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...","user_id":450990391,"id":252479809098223616}

The output I am getting is:

(word1){"created_at":"18:47:31,Sun Sep 30 2012","text":"RT @Joey7Barton: ..give a word1 about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...","user_id":450990391,"id":252479809098223616}

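using this script:

-- The load statements were abbreviated in the original post; the schemas here are assumed.
words = LOAD 'words.txt' AS (word:chararray);
data = LOAD 'data.txt' USING JsonLoader('created_at:chararray, text:chararray, user_id:long, id:long');
c = CROSS words, data;
d = FILTER c BY (data::text MATCHES CONCAT(CONCAT('.*', words::word), '.*'));
e = FOREACH (GROUP d BY words::word) GENERATE group, d;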

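I can build the word joined with the epoch time using:

time = FOREACH words GENERATE CONCAT(CONCAT(word, '_'), (chararray)ToUnixTime(CurrentTime()));

But I cannot get that word-plus-time value into the output above.

How can I get this output?

(word1_time){data}

Please feel free to suggest improvements to any of the above. Thanks.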

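Best answer

I think I got the output. Here is the script I wrote.

d = FILTER c BY (data::text MATCHES CONCAT(CONCAT('.*', words::word), '.*'));
-- Join each matching word to the current Unix time and keep the tweet fields alongside it.
e = FOREACH d GENERATE CONCAT(CONCAT(words::word, '_'), (chararray)ToUnixTime(CurrentTime())) AS epochtime, data::created_at, data::text, data::user_id, data::id;
f = GROUP e BY epochtime;
DUMP f;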
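
To make the intended result concrete, here is a small Python model of the CROSS, FILTER and GROUP steps. It is a sketch for checking the expected output, not Pig and not the poster's code. The function names `created_at_epoch` and `match_and_key` are made up for illustration. It assumes `created_at` is in UTC and treats the `'.*word.*'` pattern as a plain substring test, which only holds when the words contain no regex metacharacters. The question does not say whether the epoch should be the current time or the tweet's `created_at`, so the sketch supports both. The sample text is shortened.

```python
from datetime import datetime, timezone
from itertools import product


def created_at_epoch(created_at):
    """Parse the sample's 'HH:MM:SS,Day Mon DD YYYY' format, assuming UTC."""
    parsed = datetime.strptime(created_at, "%H:%M:%S,%a %b %d %Y")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def match_and_key(words, records, now_epoch=None):
    """Model CROSS + FILTER + GROUP: map 'word_epoch' to the matching records.

    If now_epoch is given it plays the role of ToUnixTime(CurrentTime());
    otherwise the epoch is taken from each record's created_at field.
    """
    grouped = {}
    for word, record in product(words, records):      # CROSS words, data
        if word not in record["text"]:                # FILTER ... MATCHES '.*word.*'
            continue
        if now_epoch is not None:
            epoch = now_epoch
        else:
            epoch = created_at_epoch(record["created_at"])
        key = f"{word}_{epoch}"                       # CONCAT(CONCAT(word, '_'), epoch)
        grouped.setdefault(key, []).append(record)    # GROUP BY the combined key
    return grouped


words = ["word1", "word2", "word3", "word4"]
records = [{
    "created_at": "18:47:31,Sun Sep 30 2012",
    "text": "RT @Joey7Barton: ..give a word1 about whether the americans wins a Ryder cup.",
    "user_id": 450990391,
    "id": 252479809098223616,
}]

for key, matched in match_and_key(words, records).items():
    print(key, matched)
```

With the sample record, only `word1` matches, so the result has a single key. Passing `now_epoch` mirrors the answer's use of `CurrentTime()`; omitting it ties the key to the tweet's own timestamp instead.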

Regarding "hadoop - pig : CONCAT a relation's output to another relation", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/39356447/