hadoop - PIG: how to return the row count of one alias into another alias

Reposted · Author: 可可西里 · Updated: 2023-11-01 15:50:06

REGISTER 'udf.py' using jython as myfunc;
loadhtml = load './assignment/crawler' using PigStorage('\u0001') as (id1:chararray,url:chararray,domain:chararray,content:chararray,source:chararray,date:chararray);
loadhtml_content = FOREACH loadhtml generate content;
-- tokenize the content field (loadhtml_content has no field called line)
words = FOREACH loadhtml_content generate flatten(TOKENIZE(content)) as word;
-- group and flatten are Pig keywords, so they cannot be used as alias names
group1 = GROUP words by word;
count = FOREACH group1 generate $0, COUNT($1);
log = FOREACH count GENERATE myfunc.nLog($0,$1,**<I need to return the row count of loadhtml_content here>**);

I'm trying to get the row count of loadhtml_content into another alias, and I can't think of any other way to do it.

log = FOREACH count GENERATE myfunc.nLog($0,$1,(the row count of loadhtml_content needs to go here));
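To make the dataflow concrete, here is a minimal in-memory Python sketch of what the script computes up to `count`. It is an approximation, not Pig: `str.split()` stands in for `TOKENIZE` (which also splits on some punctuation), and the sample rows are made up. It shows that after `GROUP ... BY word` each tuple is `(group, bag)`, so `$0` is the word and `COUNT($1)` is the size of its bag.

```python
from collections import defaultdict


def word_counts(content_rows):
    """Mimic FLATTEN(TOKENIZE(content)) -> GROUP BY word -> GENERATE $0, COUNT($1)."""
    # FLATTEN(TOKENIZE(...)): one (word,) tuple per token; str.split() is a simplification
    words = [(w,) for content in content_rows for w in content.split()]
    # GROUP words BY word: each key maps to a bag of the tuples that share it
    groups = defaultdict(list)
    for t in words:
        groups[t[0]].append(t)
    # FOREACH group1 GENERATE $0, COUNT($1): $0 is the key, $1 is the bag
    return [(key, len(bag)) for key, bag in groups.items()]


rows = ["big data pig", "pig latin"]  # hypothetical sample content values
print(sorted(word_counts(rows)))  # → [('big', 1), ('data', 1), ('latin', 1), ('pig', 2)]
```

Note that nothing in this per-word result carries the total number of input rows, which is exactly the value the question is missing.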

Best answer

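I believe this is exactly the feature you're looking for: https://issues.apache.org/jira/browse/PIG-1434. It essentially lets you use a single-tuple relation as a constant wherever you need one. The following should solve your problem: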

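loadhtml_content = FOREACH loadhtml generate content;
-- GROUP ... ALL collapses the relation into one tuple, so this yields a single row holding the total
-- (COUNT skips rows whose content is null; use COUNT_STAR to count every row)
content_rows = FOREACH (GROUP loadhtml_content ALL) GENERATE COUNT(loadhtml_content);
-- content_rows.$0 reads that single value as a scalar constant for every row of count
log = FOREACH count GENERATE myfunc.nLog($0,$1,content_rows.$0);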
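The answer's trick can be mimicked in plain Python to show why it works: `GROUP ... ALL` collapses the relation into a single tuple, `COUNT` reduces it to one number, and `content_rows.$0` reads that one value and attaches it to every row of `count`. This is a minimal sketch under stated assumptions: relations are plain lists of tuples, the sample data is hypothetical, and `nLog` is not reimplemented (its body is not shown in the question), so the sketch only builds the argument tuples that would be passed to it. Pig fails at runtime if the relation behind a scalar projection has more than one row; the hypothetical `scalar` helper below mirrors that with an explicit check.

```python
def group_all_count(relation):
    """Mimic FOREACH (GROUP rel ALL) GENERATE COUNT(rel) on a list of tuples."""
    # GROUP ... ALL: exactly one output tuple, ('all', bag of every input row)
    grouped = [("all", list(relation))]
    # COUNT ignores tuples whose first field is null (None here)
    return [(sum(1 for t in bag if t[0] is not None),) for _, bag in grouped]


def scalar(relation, index=0):
    """Mimic `rel.$0`: read a one-row relation as a constant."""
    # This sketch rejects anything but exactly one row; Pig errors on more than one
    if len(relation) != 1:
        raise ValueError("scalar relation must have exactly one row")
    return relation[0][index]


# Hypothetical sample data; `counts` stands in for the Pig alias `count`
loadhtml_content = [("big data pig",), ("pig latin",), (None,)]
counts = [("big", 1), ("data", 1), ("latin", 1), ("pig", 2)]

content_rows = group_all_count(loadhtml_content)
total = scalar(content_rows)
# The argument tuples each row would pass to myfunc.nLog($0, $1, content_rows.$0)
nlog_args = [(word, n, total) for word, n in counts]
print(total)  # → 2
print(nlog_args)
```

The null row is deliberately included: `COUNT` yields 2 here, whereas counting every row (as `COUNT_STAR` does) would give 3, which matters if the total is meant to be the raw number of documents.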

A similar question, "hadoop - PIG: how to return the row count of one alias into another alias", can be found on Stack Overflow: https://stackoverflow.com/questions/50635523/
