hadoop - Pig ERROR 0: Scalar has more than one row in the output

Reposted. Author: 可可西里. Updated: 2023-11-01 15:51:22

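I have two files, and I am trying to join them on a pattern match: each request domain in File1 should be paired with the File2 row whose domain it contains.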

File1 :

weather.bbc.co.uk,112
ads.facebook.com,113
ads.amazon.co.uk,114
www.sky.com,115
news.bbc.co.uk,116
pics.facebook.com,117

File2 :

facebook.com,facebook
bbc.co.uk,bbc
netflix.com,netflix
flipkart.com,flipkart

Expected output:

weather.bbc.co.uk,112,bbc.co.uk,bbc
ads.facebook.com,113,facebook.com,facebook
news.bbc.co.uk,116,bbc.co.uk,bbc
pics.facebook.com,117,facebook.com,facebook

Script

file1 = LOAD '/file1' using PigStorage('|') as (request_domain: chararray,msisdn:int);
file2 = LOAD '/file2' using PigStorage('|') as (domain: chararray,provider: chararray);
file3 = JOIN file1 by case when (request_domain MATCHES CONCAT(CONCAT('(?i).*',file2.domain),'.*')) then file2.domain else 'Other' end LEFT OUTER,file2 by domain;
DESCRIBE file3;
dump file3;

But I get the following error:

WARN [Thread-29] org.apache.hadoop.mapred.LocalJobRunner - job_local_0006
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st : (facebook.com,facebook), 2nd :(bbc.co.uk,bbc)
    at org.apache.pig.impl.builtin.ReadScalars.exec(ReadScalars.java:111)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:330)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextString(POUserFunc.java:432)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:317)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:221)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:275)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextString(POUserFunc.java:432)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:317)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:221)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:275)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextString(POUserFunc.java:432)

Best answer

The delimiter should be ',' rather than '|', since both files are comma-separated: use PigStorage(',').

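Referencing file2.domain inside the join expression makes Pig read file2 as a scalar, which only works when the relation has exactly one row. file2 has several rows, so the pattern would have to match against multiple values, and Pig raises the error. Try a CROSS combined with the INDEXOF UDF instead, as follows: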

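-- Load both files with ',' as the field delimiter.
file1 = LOAD 'data/file1.txt' using PigStorage(',') as (request_domain: chararray,msisdn:int);
file2 = LOAD 'data/file2.txt' using PigStorage(',') as (domain: chararray,provider: chararray);
-- Pair every file1 row with every file2 row (can be expensive on large inputs).
crossed = CROSS file1,file2;
-- Keep only the pairs where the request domain contains the file2 domain.
filtered = FILTER crossed BY INDEXOF(file1::request_domain,file2::domain) != -1;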
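
To sanity-check the cross-then-filter logic outside Pig, here is a minimal plain-Python sketch run against the sample rows from the question. It is an illustration only, not Pig code: the function name cross_and_filter is hypothetical, and Python's str.find stands in for INDEXOF on the assumption that both return -1 when the substring is absent.

```python
from itertools import product

# Sample rows copied from File1 and File2 in the question.
file1 = [
    ("weather.bbc.co.uk", 112),
    ("ads.facebook.com", 113),
    ("ads.amazon.co.uk", 114),
    ("www.sky.com", 115),
    ("news.bbc.co.uk", 116),
    ("pics.facebook.com", 117),
]
file2 = [
    ("facebook.com", "facebook"),
    ("bbc.co.uk", "bbc"),
    ("netflix.com", "netflix"),
    ("flipkart.com", "flipkart"),
]


def cross_and_filter(left, right):
    """Pair every left row with every right row (like CROSS), then keep
    the pairs whose request domain contains the right-hand domain."""
    return [
        (request_domain, msisdn, domain, provider)
        for (request_domain, msisdn), (domain, provider) in product(left, right)
        if request_domain.find(domain) != -1  # stand-in for INDEXOF(...) != -1
    ]


matches = cross_and_filter(file1, file2)
for row in matches:
    print(",".join(str(value) for value in row))
```

On the sample data this keeps the four rows listed under the expected output. Because the test is a substring match, a request domain such as notbbc.co.uk would also pair with bbc.co.uk; a stricter suffix check may be preferable if that matters.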

Regarding hadoop - Pig ERROR 0: Scalar has more than one row in the output, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/49008684/
