
hadoop - How to use the bincond operator in a Pig group function

Reposted. Author: 行者123. Updated: 2023-12-02 21:10:14

I need to group the following data by fname and lname.

(fname,lname,id)

abc,xyz,I
abc,xyz,N
ppp,xxx,I
ppp,XXX,I

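In the id field I expect only two values, N or I. If the same fname, lname combination has both N and I, the id should be N; otherwise the id should keep the value it has in the group.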

I expect the following result:
abc,xyz,N
ppp,xxx,I

I tried the code below, and it works fine:
in =load '/testing/name.txt' USING PigStorage(',') as (fname:chararray,lname:chararray,id:chararray);

grp = group in by (fname,lname);

z = foreach grp generate FLATTEN(group) AS (fname,lname),(COUNT(in.id) >1 ? ('N') :BagToTuple(in.id))as id;

But now I need to check the value of the id field rather than the count:
z = foreach grp generate FLATTEN(group) AS (fname,lname),((in.id == 'N' or in.id == 'I') ? ('N') :BagToTuple(in.id))as id;

But it gives the following error:
(Name: Equal Type: null Uid: null)incompatible types in Equal Operator left hand side:bag :tuple(id:chararray)  right hand side:chararray

It also gives the following error:
Two inputs of BinCond must have compatible schemas. left hand side: #31:tuple(#32:chararray) right hand side: org.apache.pig.builtin.bagtotuple_3#35:tuple(id#36:int)

Please advise.

Best answer

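Are you loading a field that contains the characters N and I into an int column? Change the load statement so that the id column is of type chararray.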

in =load '/testing/name.txt' USING PigStorage(',') as (fname:chararray,lname:chararray,id:chararray);
grp = group in by (fname,lname);
z = foreach grp generate FLATTEN(group) AS (fname,lname),(COUNT(in.id) > 1 && in.id matches 'N') ? ('N') : in.id;
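
The first error in the question arises because `in.id` is a bag of tuples, so it cannot be compared directly with a chararray such as 'N'. One possible way around this, offered as a minimal sketch rather than the answerer's own method, is a nested FOREACH that filters the group's rows for 'N' and then tests whether that filtered bag is empty, so both branches of the bincond are plain chararrays. The alias `names` (used in place of `in`, on the assumption that `in` may clash with the IN operator in newer Pig releases) and the inner alias `n_rows` are names chosen for illustration; the sketch also assumes id only ever holds N or I, as the question states.

```pig
-- Load the data with id as chararray, as the answer suggests.
-- 'names' is an illustrative alias replacing 'in' from the question.
names = LOAD '/testing/name.txt' USING PigStorage(',')
        AS (fname:chararray, lname:chararray, id:chararray);

grp = GROUP names BY (fname, lname);

z = FOREACH grp {
    -- Keep only the rows of this group whose id is 'N'.
    n_rows = FILTER names BY id == 'N';
    -- Both bincond branches are chararray, so their schemas are compatible.
    -- Assumption: id is always 'N' or 'I', so "no N present" means 'I'.
    GENERATE FLATTEN(group) AS (fname, lname),
             (IsEmpty(n_rows) ? 'I' : 'N') AS id;
};

DUMP z;
```

Since GROUP compares chararray keys case-sensitively, the sample rows `ppp,xxx,I` and `ppp,XXX,I` would fall into two separate groups unless lname is normalized first (for example with LOWER).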

Regarding "hadoop - How to use the bincond operator in a Pig group function", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/40483048/
