hadoop - Pig custom function to load a file with a multi-character ^^ (double caret) delimiter

Reposted. Author: 可可西里. Updated: 2023-11-01 16:32:02

I am new to Pig. Can someone help me load a file that uses a multi-character string ("^^" in my case) as the column delimiter?

For example, my file has the following columns:
aisforapple^^bisforball^^cisforcat^^disfordoll^^andeisforelephant
fisforfish^^gisforgreen^^hisforhat^^iisforicecreem^^andjisforjar
kisforking^^lisforlion^^misformango^^nisfornose^^andoisfororange

Regards

Best answer

A regular expression is a good fit for a multi-character delimiter like this.

input.txt
aisforapple^^bisforball^^cisforcat^^disfordoll^^andeisforelephant
fisforfish^^gisforgreen^^hisforhat^^iisforicecreem^^andjisforjar
kisforking^^lisforlion^^misformango^^nisfornose^^andoisfororange

PigScript
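-- The file contains no tabs, so the default loader puts each whole line into one field
A = LOAD 'input.txt' AS (line:chararray);
-- Capture the five fields separated by ^^ (each caret is escaped as \\^)
B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'(.*)\\^\\^(.*)\\^\\^(.*)\\^\\^(.*)\\^\\^(.*)')) AS (f1,f2,f3,f4,f5);
DUMP B;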

Output:
(aisforapple,bisforball,cisforcat,disfordoll,andeisforelephant)
(fisforfish,gisforgreen,hisforhat,iisforicecreem,andjisforjar)
(kisforking,lisforlion,misformango,nisfornose,andoisfororange)
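
If no Pig installation is at hand, the pattern itself can be sanity-checked elsewhere. The following is a minimal sketch in Python rather than Pig. It assumes that Python's `re` module and the Java regex engine behind Pig agree on this simple pattern, and that REGEX_EXTRACT_ALL requires the whole line to match, which is why `fullmatch` is used. The names `PATTERN`, `LINES`, and `extract_all` are hypothetical and exist only for this illustration.

```python
import re

# Same pattern as in the Pig script. Pig's '\\^' reaches the regex engine as \^
# (a literal caret); in a Python raw string that is written r"\^".
PATTERN = re.compile(r"(.*)\^\^(.*)\^\^(.*)\^\^(.*)\^\^(.*)")

# The three lines of input.txt from the answer.
LINES = [
    "aisforapple^^bisforball^^cisforcat^^disfordoll^^andeisforelephant",
    "fisforfish^^gisforgreen^^hisforhat^^iisforicecreem^^andjisforjar",
    "kisforking^^lisforlion^^misformango^^nisfornose^^andoisfororange",
]


def extract_all(line):
    """Return the five captured groups, or None if the whole line does not match."""
    match = PATTERN.fullmatch(line)
    return match.groups() if match else None


if __name__ == "__main__":
    # Print one tuple of five fields per input line.
    for line in LINES:
        print(extract_all(line))
```

Each printed tuple should contain the same five fields as the corresponding row of the Pig output above.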

Explanation:

For clarity, the regex is broken into multiple lines below:
(.*)\\^\\^ -> matches any characters up to a ^^ and stores them in f1 (the double backslash escapes the special character ^)
(.*)\\^\\^ -> matches any characters up to a ^^ and stores them in f2 (the double backslash escapes the special character ^)
(.*)\\^\\^ -> matches any characters up to a ^^ and stores them in f3 (the double backslash escapes the special character ^)
(.*)\\^\\^ -> matches any characters up to a ^^ and stores them in f4 (the double backslash escapes the special character ^)
(.*) -> matches any characters up to the end of the string and stores them in f5
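
One detail worth knowing about this breakdown is that `.*` is greedy, so the pattern only yields clean fields when a line contains exactly four `^^` delimiters. The sketch below illustrates this in Python rather than Pig, assuming the Java regex engine behind Pig behaves the same way for this pattern and that the whole line must match. `FIVE_FIELDS` and `match_fields` are hypothetical names, and what Pig itself returns for a non-matching line (most likely a null) should be verified separately.

```python
import re

# The five-group pattern from the answer, written as a Python raw string.
FIVE_FIELDS = re.compile(r"(.*)\^\^(.*)\^\^(.*)\^\^(.*)\^\^(.*)")


def match_fields(line):
    """Return the captured groups if the whole line matches, else None."""
    match = FIVE_FIELDS.fullmatch(line)
    return match.groups() if match else None


if __name__ == "__main__":
    # Too few delimiters: the pattern cannot match at all.
    print(match_fields("a^^b^^c"))
    # One delimiter too many: the greedy first group absorbs the extra "^^".
    print(match_fields("a^^b^^c^^d^^e^^f"))
    # A plain split on the literal delimiter keeps every field separate.
    print("a^^b^^c^^d^^e^^f".split("^^"))
```

If the number of columns can vary from line to line, splitting on the literal delimiter is the more forgiving approach than a fixed-arity pattern.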

Regarding "hadoop - Pig custom function to load a file with a multi-character ^^ (double caret) delimiter", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/26535051/
