
python - Correct input/output for Python code in a Pig UDF?

Reposted. Author: 行者123. Updated: 2023-12-02 21:55:01

I have this short Python script:

import langid
import sys

for pig_tuple in sys.stdin:
    cols = pig_tuple.split()

    if len(cols) < 2:
        sys.exit(0)

    try:
        id = int(cols[0])
        text = " ".join(cols[1:])
    except:
        sys.exit(0)

    (lang,prob) = langid.classify(text)
    print "%s\t%s" %(id,lang)

sys.exit(0)

I want to run it from a Pig script. I tried:
define langid_cmd `python2.6 /data/test/compiled_python/langid_command_line.py` ship('/data/test/compiled_python/langid_command_line.py');

text = LOAD '$PIG_INPUT' USING PigStorage() as (text:chararray);

pythonDetect1 = STREAM text through langid_cmd AS (pid:chararray,planguage:chararray);

But I get:
2013-03-29 15:53:22,290 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2013-03-29 15:53:22,303 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2999: Unexpected internal error. java.lang.String cannot be cast to org.apache.pig.data.Tuple
Details at logfile: /home/isl/ryan/src/main/pigScripts/pig_1364597410350.log
2013-03-29 15:53:22,306 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2999: Unexpected internal error. java.lang.String cannot be cast to org.apache.pig.data.Tuple
Details at logfile: /home/isl/ryan/src/main/pigScripts//src/main/pigScripts/pig_1364597410350.log
2013-03-29 15:53:22,308 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2999: Unexpected internal error. java.lang.String cannot be cast to org.apache.pig.data.Tuple
Details at logfile: /home/isl/ryan/src/main/pigScripts//src/main/pigScripts/pig_1364597410350.log
2013-03-29 15:53:22,311 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2999: Unexpected internal error. java.lang.String cannot be cast to org.apache.pig.data.Tuple
Details at logfile: /home/isl/ryan/src/main/pigScripts/src/main/pigScripts/pig_1364597410350.log
2013-03-29 15:53:22,313 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2999: Unexpected internal error. java.lang.String cannot be cast to org.apache.pig.data.Tuple
Details at logfile: /home/isl/ryan/src/main/pigScripts/src/main/pigScripts/pig_1364597410350.log

The directory /data/test/compiled_python has been chmod'ed to 777, and when I run the script from the shell it works:
-bash-3.2$ echo 14353 I can haz pigscriptz? | python /data/test/compiled_python/langid_command_line.py 
14353 eu

??

Best answer

AS (pid:chararray,planguage:chararray) tells Pig to expect a tuple of strings as output, but you are returning a tab-separated string. You should print the result as

print "(%s,%s)" %(id,lang)

or use Pig's Python UDF integration.
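The first suggestion can be sketched as a small rewrite of the streaming script. This is only an illustration under some assumptions: the helper names (parse_line, format_tuple, run) are hypothetical, the classifier is passed in as a callable so the helpers can be exercised without langid installed, and malformed lines are skipped instead of ending the script (a deliberate change from the original sys.exit(0) behavior). Whether the parenthesized format resolves the error depends on the Pig version and its streaming deserializer, which this sketch does not check.

```python
import sys


def parse_line(line):
    """Split one streamed line into (id, text); return None if it is malformed."""
    cols = line.split()
    if len(cols) < 2:
        return None
    try:
        record_id = int(cols[0])
    except ValueError:
        return None
    return record_id, " ".join(cols[1:])


def format_tuple(record_id, lang):
    """Render the result in the parenthesized form suggested in the answer."""
    return "(%s,%s)" % (record_id, lang)


def run(lines, classify, out=sys.stdout):
    """Write one formatted tuple per valid input line.

    classify is any callable returning (language, score),
    for example langid.classify.
    """
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # assumption: skip bad rows rather than exit
        record_id, text = parsed
        lang, _score = classify(text)
        out.write(format_tuple(record_id, lang) + "\n")


# In the real streaming script, the entry point would be roughly:
#     import langid
#     run(sys.stdin, langid.classify)
```

Passing the classifier in as an argument keeps the parsing and formatting logic testable from a plain shell, much like the echo test in the question.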

Regarding "python - Correct input/output for Python code in a Pig UDF?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/15712761/
