
python - Error while importing data into Hive using Python

Reposted. Author: 行者123. Updated: 2023-12-02 21:19:15

I am learning how to use Python to import data into Hive on Hadoop. This is the Python code:

import sys
import datetime

for line in sys.stdin:
    line = line.strip()
    userid, movieid, rating, unixtime = line.split('\t')
    weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
    print '\t'.join([userid, movieid, rating, str(weekday)])
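The per-line logic of the script above can be tried outside Hive before it is wired into a TRANSFORM query. The following is a minimal sketch, assuming Python 3; the function name `to_weekday_row` is hypothetical and not part of the original script. It applies the same split and timestamp conversion to one sample row from the question, then shows what happens when a row is separated by spaces instead of tabs.

```python
import datetime


def to_weekday_row(line):
    # Hypothetical helper: same steps as the loop body of weekday_mapper.py.
    userid, movieid, rating, unixtime = line.strip().split('\t')
    # fromtimestamp() uses the local time zone, as the original script does.
    weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
    return '\t'.join([userid, movieid, rating, str(weekday)])


# First sample row from the question, written with tab separators.
good = to_weekday_row("196\t242\t3\t881250949\n")
print(good)

# The same row separated by spaces: split('\t') returns a single field,
# so unpacking into four names raises ValueError.
try:
    to_weekday_row("196 242 3 881250949\n")
    space_row_failed = False
except ValueError as exc:
    space_row_failed = True
    print("bad row:", exc)
```

An uncaught exception like this one makes the script exit with an error, so a check of this kind can help separate problems in the script from problems in the Hive query.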

This is the Hive script that runs the mapper:
CREATE TABLE u_data_new (
userid INT,
movieid INT,
rating INT,
weekday INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
add FILE weekday_mapper.py;
INSERT OVERWRITE TABLE u_data_new
SELECT
TRANSFORM (userid, movieid, rating, unixtime)
USING 'python weekday_mapper.py'
AS (userid, movieid, rating, weekday)
FROM u_data;

This is the error message I received:
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"userid":222,"movieid":298,"rating":4,"unixtime":"877563253"}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:168)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"userid":222,"movieid":298,"rating":4,"unixtime":"877563253"}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:574)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:159)

Before the error message above, I got the following output, which seemed to me to show that the map job had completed successfully:
2016-06-17 13:56:34,782 Stage-1 map = 0%,  reduce = 0%
2016-06-17 13:56:46,501 Stage-1 map = 100%, reduce = 0%
2016-06-17 13:56:47,871 Stage-1 map = 0%, reduce = 0%
2016-06-17 13:57:17,275 Stage-1 map = 100%, reduce = 0%

My question is: what causes this error, and how can I fix it? And what does map = 100% mean?

Thank you very much.

P.S. The data looks like this:
196     242     3       881250949
186 302 3 891717742
22 377 1 878887116
244 51 2 880606923
166 346 1 886397596
298 474 4 884182806
115 265 2 881171488
253 465 5 891628467
305 451 3 886324817
....

Best answer

I just noticed this in the traceback:

while processing row {"userid":222,"movieid":298,"rating":4,"unixtime":"877563253"}

"unixtime" is a string here, whereas according to your table it should be weekday INT.
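The observation about the row in the traceback can be checked directly. This is a small sketch, assuming Python 3 and assuming the row text is valid JSON as printed in the error message; the variable names are illustrative only. It parses the row and reports the type of each field.

```python
import json

# Row copied from the "Hive Runtime Error while processing row" message.
row_text = '{"userid":222,"movieid":298,"rating":4,"unixtime":"877563253"}'
row = json.loads(row_text)

# Map each column name to the name of its parsed Python type.
types = {name: type(value).__name__ for name, value in row.items()}
print(types)

# The mapper script converts the string itself with float().
as_number = float(row["unixtime"])
print(as_number)
```

Only `unixtime` comes back as a string while the other three fields are integers. The sketch shows how the values are typed in the logged row; it does not by itself confirm how Hive handles that column.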

This question, "python - Error while importing data into Hive using Python", comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/37887736/
