gpt4 book ai didi

hadoop - Hadoop流随着python下降

转载 作者:行者123 更新时间:2023-12-02 20:15:12 26 4
gpt4 key购买 nike

我正在尝试使用Python脚本在Hadoop Streaming上运行Map-Reduce作业,并且在使用jupyter终端时可以正常工作。
但是当我运行以下命令时

./bin/hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.7.jar -file /usr/local/hadoop/python/assignmap1.py /usr/local/hadoop/python/assignreduce1.py -mapper "python assignmap1.py" -reducer "python assignreduce1.py" -input input1/data.txt -output output
出现此错误图并减少均为0%
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
这是我的 map 代码
import sys
import pandas as pd

def map():
dataset = pd.read_table(sys.stdin)
for a in range(len(dataset)):
key = dataset.loc[a][0]
date = dataset.loc[a][1]
value = dataset.loc[a][3]

#return(mapped)
print(" ".join([key,date,value]))




if __name__ == "__main__":
map()
这是我的代码
导入系统
将 Pandas 作为pd导入
def reduce():
temperature = {"City":["Date","Temp",""]}
for data in sys.stdin:
splited = data.split(" ")
key = splited[0].strip()
date = splited[1].strip()
tem = splited[2].strip()
temp = int(tem[0:2])
abstemp = abs(25 - temp)

if temperature.get(key) == None:
temperature.update({key:[date,temp,abstemp]})
else:
temp_last = temperature.get(key)[2]
if temp_last > abstemp:
temperature.update({key:[date,temp,abstemp]})

for key in temperature:
print(" ".join([key,str(temperature.get(key)[0]),str(temperature.get(key)[1])]))


if __name__ == "__main__":
reduce()
我不知道这是什么问题,并且hadoop配置应该是正确的,因为我使用的docker应该设置正确。

最佳答案

她可能有很多事情可能是错的,首先尝试在reducer和mapper中将下面的python路径作为第一行

#!/usr/bin/env python
例如,在mapper中
#!/usr/bin/env python
import sys
import pandas as pd

def map():
dataset = pd.read_table(sys.stdin)
for a in range(len(dataset)):
key = dataset.loc[a][0]
date = dataset.loc[a][1]
value = dataset.loc[a][3]

#return(mapped)
print(" ".join([key,date,value]))




if __name__ == "__main__":
map()
接下来,您只需要给映射器和reducer命名,如下所示
-mapper  assignmap1.py -reducer  assignreduce1.py

关于hadoop - Hadoop流随着python下降,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/64339654/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com