python - Problems with mapper.py and reducer.py when running code on a Hadoop cluster

Reposted · Author: 行者123 · Updated: 2023-12-02 20:23:22

I am running this code to compute probabilities on a Hadoop cluster. My data is stored in a CSV file.

When I run this code on the cluster, I get this error: "java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1". Can anyone help me fix my code?
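For context: Hadoop Streaming reports "subprocess failed with code N" when the mapper or reducer script exits with a non-zero status N, and an uncaught Python exception makes the interpreter exit with status 1. The sketch below reproduces that locally with subprocess; the failing script is hypothetical (it mimics a reducer hitting a line without a tab, such as a CSV header), not necessarily the cause in this particular job.

```python
import subprocess
import sys

# Hypothetical reducer-like script: `k, v = ...split('\t', 1)` raises
# ValueError on any line that contains no tab (e.g. a "marks" header).
FAILING_SCRIPT = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    k, v = line.strip().split('\\t', 1)\n"
)


def run_script(source: str, stdin_text: str) -> subprocess.CompletedProcess:
    """Run a Python snippet the way a streaming task does: input on stdin."""
    return subprocess.run(
        [sys.executable, "-c", source],
        input=stdin_text,
        capture_output=True,
        text=True,
    )


result = run_script(FAILING_SCRIPT, "marks\n")

# The uncaught exception gives exit status 1; the traceback goes to stderr,
# which is where the real error appears in the task logs.
print("exit code:", result.returncode)
print(result.stderr.strip().splitlines()[-1])
```

On the cluster the same traceback ends up in the failed task attempt's stderr log rather than on your terminal.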

#!/usr/bin/env python3
"""mapper.py"""
import sys

# Get input lines from stdin
for line in sys.stdin:
    # Remove spaces from beginning and end of the line
    line = line.strip()

    # Split it into tokens
    #tokens = line.split()

    #Get probability_mass values
    for probability_mass in line:
        print(str(probability_mass)+ '\t1')

#!/usr/bin/env python3
"""reducer.py"""
import sys
from collections import defaultdict


counts = defaultdict(int)

# Get input from stdin
for line in sys.stdin:
    #Remove spaces from beginning and end of the line
    line = line.strip()

    # skip empty lines
    if not line:
        continue

    # parse the input from mapper.py
    k,v = line.split('\t', 1)
    counts[v] += 1

total = sum(counts.values())
probability_mass = {k:v/total for k,v in counts.items()}
print(probability_mass)

Sample data in the CSV file:
marks
10
10
60
10
30
Expected output (probability of each number):

{10: 0.6, 60: 0.2, 30: 0.2}
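The expected numbers follow from counting: 10 appears 3 times out of 5 values, so its probability is 3/5 = 0.6, while 60 and 30 each appear once, giving 1/5 = 0.2. A minimal in-memory sketch of the same computation (plain Python, no Hadoop), using the sample column above with the header removed:

```python
from collections import Counter


def probability_mass(values):
    """Return {value: count / total} for an iterable of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


marks = [10, 10, 60, 10, 30]  # sample data from the CSV, header row dropped
print(probability_mass(marks))  # → {10: 0.6, 60: 0.2, 30: 0.2}
```

The MapReduce version has to reproduce exactly this: one place that sees every value, counts it, and divides by the overall total.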

But the result still looks like this:
{1:1} {1:1} {1:1} {1:1} {1:1} {1:1}

Best answer

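The real error should be visible in the YARN UI. However, using the probability as the key will not let you sum all the values at once, because they will all end up at different reducers.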
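To illustrate why the key matters when a job runs more than one reducer: by default Hadoop picks the reducer for a record from the key's hash modulo the number of reduce tasks, so records with different keys can go to different reducers and no single reducer sees the overall total. A simplified sketch, assuming zlib.crc32 as a stand-in hash (Hadoop actually uses the key's Java hashCode()) and a hypothetical cluster with 3 reducers:

```python
import zlib
from collections import defaultdict

NUM_REDUCERS = 3  # hypothetical number of reduce tasks


def partition(key: str, num_reducers: int = NUM_REDUCERS) -> int:
    """Pick a reducer as hash(key) mod num_reducers.

    zlib.crc32 is only a stand-in for Hadoop's key.hashCode().
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs):
    """Group (key, value) pairs by the reducer their key is assigned to."""
    reducers = defaultdict(list)
    for key, value in pairs:
        reducers[partition(key)].append((key, value))
    return dict(reducers)


values = ["10", "10", "60", "10", "30"]

# Each value used as its own key: distinct keys may be spread over reducers.
by_value = shuffle((v, "1") for v in values)

# One constant key ("None", as the mapper below prints it): a single reducer.
by_constant = shuffle(("None", v) for v in values)

print(by_value)
print(by_constant)
```

Whatever the hash function, identical keys always meet at the same reducer, so a single constant key is the simplest way to route every value to one place.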

If you don't have a key to group the values by, you can use this one in the mapper. It sends all of the data to a single reducer:
print('%s\t%s' % (None, probability_mass))
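Here is that print line placed in a complete mapper. This is a sketch, assuming the CSV has a single column whose header is marks (as in the question's sample data); the HEADER constant and the map_lines helper are illustrative names. Note that the question's mapper loops with for probability_mass in line, which iterates over the individual characters of each line rather than over whole values.

```python
#!/usr/bin/env python3
"""mapper.py: emit every mark under one constant key (sketch)."""
import csv
import sys

HEADER = "marks"  # assumed header name, taken from the sample CSV


def map_lines(lines):
    """Yield 'None\t<value>' for each data row, skipping blanks and the header."""
    for row in csv.reader(lines):
        if not row:
            continue  # blank line
        value = row[0].strip()
        if not value or value == HEADER:
            continue  # header row or empty first cell
        # Same key for every record, so all values reach a single reducer.
        yield '%s\t%s' % (None, value)


if __name__ == "__main__":
    for out in map_lines(sys.stdin):
        print(out)
```

Skipping the header here keeps "marks" from being counted as one of the values in the reducer.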
Here is a working example that produces the output you want. I only tested it with an input file, not on Hadoop:

import sys
from collections import defaultdict

counts = defaultdict(int)

# Get input from stdin
for line in sys.stdin:
    #Remove spaces from beginning and end of the line
    line = line.strip()

    # skip empty lines
    if not line:
        continue

    # parse the input from mapper.py
    k,v = line.split('\t', 1)
    counts[v] += 1

total = float(sum(counts.values()))
probability_mass = {k:v/total for k,v in counts.items()}
print(probability_mass)

Output:
{'10': 0.6, '60': 0.2, '30': 0.2}

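You can test the code without Hadoop using cat file.txt | python mapper.py | sort | python reducer.py. Use plain sort rather than sort -u: Hadoop's shuffle sorts the map output but keeps duplicate lines, whereas sort -u would drop the repeated values and distort the counts.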
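The same map, sort, reduce pipeline can also be simulated in pure Python, which makes the effect of the sort step easy to check. A sketch, assuming the constant-key mapper and a header row named marks; the mapper and reducer function names are illustrative, and reducer reuses the logic of the reducer.py shown in this answer:

```python
from collections import defaultdict


def mapper(lines):
    """Emit 'None\t<value>' for every non-empty, non-header line."""
    for line in lines:
        value = line.strip()
        if value and value != "marks":  # skip blanks and the assumed header
            yield "%s\t%s" % (None, value)


def reducer(lines):
    """Count each value and divide by the total, as reducer.py does."""
    counts = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _key, value = line.split("\t", 1)
        counts[value] += 1
    total = float(sum(counts.values()))
    return {k: v / total for k, v in counts.items()}


data = ["marks", "10", "10", "60", "10", "30"]
mapped = list(mapper(data))

# Like `sort`: reorders lines, keeps duplicates.
print(reducer(sorted(mapped)))  # → {'10': 0.6, '30': 0.2, '60': 0.2}

# Like `sort -u`: duplicates removed, so every value gets 1/3.
print(reducer(sorted(set(mapped))))
```

The key order differs from the answer's output only because the lines are sorted before they reach the reducer; the probabilities are the same.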

Also, mrjob or pyspark are higher-level frameworks that provide many more useful features.

Regarding "python - Problems with mapper.py and reducer.py when running code on a Hadoop cluster", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59140021/

Copyright 2021 - 2024 cfsdn All Rights Reserved · Sichuan ICP registration No. 2022000587