
python-2.7 - How to resolve the error java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2 in Python

Reposted. Author: 行者123. Updated: 2023-12-02 21:05:35

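I am having trouble running simple Python code in Hadoop streaming.
I have tried all the suggestions from earlier posts about similar errors, but the problem persists.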

  • Added #!/usr/bin/env python
  • Ran chmod a+x on the mapper and reducer Python scripts
  • Wrapped the mapper command in quotes: -mapper "python mapper.py -n 1 -r 0.4"

    I have run the code outside Hadoop and it works fine.

    Update: I ran the code outside Hadoop streaming with the following command:
    cat file |python mapper.py -n 5 -r 0.4 |sort|python reducer.py -f 3618 

    That works fine, but now I need to run it in Hadoop streaming:
    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -D mapreduce.job.reduces=5 \
    -files lr \
    -mapper "python lr/mapper.py -n 5 -r 0.4" \
    -reducer "python lr/reducer.py -f 3618" \
    -input training \
    -output models

    The Hadoop streaming job fails. I looked through the logs but could not find anything that explains why.

    I have the following mapper.py:
    #!/usr/bin/env python

    import sys
    import random

    from optparse import OptionParser

    parser = OptionParser()
    parser.add_option("-n", "--model-num", action="store", dest="n_model",
                      help="number of models to train", type="int")
    parser.add_option("-r", "--sample-ratio", action="store", dest="ratio",
                      help="ratio to sample for each ensemble", type="float")

    options, args = parser.parse_args(sys.argv)

    random.seed(8803)
    r = options.ratio
    for line in sys.stdin:
        # TODO
        # Note: The following lines are only there to help
        # you get started (and to have a 'runnable' program).
        # You may need to change some or all of the lines below.
        # Follow the pseudocode given in the PDF.
        key = random.randint(0, options.n_model-1)
        value = line.strip()
        for i in range(1, options.n_model+1):
            m = random.random()
            if m < r:
                print "%d\t%s" % (i, value)
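
To make the sampling step easier to test outside Hadoop, the loop body can be pulled into a function. The sketch below is one possible arrangement, not the assignment's reference solution. The name `emit_samples` and the `rng` parameter are assumptions introduced for illustration; the 1-based model keys mirror the loop above.

```python
import random


def emit_samples(line, n_model, ratio, rng=random):
    """Return the "key<TAB>value" records that one input line contributes.

    Hypothetical helper: for each of the n_model models, the line is kept
    with probability `ratio`, mirroring the mapper loop above.
    """
    value = line.strip()
    records = []
    for i in range(1, n_model + 1):
        # rng.random() is in [0, 1), so ratio=1.0 keeps every copy
        # and ratio=0.0 keeps none.
        if rng.random() < ratio:
            records.append("%d\t%s" % (i, value))
    return records
```

Passing a seeded `random.Random` instance as `rng` keeps runs reproducible without touching the global generator.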

    And my reducer.py:
    #!/usr/bin/env python
    import sys
    import pickle
    from optparse import OptionParser
    from lrsgd import LogisticRegressionSGD
    from utils import parse_svm_light_line

    parser = OptionParser()
    parser.add_option("-e", "--eta", action="store", dest="eta",
                      default=0.01, help="step size", type="float")
    parser.add_option("-c", "--Regularization-Constant", action="store", dest="C",
                      default=0.0, help="regularization strength", type="float")
    parser.add_option("-f", "--feature-num", action="store", dest="n_feature",
                      help="number of features", type="int")
    options, args = parser.parse_args(sys.argv)

    classifier = LogisticRegressionSGD(options.eta, options.C, options.n_feature)

    for line in sys.stdin:
        key, value = line.split("\t", 1)
        value = value.strip()
        X, y = parse_svm_light_line(value)
        classifier.fit(X, y)

    pickle.dump(classifier, sys.stdout)
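
Hadoop streaming hands each reduce task its input sorted by key, and a single task may receive several keys. The reducer above feeds every line it sees into one classifier. If the intent is one model per key (an assumption, since the post does not say), the input could be grouped first. This is a minimal sketch with a hypothetical helper name, `group_by_key`; it assumes every line contains a tab.

```python
from itertools import groupby


def group_by_key(lines):
    """Yield (key, [values]) from tab-separated lines already sorted by key.

    Hypothetical helper; assumes each line has the form "key<TAB>value".
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    # groupby only merges adjacent items, which is enough for sorted input.
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        yield key, [pair[1] for pair in group]
```

In a reducer, `group_by_key(sys.stdin)` would give one batch of values per model key, so a fresh classifier could be trained for each batch.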

    It runs fine outside Hadoop, but when I run it in Hadoop streaming I get the following error:
    17/02/07 07:44:34 INFO mapreduce.Job: Task Id : attempt_1486438814591_0038_m_000001_2, Status : FAILED
    Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
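
The exit status in this message is a useful clue. Python's optparse exits with status 2 when it rejects the command line, and the interpreter itself exits with status 2 when it cannot open the script file. The log excerpt does not show which of these, if either, applies here. The sketch below only demonstrates the optparse case; `exit_code_for` is a hypothetical helper, and the options are copied from mapper.py.

```python
from optparse import OptionParser


def exit_code_for(argv):
    """Return the exit status optparse produces for argv, or 0 on success."""
    parser = OptionParser()
    parser.add_option("-n", "--model-num", dest="n_model", type="int")
    parser.add_option("-r", "--sample-ratio", dest="ratio", type="float")
    try:
        parser.parse_args(argv)
    except SystemExit as exc:
        # optparse reports bad options on stderr and then calls sys.exit.
        return exc.code
    return 0
```

Running the mapper by hand with exactly the argument string that Hadoop receives is a quick way to see whether the options are the problem.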

  • Best answer

    Use Harishanker's answer from the post "How to resolve java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2?"

    Make sure the mapper and reducer files are executable by using chmod (for example, "chmod 744 mapper.py").

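    Then run the streaming command without the python prefix, quoting each command together with its arguments so that it is passed as a single option value:

    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -D mapreduce.job.reduces=5 \
    -files lr \
    -mapper "lr/mapper.py -n 5 -r 0.4" \
    -reducer "lr/reducer.py -f 3618" \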
    -input training \
    -output models

    That should work now. If not, leave a comment.
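
Running a script directly, without the python prefix, depends on the executable bit and on a valid shebang line. Both can be checked before submitting the job. This is a rough sketch; `check_streaming_script` is a hypothetical helper, and the line-ending check is an extra assumption about a common cause of shebang failures rather than something the answer mentions.

```python
import os


def check_streaming_script(path):
    """Return a list of problems that would stop `path` from running directly."""
    if not os.path.isfile(path):
        return ["file not found: %s" % path]
    problems = []
    if not os.access(path, os.X_OK):
        problems.append("not executable (try chmod +x)")
    with open(path, "rb") as handle:
        first_line = handle.readline()
    if not first_line.startswith(b"#!"):
        problems.append("missing shebang line")
    if first_line.endswith(b"\r\n"):
        problems.append("Windows line endings in first line")
    return problems
```

An empty list from `check_streaming_script("lr/mapper.py")` means none of these particular problems were found; it does not guarantee that the job will succeed.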

    This post on python-2.7 - How to resolve the error java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2 in Python is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/42084411/
