
python - Running MRJob from an IPython notebook

Reposted. Author: 太空宇宙. Updated: 2023-11-03 11:29:31

I am trying to run the mrjob example from an IPython notebook:

from mrjob.job import MRJob


class MRWordFrequencyCount(MRJob):

    def mapper(self, _, line):
        yield "chars", len(line)
        yield "words", len(line.split())
        yield "lines", 1

    def reducer(self, key, values):
        yield key, sum(values)

and then run it with this code:

mr_job = MRWordFrequencyCount(args=["testfile.txt"])
with mr_job.make_runner() as runner:
    runner.run()
    for line in runner.stream_output():
        key, value = mr_job.parse_output_line(line)
        print key, value

and get the error:

TypeError: <module '__main__' (built-in)> is a built-in class

Is there a way to run mrjob from an IPython notebook?
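For context on the error: the wording of the message matches what Python's inspect module raises when asked for the source file of a class whose module has no __file__, which is the case for anything defined directly in a notebook cell. That mrjob performs such a lookup to locate the job script is an assumption here, not something stated in the post. The sketch below reproduces only the inspect behaviour, using a hypothetical stand-in module named fake_notebook_main and a helper source_file_or_error that are not part of mrjob.

```python
import inspect
import json
import sys
import types


def source_file_or_error(cls):
    """Return the file that defines cls, or a message saying why there is none."""
    try:
        return inspect.getfile(cls)
    except (TypeError, OSError) as exc:
        return "no source file: %s" % exc


# Hypothetical stand-in for a notebook's __main__: a module object without __file__.
fake_main = types.ModuleType("fake_notebook_main")
sys.modules[fake_main.__name__] = fake_main


class DefinedInCell(object):
    pass


# Pretend the class was defined in the file-less module.
DefinedInCell.__module__ = fake_main.__name__

# A class from a file-less module has no source file to report.
print(source_file_or_error(DefinedInCell))
# A class from a module loaded from disk resolves to that module's file.
print(source_file_or_error(json.JSONDecoder))
```

If this is indeed the cause, it also suggests why writing the class to a real .py file helps: the class then belongs to a module that has a file on disk.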

Best answer

I haven't found the "perfect way" yet, but one thing you can do is create a notebook cell that uses the %%file magic to write the cell's contents to a file:

%%file wordcount.py
from mrjob.job import MRJob

class MRWordFrequencyCount(MRJob):

    def mapper(self, _, line):
        yield "chars", len(line)
        yield "words", len(line.split())
        yield "lines", 1

    def reducer(self, key, values):
        yield key, sum(values)
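To sanity-check what this job should report before involving mrjob at all, the same mapper and reducer logic can be exercised in plain Python. This is only a rough sketch of the map, group, and reduce steps: run_locally and sample_lines are made-up names for illustration, and the grouping here is a simple in-memory dictionary rather than anything mrjob itself does.

```python
from collections import defaultdict


def mapper(line):
    # Same emissions as MRWordFrequencyCount.mapper, minus self and the unused key.
    yield "chars", len(line)
    yield "words", len(line.split())
    yield "lines", 1


def reducer(key, values):
    # Same logic as MRWordFrequencyCount.reducer.
    yield key, sum(values)


def run_locally(lines):
    """Map every line, group the values by key, then reduce each group."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)

    results = {}
    for key in sorted(grouped):
        for out_key, out_value in reducer(key, grouped[key]):
            results[out_key] = out_value
    return results


sample_lines = ["to be or not to be", "that is the question"]
print(run_locally(sample_lines))
```

The totals printed here are the numbers the real job should produce for a file containing the same two lines.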

Then have mrjob run the file in a later cell:

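import wordcount
# reload() is a builtin on Python 2; on Python 3 use importlib.reload instead.
reload(wordcount)  # pick up edits made to wordcount.py since the last import

mr_job = wordcount.MRWordFrequencyCount(args=['example.txt'])
with mr_job.make_runner() as runner:
    runner.run()
    for line in runner.stream_output():
        key, value = mr_job.parse_output_line(line)
        print key, value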

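Note that I named my file wordcount.py and imported the class MRWordFrequencyCount from the wordcount module; the file name and the module name must match. Python also caches imported modules, so when you change wordcount.py, IPython keeps using the old cached module instead of reloading it. That is why I put the reload() call there.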
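The caching behaviour described above can be seen in isolation with a throwaway module. The sketch below uses Python 3's importlib.reload (the counterpart of the Python 2 reload() builtin) and a hypothetical module name, demo_job_module, written to a temporary directory; bytecode writing is switched off only so that a stale cache file cannot interfere with the demonstration.

```python
import importlib
import os
import sys
import tempfile

# Demo-only setting: skip .pyc files so the edited source is always recompiled.
sys.dont_write_bytecode = True

workdir = tempfile.mkdtemp()
sys.path.insert(0, workdir)
module_path = os.path.join(workdir, "demo_job_module.py")


def write_module(version):
    """Overwrite the hypothetical module so that it exposes the given VERSION."""
    with open(module_path, "w") as f:
        f.write("VERSION = %d\n" % version)


write_module(1)
importlib.invalidate_caches()  # make sure the new file is visible to the importer
import demo_job_module
first = demo_job_module.VERSION

write_module(2)
import demo_job_module  # no effect: the module is served from sys.modules
cached = demo_job_module.VERSION

importlib.reload(demo_job_module)  # re-executes the edited source file
reloaded = demo_job_module.VERSION

print(first, cached, reloaded)
```

The second import leaves the old value in place, and only the explicit reload picks up the edit, which is the situation a notebook is in each time the %%file cell is re-run.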

Reference: https://groups.google.com/d/msg/mrjob/CfdAgcEaC-I/8XfJPXCjTvQJ

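Update (shorter)
For a shorter second notebook cell, you can run mrjob by calling the shell from the notebook. For this to work, the script must end with the usual if __name__ == '__main__': MRWordFrequencyCount.run() block so that it runs as a standalone program: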

! python wordcount.py shakespeare.txt

Reference: http://jupyter.cs.brynmawr.edu/hub/dblank/public/Jupyter%20Magics.ipynb

Regarding python - Running MRJob from an IPython notebook, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/24701101/
