gpt4 book ai didi

python - 在python中尝试程序进行mapreduce并需要一些帮助

转载 作者:行者123 更新时间:2023-12-02 22:08:37 26 4
gpt4 key购买 nike

为了获得更多的动手经验,我想尝试一个项目的字数统计。
这是我的示例数据。

The United Nations (UN) is an intergovernmental organisation established on 24 October 1945 to promote international cooperation. A replacement for the ineffective League of Nations, the organisation was created following World War II to prevent another such conflict.



[...]

我用下面的python代码来得到我的结果
from mrjob.job import MRJob

from mrjob.step import MRStep



class MovieRatings(MRJob):

def steps(self):

return [

MRStep(mapper=self.mapper_get_ratings,

reducer=self.reducer_count_ratings),

]



def mapper_get_ratings(self, _, line):

(word) = line.split(' ')

yield word, 1



def reducer_count_ratings(self, key, values):

yield Key, sum(values)


if __name__ == '__main__':

MovieRatings.run()

我在使用Python 2时遇到以下错误
[root@localhost Desktop]# python RatingsBreakdown.py UN.txt
Traceback (most recent call last):
File "RatingsBreakdown.py", line 1, in <module>
from mrjob.job import MRJob
File "/usr/lib/python2.6/site-packages/mrjob/job.py", line 1106
for k, v in unfiltered_jobconf.items() if v is not None
^
SyntaxError: invalid syntax

也是Python 3
[root@localhost Desktop]# python3 RatingsBreakdown.py UN.txt
No configs found; falling back on auto-configuration
No configs specified for inline runner
Running step 1 of 2...
Creating temp directory /tmp/RatingsBreakdown.training.20171128.083536.602598
Error while reading from /tmp/RatingsBreakdown.training.20171128.083536.602598/step/000/mapper/00000/input:
Traceback (most recent call last):
File "RatingsBreakdown.py", line 25, in <module>
RatingsBreakdown.run()
File "/usr/lib/python3.4/site-packages/mrjob/job.py", line 424, in run
mr_job.execute()
File "/usr/lib/python3.4/site-packages/mrjob/job.py", line 445, in execute
super(MRJob, self).execute()
File "/usr/lib/python3.4/site-packages/mrjob/launch.py", line 185, in execute
self.run_job()
File "/usr/lib/python3.4/site-packages/mrjob/launch.py", line 233, in run_job
runner.run()
File "/usr/lib/python3.4/site-packages/mrjob/runner.py", line 511, in run
self._run()
File "/usr/lib/python3.4/site-packages/mrjob/sim.py", line 144, in _run
self._run_mappers_and_combiners(step_num, map_splits)
File "/usr/lib/python3.4/site-packages/mrjob/sim.py", line 185, in _run_mappers_and_combiners
for task_num, map_split in enumerate(map_splits)
File "/usr/lib/python3.4/site-packages/mrjob/sim.py", line 120, in _run_multiple
func()
File "/usr/lib/python3.4/site-packages/mrjob/sim.py", line 662, in _run_mapper_and_combiner
run_mapper()
File "/usr/lib/python3.4/site-packages/mrjob/sim.py", line 685, in _run_task
stdin, stdout, stderr, wd, env)
File "/usr/lib/python3.4/site-packages/mrjob/inline.py", line 92, in invoke_task
task.execute()
File "/usr/lib/python3.4/site-packages/mrjob/job.py", line 433, in execute
self.run_mapper(self.options.step_num)
File "/usr/lib/python3.4/site-packages/mrjob/job.py", line 517, in run_mapper
for out_key, out_value in mapper(key, value) or ():
File "RatingsBreakdown.py", line 13, in mapper_get_ratings
(userID, movieID, rating, timestamp) = line.split('\t')
ValueError: need more than 1 value to unpack

也和我的MovieRatings
[root@localhost Desktop]# python3 MovieRatings.py UN.txt
No configs found; falling back on auto-configuration
No configs specified for inline runner
Running step 1 of 1...
Creating temp directory /tmp/MovieRatings.training.20171128.083635.368889
Error while reading from /tmp/MovieRatings.training.20171128.083635.368889/step/000/reducer/00000/input:
Traceback (most recent call last):
File "MovieRatings.py", line 20, in <module>
MovieRatings.run()
File "/usr/lib/python3.4/site-packages/mrjob/job.py", line 424, in run
mr_job.execute()
File "/usr/lib/python3.4/site-packages/mrjob/job.py", line 445, in execute
super(MRJob, self).execute()
File "/usr/lib/python3.4/site-packages/mrjob/launch.py", line 185, in execute
self.run_job()
File "/usr/lib/python3.4/site-packages/mrjob/launch.py", line 233, in run_job
runner.run()
File "/usr/lib/python3.4/site-packages/mrjob/runner.py", line 511, in run
self._run()
File "/usr/lib/python3.4/site-packages/mrjob/sim.py", line 150, in _run
self._run_reducers(step_num, num_reducer_tasks)
File "/usr/lib/python3.4/site-packages/mrjob/sim.py", line 246, in _run_reducers
for task_num in range(num_reducer_tasks)
File "/usr/lib/python3.4/site-packages/mrjob/sim.py", line 120, in _run_multiple
func()
File "/usr/lib/python3.4/site-packages/mrjob/sim.py", line 685, in _run_task
stdin, stdout, stderr, wd, env)
File "/usr/lib/python3.4/site-packages/mrjob/inline.py", line 92, in invoke_task
task.execute()
File "/usr/lib/python3.4/site-packages/mrjob/job.py", line 439, in execute
self.run_reducer(self.options.step_num)
File "/usr/lib/python3.4/site-packages/mrjob/job.py", line 560, in run_reducer
for out_key, out_value in reducer(key, values) or ():
File "MovieRatings.py", line 17, in reducer_count_ratings
yield Key, sum(values)
NameError: name 'Key' is not defined

我想解决错误并了解ny错误是什么。

最佳答案

似乎该库仅在Python3中有效

  File "RatingsBreakdown.py", line 13, in mapper_get_ratings
(userID, movieID, rating, timestamp) = line.split('\t')
ValueError: need more than 1 value to unpack

首先,您运行 RatingsBreakdown.py...。此外,您显示的输入不包含任何选项卡,并且您尝试提取4列。不太清楚您在这里的期望。
  File "MovieRatings.py", line 17, in reducer_count_ratings
yield Key, sum(values)
NameError: name 'Key' is not defined

自我说明...您的变量是小写的 key

关于python - 在python中尝试程序进行mapreduce并需要一些帮助,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/47527642/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com