
python - mapper_pre_filter in MRJob

Reposted. Author: 太空宇宙. Updated: 2023-11-04 05:54:42

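I have been trying to adapt the mapper_pre_filter example given here. If, instead of specifying the command directly in the step, I write a method that returns the command, like this: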

from mrjob.job import MRJob
from mrjob.protocol import JSONValueProtocol


class KittiesJob(MRJob):
    OUTPUT_PROTOCOL = JSONValueProtocol

    def filter_input(self):
        return ''' grep 'kitty' '''

    def test_for_kitty(self, _, value):
        yield None, 0  # make sure we have some output
        if 'kitty' in value:
            yield None, 1

    def sum_missing_kitties(self, _, values):
        yield None, sum(values)

    def steps(self):
        return [
            self.mr(mapper_pre_filter=self.filter_input,
                    mapper=self.test_for_kitty,
                    reducer=self.sum_missing_kitties)]

if __name__ == '__main__':
    KittiesJob().run()

I get the following exception:

Exception: error getting step information:
Traceback (most recent call last):
  File "/Users/sverma/work/mrjob/filter_input.py", line 30, in <module>
    KittiesJob().run()
  File "/Library/Python/2.7/site-packages/mrjob/job.py", line 494, in run
    mr_job.execute()
  File "/Library/Python/2.7/site-packages/mrjob/job.py", line 500, in execute
    self.show_steps()
  File "/Library/Python/2.7/site-packages/mrjob/job.py", line 677, in show_steps
    print >> self.stdout, json.dumps(self._steps_desc())
  File "/Library/Python/2.7/site-packages/simplejson/__init__.py", line 370, in dumps
    return _default_encoder.encode(obj)
  File "/Library/Python/2.7/site-packages/simplejson/encoder.py", line 269, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/Library/Python/2.7/site-packages/simplejson/encoder.py", line 348, in iterencode
    return _iterencode(o, 0)
  File "/Library/Python/2.7/site-packages/simplejson/encoder.py", line 246, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: <bound method KittiesJob.filter_input of <__main__.KittiesJob object at 0x10449ac90>> is not JSON serializable

Can anyone explain what I am doing wrong?

Best answer

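Wow, this is a late answer. I think you want to change mapper_pre_filter=self.filter_input, to mapper_pre_filter=self.filter_input(), so that the method is called and the string it returns is passed to the step.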

In the example, mapper_pre_filter is expected to be a string, not a function. Maybe this will help somebody in the future.

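The stack trace points the same way: the object that json.dumps refuses to serialize is the bound method KittiesJob.filter_input itself, not the command string that the method returns.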
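The difference between passing the method and passing its return value can be reproduced with the standard library alone. The following is a minimal sketch, not mrjob's actual code: the KittiesJob class here is a plain stand-in that does not import mrjob, and describe_step is a hypothetical helper whose dictionary layout is only an assumption about what a step description might look like.

```python
import json


class KittiesJob(object):
    """Plain stand-in for the job class in the question (no mrjob import)."""

    def filter_input(self):
        return "grep 'kitty'"


def describe_step(pre_filter):
    # Hypothetical step description; the real structure built by mrjob
    # may differ. It only needs to be something json.dumps is asked to encode.
    return {"type": "streaming", "mapper": {"pre_filter": pre_filter}}


job = KittiesJob()

# Passing the bound method: json.dumps cannot encode a method object.
try:
    json.dumps(describe_step(job.filter_input))
    method_serialized = True
except TypeError as exc:
    method_serialized = False
    error_message = str(exc)

# Passing the result of calling the method: a plain string encodes fine.
called_json = json.dumps(describe_step(job.filter_input()))

print(method_serialized)
print(called_json)
```

In the job from the question, the equivalent change is the one suggested above: write self.filter_input() with parentheses in steps(), so the step receives the command string.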

For more on python - mapper_pre_filter in MRJob, see the similar question on Stack Overflow: https://stackoverflow.com/questions/28428173/
