
python - How to implement the MapReduce pairs pattern in Python

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:23:43

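I'm experimenting with the MapReduce pairs pattern in Python. I need to check whether a word occurs in a text file, then find the word next to it and emit the two words as a pair. I keep running into: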

neighbors = words[words.index(w) + 1]
ValueError: substring not found

 ValueError: ("the") is not in list

File cwork_Trials.py

from mrjob.job import MRJob

class MRCountest(MRJob):
    # Word count
    def mapper(self, _, document):
        # Assume document is a list of words.
        #words = []
        words = document.strip()

        w = "the"
        neighbors = words.index(w)
        for word in words:
            #searchword = "the"
            #wor.append(str(word))
            #neighbors = words[words.index(w) + 1]
            yield(w,1)

    def reducer(self, w, values):
        yield(w,sum(values))

if __name__ == '__main__':
    MRCountest.run()

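Edit: I'm trying to use the pairs pattern to search a document for every instance of a specific word and, each time, find the word next to it, then emit one pair per instance. For example, find every instance of "the" together with the word next to it, i.e. [the], [book], [the], [cat], and so on.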

from mrjob.job import MRJob

class MRCountest(MRJob):
    # Word count
    def mapper(self, _, document):
        # Assume document is a list of words.
        #words = []
        words = document.split(" ")

        want = "the"
        for w, want in enumerate(words, 1):
            if (w+1) < len(words):
                neighbors = words[w + 1]
                pair = (want, neighbors)
                for u in neighbors:
                    if want is "the":
                        #pair = (want, neighbors)
                        yield(pair),1
        #neighbors = words.index(w)
        #for word in words:

        #searchword = "the"
        #wor.append(str(word))
        #neighbors = words[words.index(w) + 1]
        #yield(w,1)

    #def reducer(self, w, values):
        #yield(w,sum(values))

if __name__ == '__main__':
    MRCountest.run()

As it stands, I get every word pair, and the same pairs are emitted multiple times.

This image shows the pseudo code I'm trying to implement

Best answer

When you use words.index("the"), you only get the first instance of "the" in the list or string and, as you have found, you get an error if "the" is not present.
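To make that behavior concrete, here is a minimal sketch using a made-up sentence (the sample text and variable names are assumptions for illustration, not taken from the question's input file). It shows that index returns only the first position, and one way to collect every position instead. It also explains the two error messages in the question: str.index raises "substring not found", while list.index raises "... is not in list".

```python
# Hypothetical sample input; the real job would read lines from a file.
words = "the book is on the table near the cat".split()

# list.index returns only the position of the first match.
first = words.index("the")

# Scan with enumerate to collect every position instead.
positions = [i for i, word in enumerate(words) if word == "the"]

# index raises ValueError for a missing item, so guard with a membership test.
missing = "dog"
missing_pos = words.index(missing) if missing in words else None

print(first, positions, missing_pos)
```

The membership test scans the list a second time, which is fine for a single line of text; wrapping the call in try/except ValueError is an alternative.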

You also mention that you are trying to emit pairs, but you only yield a single word.

I think what you want to do is something more like this:

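def get_word_pairs(words):
    # Emit ((word, neighbor), 1) for each adjacent neighbor of every word.
    for i, word in enumerate(words):
        if (i + 1) < len(words):
            # Right-hand neighbor.
            yield (word, words[i + 1]), 1
        if (i - 1) >= 0:
            # Left-hand neighbor; >= 0 so the word at index 1 pairs with index 0.
            yield (word, words[i - 1]), 1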

This assumes you are interested in neighbors in both directions. (If not, you only need the first yield.)
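If only pairs that start with one specific word are wanted, as in the question's edit, the same idea can be narrowed down. The sketch below is one possible reading of that requirement, not the asker's final code: pairs_for_target and reduce_counts are hypothetical helper names, the sample sentence is made up, and the reduce step is simulated locally with collections.Counter instead of running under mrjob. Note that the comparison uses ==, since `is` tests object identity and is not reliable for comparing strings.

```python
from collections import Counter


def pairs_for_target(words, target="the"):
    """Map step: yield ((target, next_word), 1) per occurrence of target."""
    for i, word in enumerate(words):
        # Compare by value with ==, and skip a target that ends the list.
        if word == target and i + 1 < len(words):
            yield (word, words[i + 1]), 1


def reduce_counts(pairs):
    """Reduce step (simulated locally): sum the values for each pair key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


# Hypothetical sample input.
words = "the book and the cat saw the book".split()
result = reduce_counts(pairs_for_target(words))
print(result)
```

In an mrjob job, the body of pairs_for_target would sit inside mapper(self, _, line) and the summing would go in reducer(self, key, values), matching the method signatures already used in the question.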

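Finally, since you used document.strip(), I suspect that document is actually a string rather than a list. If that is the case, you can use words = document.split(" ") to get a list of words (assuming there is no punctuation).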
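Where the text does contain punctuation or repeated spaces, a plain split on a single space leaves debris in the word list. The following sketch contrasts the two approaches; tokenize is a hypothetical helper, and the regular expression (lowercase letters and apostrophes only) is an assumption about what should count as a word, so adjust it to the actual data.

```python
import re


def tokenize(document):
    """Lowercase the text and return runs of letters/apostrophes as words."""
    return re.findall(r"[a-z']+", document.lower())


# Hypothetical input line with punctuation and a double space.
line = "The book, the cat.  The END"

naive = line.split(" ")   # keeps punctuation and yields an empty string
clean = tokenize(line)    # punctuation and extra whitespace are dropped

print(naive)
print(clean)
```

Lowercasing also means "The" and "the" are counted as the same word, which may or may not be what the job needs.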

Regarding python - How to implement the MapReduce pairs pattern in Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47772826/
