gpt4 book ai didi

python - Porter Stemmer 算法没有返回预期的输出?修改成def时

转载 作者:塔克拉玛干 更新时间:2023-11-03 05:09:56 24 4
gpt4 key购买 nike

我正在使用 PorterStemmer Python Port

The Porter stemming algorithm (or ‘Porter stemmer’) is a process for removing the commoner morphological and inflexional endings from words in English. Its main use is as part of a term normalisation process that is usually done when setting up Information Retrieval systems.

对于以下..

The other thing you need to do is reduce each word to its stem. For example, the words sing, sings, singing all have the same stem, which is sing. There is a reasonably accepted way to do this, which is called Porter's algorithm. You can download something that performs it from http://tartarus.org/martin/PorterStemmer/.

我修改了代码..

if __name__ == '__main__':
p = PorterStemmer()
if len(sys.argv) > 1:
for f in sys.argv[1:]:
infile = open(f, 'r')
while 1:
output = ''
word = ''
line = infile.readline()
if line == '':
break
for c in line:
if c.isalpha():
word += c.lower()
else:
if word:
output += p.stem(word, 0,len(word)-1)
word = ''
output += c.lower()
print output,
infile.close()

输入而不是文件中读取经过预处理的字符串并返回输出。

def algorithm(input):
p = PorterStemmer()
while 1:
output = ''
word = ''
if input == '':
break
for c in input:
if c.isalpha():
word += c.lower()
else:
if word:
output += p.stem(word, 0,len(word)-1)
word = ''
output += c.lower()
return output

请注意,如果我将 return output 放在与 while 1: 相同的缩进上,它会变成一个无限循环

用法(示例)

import PorterStemmer as ps
ps.algorithm("Michael is Singing");

输出

Michael is

预期输出

Michael is Sing

我做错了什么?

最佳答案

所以看起来罪魁祸首是它目前没有将输入的最后部分写入 output(例如,尝试“Michael is Singing stuff”——它应该正确地写入所有内容并且省略“东西”)。可能有一种更优雅的方法来处理这个问题,但您可以尝试将 else 子句添加到 for 循环中。由于问题是 final word 没有包含在 output 中,我们可以使用 else 来确保在 完成时添加 final word for 循环:

def algorithm(input):
print input
p = PorterStemmer()
while 1:
output = ''
word = ''
if input == '':
break
for c in input:
if c.isalpha():
word += c.lower()
elif word:
output += p.stem(word, 0,len(word)-1)
word = ''
output += c.lower()
else:
output += p.stem(word, 0, len(word)-1)
print output
return output

这已经通过两个测试用例进行了广泛的测试,所以很明显它是防弹的:)可能有一些边缘案例在那里爬行,但希望它能帮助你开始。

关于python - Porter Stemmer 算法没有返回预期的输出?修改成def时,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/12683932/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com