python - Different implementations for finding a word in a sentence string - Python

Reposted. Author: 行者123. Updated: 2023-11-30 23:39:47

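(This question is about general string checking rather than natural language processing as such. If you do treat it as an NLP problem, imagine a language that current analyzers cannot handle; for simplicity I will use English strings as examples.)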

Suppose a word has only 6 possible forms:

  1. capitalized first letter
  2. plural with "s"
  3. plural with "es"
  4. capitalized + "es"
  5. capitalized + "s"
  6. the basic form, with no plural or capitalization
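
As a minimal sketch, the six forms listed above can be generated from the base word instead of being typed out by hand. The function name `word_forms` is hypothetical and not part of the original post, and the code is written for Python 3.

```python
def word_forms(base):
    """Return the six surface forms assumed in the list above as a set."""
    capitalized = base.capitalize()
    return {
        base,                # basic form
        base + "s",          # plural with "s"
        base + "es",         # plural with "es"
        capitalized,         # capitalized first letter
        capitalized + "s",   # capitalized + "s"
        capitalized + "es",  # capitalized + "es"
    }


forms = word_forms("coach")
print(sorted(forms))
```

Using a set makes a later membership test (`word in forms`) a single lookup rather than a chain of comparisons.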

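Suppose I want to find the index of the first occurrence of any form of the word coach in a sentence. Is there a simpler way to do it than these two approaches: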

Long if condition

sentence = "this is a sentence with the Coaches"
target = "coach"

print(target.capitalize())

# Compare every word against each of the six forms explicitly.
for j, i in enumerate(sentence.split(" ")):
    if i == target.capitalize() or i == target.capitalize()+"es" or \
       i == target.capitalize()+"s" or i == target+"es" or i == target+"s" or \
       i == target:
        print(j)

Iterating with try-except

variations = [target, target+"es", target+"s", target.capitalize()+"es",
              target.capitalize()+"s", target.capitalize()]

for i in variations:
    try:
        # list.index raises ValueError when this form is not in the sentence
        j = sentence.split(" ").index(i)
        print(j)
    except ValueError:
        continue

Best answer

I suggest having a look at NLTK's stem package: http://nltk.org/api/nltk.stem.html

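With it you can "remove morphological affixes from words, leaving only the word stem. Stemming algorithms aim to remove those affixes required for e.g. grammatical role, tense, derivational morphology, leaving only the stem of the word."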

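If your language is not currently covered by NLTK, you should consider extending NLTK. If you really need something simple and do not care about NLTK, you should still write your code as a collection of small, easily composable utility functions, for example: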

import string

def variation(stem, word):
    # True if word, ignoring case, is the stem or one of its plural forms
    return word.lower() in [stem, stem + 'es', stem + 's']

def variations(sentence, stem):
    # Lazily yield (index, word) for every word that is a form of stem
    words = cleanPunctuation(sentence).split()
    return ((i, w) for i, w in enumerate(words) if variation(stem, w))

def cleanPunctuation(sentence):
    exclude = set(string.punctuation)
    return ''.join(ch for ch in sentence if ch not in exclude)

def firstVariation(sentence, stem):
    # Return the first match; falls through to None when there is none
    for i, w in variations(sentence, stem):
        return i, w

sentence = "First coach, here another two coaches. Coaches are nice."

print(firstVariation(sentence, 'coach'))

# print all variations/forms of 'coach' found in the sentence:
print("\n".join([str(i) + ' ' + w for i, w in variations(sentence, 'coach')]))
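
One difference worth noting: because `variation` lowercases the word, the code above also accepts spellings such as "COACHES", which are outside the six forms in the question. If only those six forms should match, a regular expression is one possible alternative. The sketch below is not part of the original answer; the function name `first_form_index` is hypothetical, it assumes a non-empty base word, and it is written for Python 3.

```python
import re
import string


def first_form_index(sentence, base):
    """Return (word_index, word) for the first of the six forms, or None."""
    # Accept the first letter in either case, then the rest of the base word,
    # then an optional "es" or "s" suffix.
    pattern = re.compile(
        "(?:%s|%s)%s(?:es|s)?"
        % (re.escape(base[0].lower()), re.escape(base[0].upper()), re.escape(base[1:]))
    )
    for index, word in enumerate(sentence.split()):
        # Drop leading/trailing punctuation such as "," or "." before matching.
        token = word.strip(string.punctuation)
        if pattern.fullmatch(token):
            return index, token
    return None


print(first_form_index("this is a sentence with the Coaches", "coach"))
```

`fullmatch` is used so that longer words that merely start with the base, such as "coaching", are not reported.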

Regarding "python - Different implementations for finding a word in a sentence string - Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13237533/
