python - NLTK research question

Reposted. Author: 行者123. Updated: 2023-12-01 08:36:52

I am trying to tokenize a sentence and then remove the punctuation.

from nltk import word_tokenize
from nltk import re
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
sentence = "what's good people boy's"


tokens = word_tokenize(sentence)
tokens_nopunct = [word.lower() for word in tokens if re.search("\w",word)]
tokens_lemma = [lemmatizer.lemmatize(token) for token in tokens]

print(tokens_lemma)

This gives the output:

['what', "'s", 'good', 'people', 'boy', "'s"]

But I want it to produce the output: ['what', 'good', 'people', 'boy']

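I have been looking through NLTK and its documentation, which says re.search is the way to remove punctuation, but it is not working. Is there anything else I have written wrong in my code?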

Best answer

This will remove every element that starts with a punctuation character (not only "'s"):

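import string

# Set of ASCII punctuation characters, for fast membership tests
punc = set(string.punctuation)
a = ['what', "'s", 'good', 'people', 'boy', "'s"]
# Keep only the elements whose first character is not punctuation
without_punc = list(filter(lambda x: x[0] not in punc, a))
print(without_punc)  # ['what', 'good', 'people', 'boy']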
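
As a rough sketch of why the question's filter did not help: re.search with \w matches "'s" because "s" is a word character, and the lemmatizer loop in the question iterates over tokens rather than tokens_nopunct, so the filtered list is never used anyway. The example below hard-codes the token list printed in the question (so it runs without NLTK data) and uses a stricter check that drops a token containing any punctuation character. The names kept_by_search and no_punct are illustrative, not from the original post.

```python
import re
import string

# Tokens as printed in the question (assumed to be what word_tokenize returned).
tokens = ["what", "'s", "good", "people", "boy", "'s"]

# The question's test: true for "'s" because "s" is a word character,
# so the clitic survives this filter.
kept_by_search = [tok for tok in tokens if re.search(r"\w", tok)]

# Stricter alternative: keep a token only if none of its characters
# is an ASCII punctuation character.
punc = set(string.punctuation)
no_punct = [tok.lower() for tok in tokens if not any(ch in punc for ch in tok)]

print(kept_by_search)  # ['what', "'s", 'good', 'people', 'boy', "'s"]
print(no_punct)        # ['what', 'good', 'people', 'boy']
```

In the question's pipeline, the filtered list (not the raw tokens) would then be the one passed to lemmatizer.lemmatize. Note that this stricter check also drops tokens with internal punctuation, such as hyphenated words, which may or may not be what you want.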

Regarding python - NLTK research question, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53674743/
