python - Remove word extensions in Python

Reposted. Author: 行者123. Updated: 2023-11-28 21:19:54

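I have a text containing several words, and I want to remove the derivational endings from all of them. For example, I want to remove endings such as -ed and -ing and keep the base verb: given "verifying" or "verified", I want to keep "verify". I found Python's strip method, which removes specified characters from the beginning or end of a string, but that is not quite what I want. Is there a library that does this kind of thing in Python?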
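
To make the problem concrete, here is a minimal sketch of why `str.strip` does not fit, followed by a naive suffix remover. `remove_suffix` and its default suffix list are hypothetical names used only for illustration; they are not part of any library.

```python
# str.strip treats its argument as a set of characters to peel off both ends,
# not as a suffix, so it can remove far more than intended.
print("singing".strip("ing"))    # s
print("verified".strip("ed"))    # verifi


def remove_suffix(word, suffixes=("ing", "ed")):
    """Drop the first matching suffix from `word` (hypothetical helper)."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


print(remove_suffix("verifying"))  # verify
print(remove_suffix("verified"))   # verifi
```

Plain suffix removal recovers "verify" from "verifying" but not from "verified", which is why a stemmer or lemmatizer is usually needed.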

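I have already tried running the code from the suggested post, but I noticed that several words are trimmed oddly. For example, I have the following text: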

 We goin all the way βπƒβ΅οΈβ΅οΈ        
Think ive caught on to a really good song ! Im writing π
Lookin back on the stuff i did when i was lil makes me laughh π‚
I sneezed on the beat and the beat got sicka
#nashnewvideo http://t.co/10cbUQswHR
Homee βοΈβοΈβοΈπ΄
So much respect for this man , truly amazing guy βοΈ @edsheeran
http://t.co/DGxvXpo1OM"
What a day ..
RT @edsheeran: Having some food with @ShawnMendes
#VoiceSave christina π
Im gunna make the βοΈ sign my signature pose
You all are so beautiful .. π soooo beautiful
Thought that was a really awesome quote
Beautiful things don't ask for attention"""

After running the code below (in which I also remove non-Latin characters and URLs), I get:

 we  goin  all  the  way 
think ive caught on to a realli good song im write
lookin back on the stuff i did when i wa lil make me laughh
i sneez on the beat and the beat got sicka
nashnewvideo
home
so much respect for thi man truli amaz guy
what a day
rt have some food with
voicesav christina
im gunna make the sign my signatur pose
you all are so beauti soooo beauti
thought that wa a realli awesom quot
beauti thing dont ask for attent

For example, it trims "beautiful" to "beauti", "quote" to "quot", and "really" to "realli". My code is as follows:

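    import csv
    import re
    import string

    from nltk.stem.porter import PorterStemmer

    stemmer = PorterStemmer()  # create once instead of once per word
    lines = []                 # the original appended to a variable named `list`

    # `f` is the already opened CSV file; the tweet text is in the third column
    reader = csv.reader(f)
    for row in reader:
        # drop @mentions and URLs
        text = re.sub(r"(?:@|https?://)\S+", "", row[2])
        # keep printable ASCII only (the original discarded filter()'s result)
        text = "".join(ch for ch in text if ch in string.printable)
        # remove punctuation, then turn remaining non-letters into spaces
        out = text.translate(str.maketrans("", "", string.punctuation))
        out = re.sub(r"[\W\d]", " ", out.strip())
        # lowercase and stem every word
        words = [stemmer.stem(word.lower()) for word in out.split()]
        lines.append(" ".join(words))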

Best answer

You can use a lemmatizer instead of a stemmer. Here is an example using Python's NLTK:

from nltk.stem import WordNetLemmatizer

s = """
You all are so beautiful soooo beautiful
Thought that was a really awesome quote
Beautiful things don't ask for attention
"""

wnl = WordNetLemmatizer()
# Requires the WordNet data: nltk.download('wordnet')
print(" ".join(wnl.lemmatize(i) for i in s.split()))
# You all are so beautiful soooo beautiful Thought that wa a really awesome quote Beautiful thing don't ask for attention

In some cases it may not do what you expect:

print(wnl.lemmatize('going'))  # going
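
One likely reason for this result is that `WordNetLemmatizer.lemmatize` treats every word as a noun unless a `pos` argument is given. The sketch below tries several parts of speech in turn; `lemmatize_any_pos` and `pos_order` are hypothetical names for this illustration, and trying the verb reading first is an assumption that will not suit every text (a POS tagger would be the more careful route).

```python
from nltk.stem import WordNetLemmatizer

wnl = WordNetLemmatizer()


def lemmatize_any_pos(word, pos_order=("v", "n"), lemmatizer=wnl):
    """Return the first lemma that differs from `word`, trying each POS in turn.

    Hypothetical helper: "v" = verb, "n" = noun, in WordNet's POS codes.
    """
    for pos in pos_order:
        lemma = lemmatizer.lemmatize(word, pos=pos)
        if lemma != word:
            return lemma
    # No reading changed the word, so return it untouched.
    return word
```

With the WordNet data installed, `wnl.lemmatize('going', pos='v')` gives `go`, so `lemmatize_any_pos('going')` does as well, while a word such as `beautiful` is returned unchanged.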

You can then combine the two approaches: stemming and lemmatization.
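
The answer does not say how to combine them, so the following is only one possible sketch: lemmatize first, and fall back to the Porter stemmer only when the lemmatizer left an -ing or -ed word untouched. The names `normalize`, `normalize_text` and `FALLBACK_SUFFIXES`, as well as the suffix rule itself, are assumptions made for this example.

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

_wnl = WordNetLemmatizer()
_porter = PorterStemmer()

# Assumption for this sketch: only these endings trigger the stemmer fallback.
FALLBACK_SUFFIXES = ("ing", "ed")


def normalize(word, lemmatize=_wnl.lemmatize, stem=_porter.stem):
    """Lemmatize first; stem only if an -ing/-ed word came back unchanged.

    `normalize` and FALLBACK_SUFFIXES are hypothetical, not NLTK API.
    """
    word = word.lower()
    lemma = lemmatize(word)
    if lemma != word:
        return lemma          # the lemmatizer recognised an inflected form
    if word.endswith(FALLBACK_SUFFIXES):
        return stem(word)     # e.g. "going", which lemmatize() leaves as is
    return word               # e.g. "beautiful" is kept intact


def normalize_text(text, **kwargs):
    """Apply normalize() to every whitespace-separated token."""
    return " ".join(normalize(token, **kwargs) for token in text.split())
```

With the WordNet data installed, `normalize("going")` should give `go` while `beautiful` is left alone. The suffix guard is crude: a word such as "amazing" still reaches the stemmer and would likely come back as "amaz", so the suffix list may need tuning for a given corpus.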

Regarding "python - Remove word extensions in Python", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/23732057/
