
python - How to change a POS list to a normal string

Repost. Author: 行者123. Updated: 2023-12-01 00:45:07

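I am writing a program that counts the words occurring only once in a text, but first I need to strip certain elements from the text. So far I have managed to lowercase the text, expand negative contractions (n't -> not), and delete possessive endings (Tom's -> Tom). The final output, however, is a list of POS-tagged tokens.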

import nltk 
import re
from nltk import pos_tag
from nltk.tokenize import word_tokenize
from string import punctuation

txt = "I don't like it. She didn't like it at all. I went to Susie's. She is playing."

y = txt.lower()  # lowercase the text

def decontracted(phrase):
    # Expand negative contractions: "don't" -> "do not"
    phrase = re.sub(r"n't", " not", phrase)
    return phrase

d = decontracted(y)
print(d)

x = pos_tag(word_tokenize(d))  # POS tagging
y = [s for s in x if s[1] != 'POS']  # drop possessive endings (tag 'POS')
print(y)

When I run print(y), the result is:

[('i', 'NNS'), ('do', 'VBP'), ('not', 'RB'), ('like', 'IN'), ('it', 'PRP'), ('.', '.'), ('she', 'PRP'), ('did', 'VBD'), ('not', 'RB'), ('like', 'IN'), ('it', 'PRP'), ('at', 'IN'), ('all', 'DT'), ('.','.'), ('i', 'VB'), ('went', 'VBD'), ('to', 'TO'), ('susie', 'VB'),('.', '.'), ('she', 'PRP'), ('is', 'VBZ'), ('playing', 'VBG'), ('.', '.')]

How can I change it into the following output?

['i', 'do', 'not', 'like', 'it', '.', 'she', 'did', 'not', 'like','it', 'at', 'all', '.', 'i', 'went', 'to', 'susie', '.', 'she', 'is', 'playing', '.']

And then how can I change that into the following output?

[i do not like it. she did not like it at all. i went to susie. she is playing.]

Thanks in advance.

Best answer

Here is one way to do it.

y = [('i', 'NNS'), ('do', 'VBP'), ('not', 'RB'), ('like', 'IN'), ('it', 'PRP'), ('.', '.'), ('she', 'PRP'), ('did', 'VBD'), ('not', 'RB'), ('like', 'IN'), ('it', 'PRP'), ('at', 'IN'), ('all', 'DT'), ('.','.'), ('i', 'VB'), ('went', 'VBD'), ('to', 'TO'), ('susie', 'VB'),('.', '.'), ('she', 'PRP'), ('is', 'VBZ'), ('playing', 'VBG'), ('.', '.')]

w = [r[0] for r in y]
print(w)
# ['i', 'do', 'not', 'like', 'it', '.', 'she', 'did', 'not', 'like', 'it', 'at', 'all', '.', 'i', 'went', 'to', 'susie', '.', 'she', 'is', 'playing', '.']

wStr = " ".join(w)
print(wStr)
# i do not like it . she did not like it at all . i went to susie . she is playing .

string = wStr.replace(' .', '.')
print(string)

# i do not like it. she did not like it at all. i went to susie. she is playing.
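
The steps above can also be wrapped in small helpers. The following is only a sketch, not part of the original answer: the function names tagged_to_text and words_occurring_once are hypothetical, the regular expression covers a few common punctuation marks rather than just the period, and the counting step is one possible reading of the asker's goal of finding words that occur once. It works on an already tagged list, so it does not need NLTK.

```python
import re
from collections import Counter

def tagged_to_text(tagged):
    """Join (word, tag) pairs into one string (hypothetical helper)."""
    words = [word for word, _tag in tagged]
    text = " ".join(words)
    # join() leaves a space before each punctuation token; remove it.
    return re.sub(r"\s+([.,!?;:])", r"\1", text)

def words_occurring_once(tagged):
    """Return words seen exactly once, skipping punctuation-only tokens."""
    counts = Counter(word for word, _tag in tagged if word.isalnum())
    return [word for word, n in counts.items() if n == 1]

# A shortened version of the tagged list from the question.
tagged = [('i', 'NNS'), ('do', 'VBP'), ('not', 'RB'), ('like', 'IN'),
          ('it', 'PRP'), ('.', '.'), ('i', 'VB'), ('went', 'VBD'),
          ('to', 'TO'), ('susie', 'VB'), ('.', '.')]

print(tagged_to_text(tagged))
# i do not like it. i went to susie.
print(words_occurring_once(tagged))
# ['do', 'not', 'like', 'it', 'went', 'to', 'susie']
```

Using a regular expression instead of replace(' .', '.') is a design choice here: it treats commas, question marks and similar tokens the same way as periods, which the single replace call does not.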

For more on python - How to change a POS list to a normal string, see the similar question on Stack Overflow: https://stackoverflow.com/questions/57035373/
