
python-2.7 - Tokenizing words in a list of sentences in Python

Reposted. Author: 行者123. Updated: 2023-12-04 00:14:09

I currently have a file containing a list that looks like this:

example = ['Mary had a little lamb' , 
'Jack went up the hill' ,
'Jill followed suit' ,
'i woke up suddenly' ,
'it was a really bad dream...']

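"example" is a list of such sentences, and I want the output to look like this:
mod_example = ["'Mary' 'had' 'a' 'little' 'lamb'" , 'Jack' 'went' 'up' 'the' 'hill' ....] and so on.
I need each sentence kept separate, with every word in it tokenized, so that I can compare each word of a mod_example sentence (one at a time, using a for loop) against a reference sentence.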

I tried this:
for sentence in example:
    text3 = sentence.split()
print text3

and got the following output:
['it', 'was', 'a', 'really', 'bad', 'dream...']

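How do I get this for all the sentences?
The result keeps getting overwritten. Also, please tell me whether my approach is right.
The result should still be a list of sentences, just with the words tokenized. Thanks.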

Best answer

You can use the word tokenizer in NLTK ( http://nltk.org/api/nltk.tokenize.html ) together with a list comprehension; see http://docs.python.org/2/tutorial/datastructures.html#list-comprehensions

>>> from nltk.tokenize import word_tokenize
>>> example = ['Mary had a little lamb' ,
... 'Jack went up the hill' ,
... 'Jill followed suit' ,
... 'i woke up suddenly' ,
... 'it was a really bad dream...']
>>> tokenized_sents = [word_tokenize(i) for i in example]
>>> for i in tokenized_sents:
...     print i
...
['Mary', 'had', 'a', 'little', 'lamb']
['Jack', 'went', 'up', 'the', 'hill']
['Jill', 'followed', 'suit']
['i', 'woke', 'up', 'suddenly']
['it', 'was', 'a', 'really', 'bad', 'dream', '...']
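
If installing NLTK is not an option, the same list-comprehension idea also works with the plain str.split() from the question. The sketch below is only an illustration, not part of the original answer: the reference sentence and the helper name shared_words are made up for the example, and the comparison shown (keeping the words that also occur in the reference) is just one guess at what "compare" might mean here. It runs on Python 3, and on Python 2.7 thanks to the __future__ import.

```python
from __future__ import print_function  # lets the same file run on Python 2.7

example = ['Mary had a little lamb',
           'Jack went up the hill',
           'Jill followed suit',
           'i woke up suddenly',
           'it was a really bad dream...']

# Collect every token list in a new list instead of reassigning a single
# variable on each pass, so no sentence overwrites the previous one.
tokenized_sents = [sentence.split() for sentence in example]


def shared_words(tokens, reference_tokens):
    """Hypothetical helper: tokens that also occur in reference_tokens."""
    reference_set = set(reference_tokens)
    return [word for word in tokens if word in reference_set]


# Assumed reference sentence, chosen only for illustration.
reference = 'Jack and Jill went up the hill'.split()
matches = [shared_words(tokens, reference) for tokens in tokenized_sents]

for tokens, common in zip(tokenized_sents, matches):
    print(tokens, common)
```

Note that str.split() only splits on whitespace, so 'dream...' stays a single token, whereas word_tokenize above separates the trailing '...' into its own token. Which behaviour is preferable depends on how the words are compared afterwards.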

Regarding "python-2.7 - Tokenizing words in a list of sentences in Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/21361073/
