
Python - Difference between tagged_sents and tagged_words in NLTK corpora

Reposted. Author: 行者123. Updated: 2023-12-01 02:55:51

What is the difference between tagged_sents and tagged_words in nltk?

Both seem to contain a list of (word, tag) tuples. If you call type() on them, both are

nltk.collections.LazySubsequence

Best answer

From the docs:

Corpus reader functions are named based on the type of information they return.  
Some common examples, and their return types, are:
- words(): list of str
- sents(): list of (list of str)
- paras(): list of (list of (list of str))
- tagged_words(): list of (str,str) tuple
- tagged_sents(): list of (list of (str,str))
- tagged_paras(): list of (list of (list of (str,str)))
- chunked_sents(): list of (Tree w/ (str,str) leaves)
- parsed_sents(): list of (Tree with str leaves)
- parsed_paras(): list of (list of (Tree with str leaves))
- xml(): A single xml ElementTree
- raw(): unprocessed corpus contents
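
To make the two return types in that list concrete, here is a minimal sketch that uses hand-made data instead of a real corpus, so it runs without NLTK or any corpus download. The names `toy_tagged_sents` and `toy_tagged_words` are hypothetical and only imitate the shapes described for tagged_sents() and tagged_words(); real corpus readers return lazy views rather than plain lists.

```python
from itertools import chain

# Hypothetical hand-made data shaped like the output of tagged_sents():
# a list of sentences, each sentence a list of (word, tag) tuples.
toy_tagged_sents = [
    [("The", "AT"), ("jury", "NN"), ("said", "VBD")],
    [("It", "PPS"), ("ended", "VBD")],
]

# Flattening one level gives the shape of tagged_words():
# one list of (word, tag) tuples, with sentence boundaries gone.
toy_tagged_words = list(chain.from_iterable(toy_tagged_sents))

print(toy_tagged_sents[0])  # first element is a whole sentence (a list)
print(toy_tagged_words[0])  # first element is a single (word, tag) tuple
```

Indexing shows the difference: element 0 of the sentence-level structure is a list of tuples, while element 0 of the word-level structure is one tuple.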


>>> from nltk.corpus import brown

>>> brown.tagged_words()
[('The', 'AT'), ('Fulton', 'NP-TL'), ...]

>>> len(brown.tagged_words()) # no. of words in the corpus.
1161192


>>> len(brown.tagged_sents()) # no. of sentences in the corpus.
57340

# Loop through the sentences and count the words in each one.
>>> sum(len(sent) for sent in brown.tagged_sents()) # no. of words in the corpus, same total as above.
1161192
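
Since both views hold the same tokens, which one to use probably depends on whether sentence boundaries matter for the task. The sketch below illustrates that with hypothetical toy data (not the Brown corpus): a tag frequency count only needs the flat word-level list, while finding the tag that opens each sentence needs the sentence-level nesting.

```python
from collections import Counter

# Hypothetical toy data shaped like tagged_sents() output.
toy_tagged_sents = [
    [("The", "AT"), ("jury", "NN"), ("said", "VBD")],
    [("It", "PPS"), ("ended", "VBD")],
]

# Word-level view, shaped like tagged_words() output.
toy_tagged_words = [pair for sent in toy_tagged_sents for pair in sent]

# Token-level question: how often does each tag occur?
# The flat list is enough because sentence boundaries are irrelevant.
tag_counts = Counter(tag for _, tag in toy_tagged_words)

# Sentence-level question: which tag starts each sentence?
# This needs the nested structure; the flat list has lost that information.
first_tags = [sent[0][1] for sent in toy_tagged_sents]

print(tag_counts.most_common(1))
print(first_tags)
```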

Regarding the difference between tagged_sents and tagged_words in NLTK corpora, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44229467/
