
python - Practical examples of NLTK use

Reposted. Author: IT老高. Updated: 2023-10-28 21:33:17
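I'm playing around with the Natural Language Toolkit (NLTK).

Its documentation (Book, HOWTO) is quite large, and the examples are sometimes a little advanced.

Are there any good, basic examples of uses/applications of NLTK? I'm thinking of something like the NLTK articles on the Stream Hacker blog.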

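Best answer

Here is my own practical example, for the benefit of anyone else looking up this question (excuse the sample text, it was the first thing I found on Wikipedia):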

import pprint

import nltk

# Requires the Brown corpus; run nltk.download('brown') once beforehand.
tokenizer = None
tagger = None


def init_nltk():
    """Build the tokenizer and tagger once, on first use."""
    global tokenizer
    global tagger
    # Tokens are runs of word characters or runs of punctuation.
    tokenizer = nltk.tokenize.RegexpTokenizer(r'\w+|[^\w\s]+')
    # Unigram tagger trained on the Brown corpus; unseen words get None.
    tagger = nltk.UnigramTagger(nltk.corpus.brown.tagged_sents())


def tag(text):
    if tokenizer is None:
        init_nltk()
    tokenized = tokenizer.tokenize(text)
    tagged = tagger.tag(tokenized)
    # Sort by tag. Python 3 dropped cmp-style sorting and cannot compare
    # None with str, so untagged words get an empty key and sort first.
    tagged.sort(key=lambda pair: pair[1] or '')
    return tagged


def main():
    text = """Mr Blobby is a fictional character who featured on Noel
Edmonds' Saturday night entertainment show Noel's House Party,
which was often a ratings winner in the 1990s. Mr Blobby also
appeared on the Jamie Rose show of 1997. He was designed as an
outrageously over the top parody of a one-dimensional, mute novelty
character, which ironically made him distinctive, absurd and popular.
He was a large pink humanoid, covered with yellow spots, sporting a
permanent toothy grin and jiggling eyes. He communicated by saying
the word "blobby" in an electronically-altered voice, expressing
his moods through tone of voice and repetition.

There was a Mrs. Blobby, seen briefly in the video, and sold as a
doll.

However Mr Blobby actually started out as part of the 'Gotcha'
feature during the show's second series (originally called 'Gotcha
Oscars' until the threat of legal action from the Academy of Motion
Picture Arts and Sciences[citation needed]), in which celebrities
were caught out in a Candid Camera style prank. Celebrities such as
dancer Wayne Sleep and rugby union player Will Carling would be
enticed to take part in a fictitious children's programme based around
their profession. Mr Blobby would clumsily take part in the activity,
knocking over the set, causing mayhem and saying "blobby blobby
blobby", until finally when the prank was revealed, the Blobby
costume would be opened - revealing Noel inside. This was all the more
surprising for the "victim" as during rehearsals Blobby would be
played by an actor wearing only the arms and legs of the costume and
speaking in a normal manner.[citation needed]"""
    tagged = tag(text)
    # De-duplicate the (word, tag) pairs, then sort by tag again.
    unique = list(set(tagged))
    unique.sort(key=lambda pair: pair[1] or '')
    pprint.pprint(unique)


if __name__ == '__main__':
    main()

Output:

[('rugby', None),
('Oscars', None),
('1990s', None),
('",', None),
('Candid', None),
('"', None),
('blobby', None),
('Edmonds', None),
('Mr', None),
('outrageously', None),
('.[', None),
('toothy', None),
('Celebrities', None),
('Gotcha', None),
(']),', None),
('Jamie', None),
('humanoid', None),
('Blobby', None),
('Carling', None),
('enticed', None),
('programme', None),
('1997', None),
('s', None),
("'", "'"),
('[', '('),
('(', '('),
(']', ')'),
(',', ','),
('.', '.'),
('all', 'ABN'),
('the', 'AT'),
('an', 'AT'),
('a', 'AT'),
('be', 'BE'),
('were', 'BED'),
('was', 'BEDZ'),
('is', 'BEZ'),
('and', 'CC'),
('one', 'CD'),
('until', 'CS'),
('as', 'CS'),
('This', 'DT'),
('There', 'EX'),
('of', 'IN'),
('inside', 'IN'),
('from', 'IN'),
('around', 'IN'),
('with', 'IN'),
('through', 'IN'),
('-', 'IN'),
('on', 'IN'),
('in', 'IN'),
('by', 'IN'),
('during', 'IN'),
('over', 'IN'),
('for', 'IN'),
('distinctive', 'JJ'),
('permanent', 'JJ'),
('mute', 'JJ'),
('popular', 'JJ'),
('such', 'JJ'),
('fictional', 'JJ'),
('yellow', 'JJ'),
('pink', 'JJ'),
('fictitious', 'JJ'),
('normal', 'JJ'),
('dimensional', 'JJ'),
('legal', 'JJ'),
('large', 'JJ'),
('surprising', 'JJ'),
('absurd', 'JJ'),
('Will', 'MD'),
('would', 'MD'),
('style', 'NN'),
('threat', 'NN'),
('novelty', 'NN'),
('union', 'NN'),
('prank', 'NN'),
('winner', 'NN'),
('parody', 'NN'),
('player', 'NN'),
('actor', 'NN'),
('character', 'NN'),
('victim', 'NN'),
('costume', 'NN'),
('action', 'NN'),
('activity', 'NN'),
('dancer', 'NN'),
('grin', 'NN'),
('doll', 'NN'),
('top', 'NN'),
('mayhem', 'NN'),
('citation', 'NN'),
('part', 'NN'),
('repetition', 'NN'),
('manner', 'NN'),
('tone', 'NN'),
('Picture', 'NN'),
('entertainment', 'NN'),
('night', 'NN'),
('series', 'NN'),
('voice', 'NN'),
('Mrs', 'NN'),
('video', 'NN'),
('Motion', 'NN'),
('profession', 'NN'),
('feature', 'NN'),
('word', 'NN'),
('Academy', 'NN-TL'),
('Camera', 'NN-TL'),
('Party', 'NN-TL'),
('House', 'NN-TL'),
('eyes', 'NNS'),
('spots', 'NNS'),
('rehearsals', 'NNS'),
('ratings', 'NNS'),
('arms', 'NNS'),
('celebrities', 'NNS'),
('children', 'NNS'),
('moods', 'NNS'),
('legs', 'NNS'),
('Sciences', 'NNS-TL'),
('Arts', 'NNS-TL'),
('Wayne', 'NP'),
('Rose', 'NP'),
('Noel', 'NP'),
('Saturday', 'NR'),
('second', 'OD'),
('his', 'PP$'),
('their', 'PP$'),
('him', 'PPO'),
('He', 'PPS'),
('more', 'QL'),
('However', 'RB'),
('actually', 'RB'),
('also', 'RB'),
('clumsily', 'RB'),
('originally', 'RB'),
('only', 'RB'),
('often', 'RB'),
('ironically', 'RB'),
('briefly', 'RB'),
('finally', 'RB'),
('electronically', 'RB-HL'),
('out', 'RP'),
('to', 'TO'),
('show', 'VB'),
('Sleep', 'VB'),
('take', 'VB'),
('opened', 'VBD'),
('played', 'VBD'),
('caught', 'VBD'),
('appeared', 'VBD'),
('revealed', 'VBD'),
('started', 'VBD'),
('saying', 'VBG'),
('causing', 'VBG'),
('expressing', 'VBG'),
('knocking', 'VBG'),
('wearing', 'VBG'),
('speaking', 'VBG'),
('sporting', 'VBG'),
('revealing', 'VBG'),
('jiggling', 'VBG'),
('sold', 'VBN'),
('called', 'VBN'),
('made', 'VBN'),
('altered', 'VBN'),
('based', 'VBN'),
('designed', 'VBN'),
('covered', 'VBN'),
('communicated', 'VBN'),
('needed', 'VBN'),
('seen', 'VBN'),
('set', 'VBN'),
('featured', 'VBN'),
('which', 'WDT'),
('who', 'WPS'),
('when', 'WRB')]
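
Two things in that output may look odd at first: tokens such as '.[' and ']),', and the many words tagged None. The sketch below tries to show where both come from, using only the standard library. It is a pure-Python stand-in, not NLTK's implementation: `tokenize` applies the same regular expression the answer passes to `RegexpTokenizer`, and `train_unigram` approximates what a unigram tagger does (pick the most frequent tag seen for each word, ignoring context). The names `tokenize`, `train_unigram`, `tag`, and the tiny `toy_training` data are assumptions made up for illustration; they stand in for the Brown corpus.

```python
import re
from collections import Counter, defaultdict

# Same pattern as in the answer: a run of word characters, or a run of
# characters that are neither word characters nor whitespace.
TOKEN_PATTERN = re.compile(r'\w+|[^\w\s]+')


def tokenize(text):
    """Return all non-overlapping matches of the pattern, left to right."""
    return TOKEN_PATTERN.findall(text)


def train_unigram(tagged_sents):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, word_tag in sent:
            counts[word][word_tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(tokens, model):
    """Look each token up on its own; words never seen in training get None."""
    return [(token, model.get(token)) for token in tokens]


# Hypothetical toy training data standing in for the Brown corpus.
toy_training = [
    [('the', 'AT'), ('show', 'NN'), ('was', 'BEDZ'), ('popular', 'JJ'), ('.', '.')],
    [('they', 'PPSS'), ('show', 'VB'), ('the', 'AT'), ('video', 'NN'), ('.', '.')],
    [('we', 'PPSS'), ('show', 'VB'), ('it', 'PPO'), ('.', '.')],
]
model = train_unigram(toy_training)

tokens = tokenize("Noel's show was popular.[citation needed]")
print(tokens)
print(tag(tokens, model))
```

Adjacent punctuation is matched as a single run, so "popular.[citation" yields the token '.[', and the apostrophe splits "Noel's" into 'Noel', "'" and 's'. Because the lookup ignores context, 'show' comes out as VB here even though it is used as a noun, which is the same effect as ('show', 'VB') and ('Sleep', 'VB') in the output above. Words absent from the training data, such as 'Noel' in this toy model or 'Blobby' in the real run, come back as None.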

Regarding "python - Practical examples of NLTK use", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/526469/
