- mongodb - 在 MongoDB mapreduce 中,如何展平值对象?
- javascript - 对象传播与 Object.assign
- html - 输入类型 ="submit"Vs 按钮标签它们可以互换吗?
- sql - 使用 MongoDB 而不是 MS SQL Server 的优缺点
我在玩Natural Language Toolkit (NLTK)。
它的文档( Book 和 HOWTO )非常庞大,并且示例有时稍微高级一些。
有没有关于 NLTK 的使用/应用的基本示例?我在想 NTLK articles 之类的东西在 Stream Hacker 博客上。
最佳答案
这是我自己的实际示例,方便其他人查找此问题(请原谅示例文本,这是我在 Wikipedia 上发现的第一件事):
import nltk
import pprint
tokenizer = None
tagger = None
def init_nltk():
global tokenizer
global tagger
tokenizer = nltk.tokenize.RegexpTokenizer(r'\w+|[^\w\s]+')
tagger = nltk.UnigramTagger(nltk.corpus.brown.tagged_sents())
def tag(text):
global tokenizer
global tagger
if not tokenizer:
init_nltk()
tokenized = tokenizer.tokenize(text)
tagged = tagger.tag(tokenized)
tagged.sort(lambda x,y:cmp(x[1],y[1]))
return tagged
def main():
text = """Mr Blobby is a fictional character who featured on Noel
Edmonds' Saturday night entertainment show Noel's House Party,
which was often a ratings winner in the 1990s. Mr Blobby also
appeared on the Jamie Rose show of 1997. He was designed as an
outrageously over the top parody of a one-dimensional, mute novelty
character, which ironically made him distinctive, absurd and popular.
He was a large pink humanoid, covered with yellow spots, sporting a
permanent toothy grin and jiggling eyes. He communicated by saying
the word "blobby" in an electronically-altered voice, expressing
his moods through tone of voice and repetition.
There was a Mrs. Blobby, seen briefly in the video, and sold as a
doll.
However Mr Blobby actually started out as part of the 'Gotcha'
feature during the show's second series (originally called 'Gotcha
Oscars' until the threat of legal action from the Academy of Motion
Picture Arts and Sciences[citation needed]), in which celebrities
were caught out in a Candid Camera style prank. Celebrities such as
dancer Wayne Sleep and rugby union player Will Carling would be
enticed to take part in a fictitious children's programme based around
their profession. Mr Blobby would clumsily take part in the activity,
knocking over the set, causing mayhem and saying "blobby blobby
blobby", until finally when the prank was revealed, the Blobby
costume would be opened - revealing Noel inside. This was all the more
surprising for the "victim" as during rehearsals Blobby would be
played by an actor wearing only the arms and legs of the costume and
speaking in a normal manner.[citation needed]"""
tagged = tag(text)
l = list(set(tagged))
l.sort(lambda x,y:cmp(x[1],y[1]))
pprint.pprint(l)
if __name__ == '__main__':
main()
输出:
[('rugby', None),
('Oscars', None),
('1990s', None),
('",', None),
('Candid', None),
('"', None),
('blobby', None),
('Edmonds', None),
('Mr', None),
('outrageously', None),
('.[', None),
('toothy', None),
('Celebrities', None),
('Gotcha', None),
(']),', None),
('Jamie', None),
('humanoid', None),
('Blobby', None),
('Carling', None),
('enticed', None),
('programme', None),
('1997', None),
('s', None),
("'", "'"),
('[', '('),
('(', '('),
(']', ')'),
(',', ','),
('.', '.'),
('all', 'ABN'),
('the', 'AT'),
('an', 'AT'),
('a', 'AT'),
('be', 'BE'),
('were', 'BED'),
('was', 'BEDZ'),
('is', 'BEZ'),
('and', 'CC'),
('one', 'CD'),
('until', 'CS'),
('as', 'CS'),
('This', 'DT'),
('There', 'EX'),
('of', 'IN'),
('inside', 'IN'),
('from', 'IN'),
('around', 'IN'),
('with', 'IN'),
('through', 'IN'),
('-', 'IN'),
('on', 'IN'),
('in', 'IN'),
('by', 'IN'),
('during', 'IN'),
('over', 'IN'),
('for', 'IN'),
('distinctive', 'JJ'),
('permanent', 'JJ'),
('mute', 'JJ'),
('popular', 'JJ'),
('such', 'JJ'),
('fictional', 'JJ'),
('yellow', 'JJ'),
('pink', 'JJ'),
('fictitious', 'JJ'),
('normal', 'JJ'),
('dimensional', 'JJ'),
('legal', 'JJ'),
('large', 'JJ'),
('surprising', 'JJ'),
('absurd', 'JJ'),
('Will', 'MD'),
('would', 'MD'),
('style', 'NN'),
('threat', 'NN'),
('novelty', 'NN'),
('union', 'NN'),
('prank', 'NN'),
('winner', 'NN'),
('parody', 'NN'),
('player', 'NN'),
('actor', 'NN'),
('character', 'NN'),
('victim', 'NN'),
('costume', 'NN'),
('action', 'NN'),
('activity', 'NN'),
('dancer', 'NN'),
('grin', 'NN'),
('doll', 'NN'),
('top', 'NN'),
('mayhem', 'NN'),
('citation', 'NN'),
('part', 'NN'),
('repetition', 'NN'),
('manner', 'NN'),
('tone', 'NN'),
('Picture', 'NN'),
('entertainment', 'NN'),
('night', 'NN'),
('series', 'NN'),
('voice', 'NN'),
('Mrs', 'NN'),
('video', 'NN'),
('Motion', 'NN'),
('profession', 'NN'),
('feature', 'NN'),
('word', 'NN'),
('Academy', 'NN-TL'),
('Camera', 'NN-TL'),
('Party', 'NN-TL'),
('House', 'NN-TL'),
('eyes', 'NNS'),
('spots', 'NNS'),
('rehearsals', 'NNS'),
('ratings', 'NNS'),
('arms', 'NNS'),
('celebrities', 'NNS'),
('children', 'NNS'),
('moods', 'NNS'),
('legs', 'NNS'),
('Sciences', 'NNS-TL'),
('Arts', 'NNS-TL'),
('Wayne', 'NP'),
('Rose', 'NP'),
('Noel', 'NP'),
('Saturday', 'NR'),
('second', 'OD'),
('his', 'PP$'),
('their', 'PP$'),
('him', 'PPO'),
('He', 'PPS'),
('more', 'QL'),
('However', 'RB'),
('actually', 'RB'),
('also', 'RB'),
('clumsily', 'RB'),
('originally', 'RB'),
('only', 'RB'),
('often', 'RB'),
('ironically', 'RB'),
('briefly', 'RB'),
('finally', 'RB'),
('electronically', 'RB-HL'),
('out', 'RP'),
('to', 'TO'),
('show', 'VB'),
('Sleep', 'VB'),
('take', 'VB'),
('opened', 'VBD'),
('played', 'VBD'),
('caught', 'VBD'),
('appeared', 'VBD'),
('revealed', 'VBD'),
('started', 'VBD'),
('saying', 'VBG'),
('causing', 'VBG'),
('expressing', 'VBG'),
('knocking', 'VBG'),
('wearing', 'VBG'),
('speaking', 'VBG'),
('sporting', 'VBG'),
('revealing', 'VBG'),
('jiggling', 'VBG'),
('sold', 'VBN'),
('called', 'VBN'),
('made', 'VBN'),
('altered', 'VBN'),
('based', 'VBN'),
('designed', 'VBN'),
('covered', 'VBN'),
('communicated', 'VBN'),
('needed', 'VBN'),
('seen', 'VBN'),
('set', 'VBN'),
('featured', 'VBN'),
('which', 'WDT'),
('who', 'WPS'),
('when', 'WRB')]
关于python - NLTK 使用的实际例子,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/526469/
谁能给我一个关于如何使用函数 crypt_r() 的例子吗? 在手册页中,不清楚返回的 char * 字符串是指向函数本身内部(在堆中)分配的内存块,还是仍然指向静态内存,如 crypt()? 最佳答
在 Spectre 中paper ,有一个利用越界数组访问的示例(第 1.2 节)。代码是 if (x < array1_size) y = array2[ array1[x] * 256 ];
这是 Grammar: difference between a top down and bottom up? 的后续问题 我从这个问题中了解到: 语法本身不是自上而下或自下而上的,而是解析器 有些
在java的构造函数中声明变量合法吗?示例。 Time(){ long timeMill = System.currentTimeMillis(); int secon
我一直在仔细研究 slick grid 的示例,并且想要 ping SO 社区并查询 Excel 电子表格编辑演示的示例?就存储而言,网格仅存储整数数据,并且网格将托管在 mvc3 razor 页面内
我很难将愚蠢的菜单置于我网站页面的中心。我知道我可以将外部 div 的宽度设置为 px 值,但我怎样才能让它以响应式网站为中心?这是页面: http://103.4.17.225/~america/i
我正在寻找可在 wordpress 上使用的主题。有时,页面会在调整大小的网络浏览器上正确加载,但在移动设备上却不能,即使尺寸相同,它也会加载某种错误(通常是错位)。例如,在此页面中 ( http:/
union { unsigned char raw[8]; struct { uint8_t gz_method; uint8_t flag;
我想使用 matchShapes() 函数在查询图像中查找对象。 假设我有一本书的模型图像,我想提取它的形状,然后尝试在另一幅图像中找到这本书(它的形状)。 我在谷歌上搜索了很多,但找不到任何关于如何
我正在寻找一个使用 inotify 的简单、简洁的示例gem 来检测目录的更改。 它缺少示例。 最佳答案 examples/watcher.rb 中有一个示例.该链接指向 aredridel 的 re
我一直在努力学习编程中的递归是什么,我需要有人来确认我是否已经完全理解它是什么。 我尝试考虑的方式是通过对象之间的碰撞检测。 假设我们有一个函数。当确定发生碰撞时调用该函数,并使用对象列表调用它以确定
我正在尝试学习如何在我正在处理的项目中使用 jBullet,我已经查看了源提供的演示,但我只是无法弄清楚这些演示如何显示对象。谁有好的资源可以指点我或提供一个在屏幕上显示一个或两个对象的基本示例? 在
我想在一个简单的 x,y 图表上绘制线条,以使用 JGraphT 在 JApplet 中显示。我找到的例子不是很有帮助。有人可以给我指出一些简单的 JGraphT 示例吗? 最佳答案 这里有一个例子,
在解决几何问题时,我遇到了一种称为滑动窗口算法的方法。 真的找不到任何关于它的学习 Material /细节。 算法是关于什么的? 最佳答案 我认为它更像是一种技术而不是一种算法。这是一种可用于各种算
我正在学习同步方法,以防止 Java 中的竞争条件和不良行为。我看到了以下示例,并被告知竞争条件非常微妙: public class Messages { private String messa
我有 100 万个 pdf,如何使用 hadoop 转换为文本并将其用于分析。目标是利用 hadoop 的强大功能将 pdf 数据提取为文本。 最佳答案 我已经在 Hadoop 上处理了一个 pdf
按照目前的情况,这个问题不适合我们的问答形式。我们希望答案得到事实、引用或专业知识的支持,但这个问题可能会引发辩论、争论、投票或扩展讨论。如果您觉得这个问题可以改进并可能重新打开,visit the
我读到过,由于堆栈展开,从析构函数中抛出不是一个好主意。我不确定我是否完全理解。所以我尝试了下面的例子 struct foo { ~foo() { throw 1;
任何人都可以告诉我一个简单的(代码)示例来展示 response.encodeURL() 的用法吗?我所有的搜索(包括 google 和 stackoverflow)只提供了 encodeURL()
我受困于 haskell 类型。 {-# LANGUAGE OverloadedStrings #-} module Main ( main ) where import qualified
我是一名优秀的程序员,十分优秀!