python - How to stem Shakespeare/KJV using nltk.stem.snowball


Reposted by: 太空宇宙. Updated: 2023-11-04 05:35:37


I want to stem Early Modern English text:

sb.stem("loveth")
>>> "lov"

Apparently, all I need to do is make a small tweak to the Snowball stemmer:

And to put the endings into the English stemmer, the list

ed edly ing ingly

of Step 1b should be extended to

ed edly ing ingly est eth

As far as the Snowball scripts are concerned, the endings 'est' 'eth' must be added against ending 'ing'.

Great, so I just have to change a variable. Maybe I will also add a special rule to deal with "thee"/"thou"/"you" and "shalt"/"shall". The NLTK documentation shows the variables as:

class nltk.stem.snowball.EnglishStemmer(ignore_stopwords=False)

Bases: nltk.stem.snowball._StandardStemmer

The English Snowball stemmer.

Variables:

__vowels – The English vowels.

__double_consonants – The English double consonants.

__li_ending – Letters that may directly appear before a word final ‘li’.

__step0_suffixes – Suffixes to be deleted in step 0 of the algorithm.

__step1a_suffixes – Suffixes to be deleted in step 1a of the algorithm.

__step1b_suffixes – Suffixes to be deleted in step 1b of the algorithm. (Here we go)

__step2_suffixes – Suffixes to be deleted in step 2 of the algorithm.

__step3_suffixes – Suffixes to be deleted in step 3 of the algorithm.

__step4_suffixes – Suffixes to be deleted in step 4 of the algorithm.

__step5_suffixes – Suffixes to be deleted in step 5 of the algorithm.

__special_words – A dictionary containing words which have to be stemmed specially. (I can stick my "thee"/"thou" and "shalt" issues here)

Now, the dumb question. How do I change the variable? Everywhere I have looked for the variables, I keep getting "object has no attribute"...

Best answer

Try:

>>> from nltk.stem import snowball
>>> stemmer = snowball.EnglishStemmer()
>>> stemmer.stem('thee')
u'thee'
>>> dir(stemmer)
['_EnglishStemmer__double_consonants', '_EnglishStemmer__li_ending', '_EnglishStemmer__special_words', '_EnglishStemmer__step0_suffixes', '_EnglishStemmer__step1a_suffixes', '_EnglishStemmer__step1b_suffixes', '_EnglishStemmer__step2_suffixes', '_EnglishStemmer__step3_suffixes', '_EnglishStemmer__step4_suffixes', '_EnglishStemmer__step5_suffixes', '_EnglishStemmer__vowels', '__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__unicode__', '__weakref__', '_r1r2_standard', '_rv_standard', 'stem', 'stopwords', 'unicode_repr']
>>> stemmer._EnglishStemmer__special_words
{u'exceeds': u'exceed', u'inning': u'inning', u'exceed': u'exceed', u'exceeding': u'exceed', u'succeeds': u'succeed', u'succeeded': u'succeed', u'skis': u'ski', u'gently': u'gentl', u'singly': u'singl', u'cannings': u'canning', u'early': u'earli', u'earring': u'earring', u'bias': u'bias', u'tying': u'tie', u'exceeded': u'exceed', u'news': u'news', u'herring': u'herring', u'proceeds': u'proceed', u'succeeding': u'succeed', u'innings': u'inning', u'proceeded': u'proceed', u'proceed': u'proceed', u'dying': u'die', u'outing': u'outing', u'sky': u'sky', u'andes': u'andes', u'idly': u'idl', u'outings': u'outing', u'ugly': u'ugli', u'only': u'onli', u'proceeding': u'proceed', u'lying': u'lie', u'howe': u'howe', u'atlas': u'atlas', u'earrings': u'earring', u'cosmos': u'cosmos', u'canning': u'canning', u'succeed': u'succeed', u'herrings': u'herring', u'skies': u'sky'}
>>> stemmer._EnglishStemmer__special_words['thee'] = 'thou'
>>> stemmer.stem('thee')
'thou'
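The `_EnglishStemmer__...` names in the `dir()` listing come from Python's name mangling of double-underscore class attributes, which is the likely reason the short names raised "object has no attribute" from outside the class. A minimal sketch with a toy class (`ToyStemmer` and its attribute are hypothetical and not part of NLTK) illustrates the mechanism:

```python
class ToyStemmer:
    # Inside a class body, __suffixes is stored as _ToyStemmer__suffixes.
    __suffixes = ("ing", "ed")

    def suffixes(self):
        # Within the class, the short name is mangled automatically.
        return self.__suffixes


toy = ToyStemmer()

# Outside the class there is no mangling, so the short name is not found.
try:
    toy.__suffixes
    short_name_found = True
except AttributeError:
    short_name_found = False

# The mangled name is reachable from anywhere.
original = toy._ToyStemmer__suffixes

# Assigning through the mangled name creates an instance attribute that
# shadows the class attribute for this object only.
toy._ToyStemmer__suffixes = original + ("eth",)
other = ToyStemmer()

print(short_name_found)
print(toy.suffixes())
print(other.suffixes())
```

Note that mutating a shared class-level object in place (as with the special-words dictionary above) is different from assigning a new value: the in-place change is visible through every instance of the class.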

And:

>>> stemmer._EnglishStemmer__step0_suffixes
(u"'s'", u"'s", u"'")
>>> stemmer._EnglishStemmer__step1a_suffixes
(u'sses', u'ied', u'ies', u'us', u'ss', u's')
>>> stemmer._EnglishStemmer__step1b_suffixes
(u'eedly', u'ingly', u'edly', u'eed', u'ing', u'ed')
>>> stemmer._EnglishStemmer__step2_suffixes
(u'ization', u'ational', u'fulness', u'ousness', u'iveness', u'tional', u'biliti', u'lessli', u'entli', u'ation', u'alism', u'aliti', u'ousli', u'iviti', u'fulli', u'enci', u'anci', u'abli', u'izer', u'ator', u'alli', u'bli', u'ogi', u'li')
>>> stemmer._EnglishStemmer__step3_suffixes
(u'ational', u'tional', u'alize', u'icate', u'iciti', u'ative', u'ical', u'ness', u'ful')
>>> stemmer._EnglishStemmer__step4_suffixes
(u'ement', u'ance', u'ence', u'able', u'ible', u'ment', u'ant', u'ent', u'ism', u'ate', u'iti', u'ous', u'ive', u'ize', u'ion', u'al', u'er', u'ic')
>>> stemmer._EnglishStemmer__step5_suffixes
(u'e', u'l')

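Note that the step suffixes are tuples and therefore immutable, so you cannot append to them the way you can add entries to the special words. You have to copy the tuple, convert it to a list, append to it, and then overwrite the attribute, e.g.: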

>>> from nltk.stem import snowball
>>> stemmer = snowball.EnglishStemmer()
>>> step1b = stemmer._EnglishStemmer__step1b_suffixes
>>> stemmer._EnglishStemmer__step1b_suffixes = list(step1b) + ['eth']
>>> stemmer._EnglishStemmer__step1b_suffixes
[u'eedly', u'ingly', u'edly', u'eed', u'ing', u'ed', 'eth']
>>> stemmer.stem('loveth')
u'love'
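The two tweaks can be bundled into one helper. The following is only a sketch under stated assumptions: `make_early_modern_stemmer` is a hypothetical name, not an NLTK function, and it relies on the private, name-mangled attributes shown above, which are implementation details that could change between NLTK versions. It copies the special-words dictionary before adding entries so that other `EnglishStemmer` objects are not affected.

```python
from nltk.stem.snowball import EnglishStemmer


def make_early_modern_stemmer(extra_suffixes=("est", "eth"), extra_special=None):
    """Build an EnglishStemmer that also strips extra step 1b suffixes.

    Hypothetical helper; it depends on private NLTK attributes.
    """
    stemmer = EnglishStemmer()

    # Tuples are immutable, so build a new tuple and assign it on the
    # instance, which shadows the class-level attribute.
    base = stemmer._EnglishStemmer__step1b_suffixes
    stemmer._EnglishStemmer__step1b_suffixes = tuple(base) + tuple(extra_suffixes)

    # Work on a copy of the special-words dict to keep changes local.
    special = dict(stemmer._EnglishStemmer__special_words)
    special.update(extra_special or {})
    stemmer._EnglishStemmer__special_words = special
    return stemmer


stemmer = make_early_modern_stemmer(
    extra_special={"thee": "thou", "shalt": "shall"}
)
for word in ("loveth", "lovest", "thee", "shalt"):
    print(word, "->", stemmer.stem(word))
```

The new suffixes are appended after the built-in ones; since none of the existing step 1b suffixes end in "est" or "eth", the order in which they are tried should not change results for the original suffixes.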

This question, "python - How to stem Shakespeare/KJV using nltk.stem.snowball", comes from Stack Overflow: https://stackoverflow.com/questions/35690892/
