python - Converting Early Modern English to 20th-century spelling using NLTK

Reposted by: 太空宇宙. Updated: 2023-11-03 13:52:57

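I have a list of strings, all of which are Early Modern English words ending in "th". These include hath, appointeth, demandeth, and so on. They are all conjugated for the third person singular.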

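As part of a larger project (using my computer to convert the Gutenberg etext of Gargantua and Pantagruel into something closer to 20th-century English, so that I can read it more easily), I want to remove the last two or three characters from all of these words and replace them with an "s", then run a slightly modified function on the words that are still not modernized. Both functions are included below.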

My main problem is that I have never managed to get my typing right in Python. I find that part of the language really confusing at this point.

Here is the function that removes the th:

from __future__ import division
import nltk, re, pprint

def ethrema(word):
    if word.endswith('th'):
        return word[:-2] + 's'

Here is the function that removes the extraneous e:

def ethremb(word):
    if word.endswith('es'):
        return word[:-2] + 's'

So the words "abateth" and "accuseth" would pass through ethrema but not through ethremb(ethrema), while the word "abhorreth" would need to pass through both.

If anyone can think of a more efficient way to do this, I'm all ears.

Here is the result of my very amateur attempt to use these functions on a tokenized list of the words that need modernizing:

>>> eth1 = [w.ethrema() for w in text]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'ethrema'

So, yes, it really is a typing issue. These are the first functions I have ever written in Python, and I have no idea how to apply them to actual objects.

Best answer

ethrema() is not a method of the str type, so you have to call it as a plain function, passing the word as an argument:

eth1 = [ethrema(w) for w in text]
#AND
eth2 = [ethremb(w) for w in text]
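
To make the difference concrete, here is a minimal sketch that uses the question's original ethrema and a small made-up word list standing in for the tokenized text (the list contents are an assumption, not taken from the question). Note that a word that does not end in "th" comes back as None, because the function reaches its end without a return:

```python
def ethrema(word):
    # Original definition from the question: nothing is returned
    # when the word does not end in 'th'
    if word.endswith('th'):
        return word[:-2] + 's'

# Hypothetical sample standing in for the tokenized text
text = ['hath', 'appointeth', 'demandeth', 'gargantua']

# Call the function on each word; do not call it as a method of the word
eth1 = [ethrema(w) for w in text]
print(eth1)  # ['has', 'appointes', 'demandes', None]
```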

Edit (in response to a comment):

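ethremb(ethrema(word)) won't work until you make some small changes to your functions, so that a word that does not match is returned unchanged instead of None: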

def ethrema(word):
    if word.endswith('th'):
        return word[:-2] + 's'
    else:
        return word

def ethremb(word):
    if word.endswith('es'):
        return word[:-2] + 's'
    else:
        return word

#OR

def ethrema(word):
    if word.endswith('th'):
        return word[:-2] + 's'
    elif word.endswith('es'):
        return word[:-2] + 's'
    else:
        return word
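
With the fallback return in place, the two separate functions can be chained without error. The following is only a sketch: the helper name modernize and the needs_second_pass set are hypothetical, introduced to show one way of applying ethremb only to the words you have decided need it. Chaining blindly would turn "abateth" into "abats", which is why the question applies the second pass selectively.

```python
def ethrema(word):
    # Replace a trailing 'th' with 's'; otherwise return the word unchanged
    if word.endswith('th'):
        return word[:-2] + 's'
    else:
        return word

def ethremb(word):
    # Replace a trailing 'es' with 's'; otherwise return the word unchanged
    if word.endswith('es'):
        return word[:-2] + 's'
    else:
        return word

def modernize(word, needs_second_pass):
    # Hypothetical helper: always apply ethrema, and apply ethremb
    # only to words listed in needs_second_pass
    result = ethrema(word)
    if word in needs_second_pass:
        result = ethremb(result)
    return result

words = ['abateth', 'accuseth', 'abhorreth']
needs_second_pass = {'abhorreth'}  # assumption: chosen by hand

modernized = [modernize(w, needs_second_pass) for w in words]
print(modernized)  # ['abates', 'accuses', 'abhorrs']

# Blind chaining runs, but strips an 'e' that should stay
print(ethremb(ethrema('abateth')))  # 'abats'
```

As the output shows, these suffix rules are purely mechanical: "abhorreth" becomes "abhorrs" rather than "abhors", so some words will still need a further rule or manual correction.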

Regarding "python - Converting Early Modern English to 20th-century spelling using NLTK", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/3591673/
