Python NLTK: how to lemmatize text, including verbs, in English?

Reposted by: 太空宇宙 | Updated: 2023-11-03 12:48:09

I want to lemmatize this text, but the lemmatizer only handles nouns. I also need the verbs lemmatized.

    >>> import string
    >>> from urllib.request import urlopen  # Python 3; Python 2 used `from urllib import urlopen`
    >>> import nltk
    >>> from nltk.stem import WordNetLemmatizer
    >>> url = "https://raw.githubusercontent.com/evandrix/nltk_data/master/corpora/europarl_raw/english/ep-00-01-17.en"
    >>> raw = urlopen(url).read().decode("utf-8")
    >>> # Strip punctuation character by character before tokenizing
    >>> raw = "".join(ch for ch in raw if ch not in string.punctuation)
    >>> tokens = nltk.word_tokenize(raw)
    >>> lemmatizer = WordNetLemmatizer()
    >>> lem = [lemmatizer.lemmatize(t) for t in tokens]  # pos defaults to noun
    >>> lem[:20]
    ['Resumption', 'of', 'the', 'session', 'I', 'declare', 'resumed', 'the', 'session', 'of', 'the', 'European', 'Parliament', 'adjourned', 'on', 'Friday', '17', 'December', '1999', 'and']

Here a verb like "resumed" should become "resume". Can you tell me what I should do to lemmatize the whole text?

Best answer

Use the pos parameter of WordNetLemmatizer:

    >>> from nltk.stem import WordNetLemmatizer
    >>> wnl = WordNetLemmatizer()
    >>> wnl.lemmatize('resumed')  # treated as a noun by default
    'resumed'
    >>> wnl.lemmatize('resumed', pos='v')  # treated as a verb
    'resume'

The complete code, using pos_tag to supply the part of speech, is as follows:

    >>> from nltk import word_tokenize, pos_tag
    >>> from nltk.stem import WordNetLemmatizer
    >>> wnl = WordNetLemmatizer()
    >>> txt = """Resumption of the session I declare resumed the session of the European Parliament adjourned on Friday 17 December 1999 , and I would like once again to wish you a happy new year in the hope that you enjoyed a pleasant festive period ."""
    >>> # Pass the lowercased first letter of each tag as pos when it is 'a', 'n' or 'v'; otherwise use the default
    >>> [wnl.lemmatize(i,j[0].lower()) if j[0].lower() in ['a','n','v'] else wnl.lemmatize(i) for i,j in pos_tag(word_tokenize(txt))]
    ['Resumption', 'of', 'the', 'session', 'I', 'declare', 'resume', 'the', 'session', 'of', 'the', 'European', 'Parliament', 'adjourn', 'on', 'Friday', '17', 'December', '1999', ',', 'and', 'I', 'would', 'like', 'once', 'again', 'to', 'wish', 'you', 'a', 'happy', 'new', 'year', 'in', 'the', 'hope', 'that', 'you', 'enjoy', 'a', 'pleasant', 'festive', 'period', '.']
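
One caveat about the one-liner above: it uses the first letter of each tag as the WordNet part of speech, but pos_tag returns Penn Treebank tags, where adjectives start with "J" (JJ, JJR, JJS) and adverbs with "RB", so the 'a' branch is unlikely ever to match. The sketch below is one possible way to make the mapping explicit. The helper names penn_to_wordnet and lemmatize_tagged are hypothetical (they are not part of NLTK), and leaving untagged words unchanged is a choice made here, not something the original answer does.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS letter, or None if unmapped.

    Hypothetical helper for illustration; not an NLTK function.
    """
    if tag.startswith('J'):
        return 'a'  # JJ, JJR, JJS -> adjective
    if tag.startswith('V'):
        return 'v'  # VB, VBD, VBG, VBN, VBP, VBZ -> verb
    if tag.startswith('N'):
        return 'n'  # NN, NNS, NNP, NNPS -> noun
    if tag.startswith('RB'):
        return 'r'  # RB, RBR, RBS -> adverb (RP, a particle, is excluded)
    return None


def lemmatize_tagged(tagged_tokens, lemmatize):
    """Lemmatize (word, tag) pairs.

    `lemmatize` is any callable taking (word, pos), for example the bound
    method WordNetLemmatizer().lemmatize. Words whose tag has no WordNet
    equivalent are returned unchanged (an assumption of this sketch).
    """
    result = []
    for word, tag in tagged_tokens:
        pos = penn_to_wordnet(tag)
        result.append(lemmatize(word, pos) if pos else word)
    return result
```

With NLTK installed and its tokenizer, tagger and WordNet data downloaded, this would be called as lemmatize_tagged(pos_tag(word_tokenize(txt)), wnl.lemmatize). Passing the lemmatizer in as a callable keeps the mapping logic usable without NLTK loaded. To keep the original answer's behaviour for unmapped tags, replace the bare `word` fallback with `lemmatize(word, 'n')`.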

Regarding "Python NLTK : how to lemmatize text include verb in english?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/24232702/
