python - TfidfVectorizer : ValueError: not a built-in stop list: russian

Reposted by: 太空宇宙, updated: 2023-11-04 07:34:17

I'm trying to apply TfidfVectorizer with Russian stop words:

Tfidf = sklearn.feature_extraction.text.TfidfVectorizer(stop_words='russian' )
Z = Tfidf.fit_transform(X)

but I get:

ValueError: not a built-in stop list: russian

With English stop words it works correctly:

Tfidf = sklearn.feature_extraction.text.TfidfVectorizer(stop_words='english' )
Z = Tfidf.fit_transform(X)

How can I fix this? Full traceback:

<ipython-input-118-e787bf15d612> in <module>()
      1 Tfidf = sklearn.feature_extraction.text.TfidfVectorizer(stop_words='russian' )
----> 2 Z = Tfidf.fit_transform(X)

C:\Program Files\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in fit_transform(self, raw_documents, y)
   1303             Tf-idf-weighted document-term matrix.
   1304         """
-> 1305         X = super(TfidfVectorizer, self).fit_transform(raw_documents)
   1306         self._tfidf.fit(X)
   1307         # X is already a transformed view of raw_documents so

C:\Program Files\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in fit_transform(self, raw_documents, y)
    815 
    816         vocabulary, X = self._count_vocab(raw_documents,
--> 817                                           self.fixed_vocabulary_)
    818 
    819         if self.binary:

C:\Program Files\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in _count_vocab(self, raw_documents, fixed_vocab)
    745             vocabulary.default_factory = vocabulary.__len__
    746 
--> 747         analyze = self.build_analyzer()
    748         j_indices = _make_int_array()
    749         indptr = _make_int_array()

C:\Program Files\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in build_analyzer(self)
    232 
    233         elif self.analyzer == 'word':
--> 234             stop_words = self.get_stop_words()
    235             tokenize = self.build_tokenizer()
    236 

C:\Program Files\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in get_stop_words(self)
    215     def get_stop_words(self):
    216         """Build or fetch the effective stop words list"""
--> 217         return _check_stop_list(self.stop_words)
    218 
    219     def build_analyzer(self):

C:\Program Files\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in _check_stop_list(stop)
     88         return ENGLISH_STOP_WORDS
     89     elif isinstance(stop, six.string_types):
---> 90         raise ValueError("not a built-in stop list: %s" % stop)
     91     elif stop is None:
     92         return None

ValueError: not a built-in stop list: russian
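The last frame shows where the error comes from: `_check_stop_list` compares the argument against the single built-in name `'english'`, raises for any other string, returns `None` for `None`, and otherwise treats the value as a collection of words. The function below is a simplified, standalone re-creation of that dispatch for illustration only; the name `check_stop_list` and the tiny `BUILTIN_ENGLISH` set are hypothetical stand-ins, not sklearn's actual code or its real `ENGLISH_STOP_WORDS` list.

```python
# Simplified re-creation of the stop-list dispatch seen in the traceback.
# BUILTIN_ENGLISH is a tiny hypothetical stand-in for sklearn's much larger
# ENGLISH_STOP_WORDS frozenset.
BUILTIN_ENGLISH = frozenset({"the", "a", "and", "of"})


def check_stop_list(stop):
    """Return the effective stop-word set for a `stop_words` argument."""
    if stop == "english":
        # The only string name that maps to a built-in list.
        return BUILTIN_ENGLISH
    elif isinstance(stop, str):
        # Any other string, e.g. "russian", is rejected.
        raise ValueError("not a built-in stop list: %s" % stop)
    elif stop is None:
        # No stop-word filtering at all.
        return None
    else:
        # Any other iterable is treated as a user-supplied stop list.
        return frozenset(stop)


if __name__ == "__main__":
    try:
        check_stop_list("russian")
    except ValueError as exc:
        print(exc)  # not a built-in stop list: russian
```

So a string is only ever a lookup key for a built-in list, while a list of words is accepted as-is.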

Best answer

Can you guys read the documentation before posting?

stop_words : string {‘english’}, list, or None (default)

If a string, it is passed to _check_stop_list and the appropriate stop list is returned. ‘english’ is currently the only supported string value.

If a list, that list is assumed to contain stop words, all of which will be removed from the resulting tokens. Only applies if analyzer == 'word'.

If None, no stop words will be used. max_df can be set to a value in the range [0.7, 1.0) to automatically detect and filter stop words based on intra corpus document frequency of terms.
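In practice this means passing the Russian stop words yourself as a list. The sketch below assumes a short, hand-picked list (`RUSSIAN_STOP_WORDS` is a hypothetical name and deliberately incomplete) and a made-up three-document corpus in place of the question's `X`. For a fuller list, NLTK's `nltk.corpus.stopwords.words('russian')` returns one after the corpus has been fetched with `nltk.download('stopwords')`. The second vectorizer illustrates the `max_df` alternative from the last paragraph.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical, deliberately short stop list; use a complete one
# (for example from NLTK) in real code.
RUSSIAN_STOP_WORDS = ["на", "не", "что", "как", "это"]

# Made-up example corpus standing in for the question's X.
X = [
    "кошка сидит на окне",
    "собака не сидит на окне",
    "птица сидит на ветке",
]

# Option 1: pass the stop words as a list instead of the string 'russian'.
tfidf = TfidfVectorizer(stop_words=RUSSIAN_STOP_WORDS)
Z = tfidf.fit_transform(X)

# Option 2: no explicit list; drop terms that occur in more than 60% of
# the documents (here: in 2 or more of the 3 documents).
tfidf_df = TfidfVectorizer(max_df=0.6)
Z_df = tfidf_df.fit_transform(X)

print(sorted(tfidf.vocabulary_))
# ['ветке', 'кошка', 'окне', 'птица', 'сидит', 'собака']
print(sorted(tfidf_df.vocabulary_))
# ['ветке', 'кошка', 'не', 'птица', 'собака']
```

On a corpus this small, `max_df` removes content words such as "сидит" and "окне" while keeping "не", which illustrates why an explicit list is often preferred; frequency-based filtering tends to behave more sensibly on large corpora. Also note that the default `token_pattern` only keeps tokens of two or more characters, so one-letter words like "и" or "в" are dropped regardless of the stop list.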

Regarding python - TfidfVectorizer : ValueError: not a built-in stop list: russian, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/39945693/
