
python - WordNetLemmatizer error on dask.dataframe: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'

Reposted. Author: 行者123. Updated: 2023-12-01 08:16:07

I am trying to lemmatize a text column of a Dask dataframe:

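from nltk.stem import WordNetLemmatizer

wnl = WordNetLemmatizer()

def lemmatizing(sentence):
    # Lemmatize each whitespace-separated word and rebuild the sentence.
    stemSentence = ""

    for word in sentence.split():
        stem = wnl.lemmatize(word)
        stemSentence += stem
        stemSentence += " "

    stemSentence = stemSentence.strip()

    return stemSentence

# df is a dask.dataframe; this is the call that raises the error below.
df['news_content'] = df['news_content'].apply(lemmatizing).compute()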

But I get the following error:

AttributeError: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'

I have already tried the approach recommended here, without any luck.

Thanks for your help.

Best answer

This happens because the wordnet corpus is loaded lazily and has not been evaluated yet when Dask first uses it.
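To make the lazy-loading explanation concrete, here is a toy model in plain Python. It is only a sketch: `ToyReader` and `ToyLazyLoader` are hypothetical names, not NLTK classes, and the sketch assumes that the lazy object replaces its own class and attributes on first use, so a loader call that arrives after the swap can no longer find the loader's private attributes. Whether this is exactly what happens inside Dask's workers is an assumption; the toy merely produces an error of the same shape as the one in the question.

```python
class ToyReader:
    """Stands in for the fully loaded corpus reader (hypothetical)."""

    def __init__(self, table):
        self.table = table

    def lookup(self, word):
        return self.table.get(word, word)


class ToyLazyLoader:
    """Toy proxy that turns itself into a ToyReader on first attribute access."""

    def __init__(self, *args):
        # Name mangling stores this as _ToyLazyLoader__args.
        self.__args = args

    def _load(self):
        reader = ToyReader(*self.__args)
        # Swap identity in place: from now on this object is a ToyReader.
        self.__dict__ = reader.__dict__
        self.__class__ = reader.__class__

    def __getattr__(self, name):
        # Only reached while the object is still lazy.
        self._load()
        return getattr(self, name)


proxy = ToyLazyLoader({"cats": "cat"})
type_before = type(proxy).__name__   # still the lazy proxy
first = proxy.lookup("cats")         # first access triggers the swap
type_after = type(proxy).__name__    # now the real reader

# Simulate a loader call that arrives after the swap has already happened.
try:
    ToyLazyLoader._load(proxy)
    late_error = ""
except AttributeError as exc:
    late_error = str(exc)

print(type_before, first, type_after)
print(late_error)
```

In this toy, once the first access has completed, every later call goes straight to `ToyReader`, which is why forcing one access up front avoids the late loader call entirely.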

One trick to make it work is to call WordNetLemmatizer() once before using it on the Dask dataframe, e.g.

>>> from nltk.stem import WordNetLemmatizer
>>> import dask.dataframe as dd

>>> df = dd.read_csv('something.csv')
>>> df.head()
                      text  label
0       this is a sentence      1
1  that is a foo bar thing      0


>>> wnl = WordNetLemmatizer()
>>> wnl.lemmatize('cats') # Use it once first, to "unlazify" wordnet.
'cat'

# Now you can use it with Dask dataframe's .apply() function.
>>> lemmatize_text = lambda sent: [wnl.lemmatize(word) for word in sent.split()]

>>> df['lemmas'] = df['text'].apply(lemmatize_text)
>>> df.head()
                      text  label                          lemmas
0       this is a sentence      1         [this, is, a, sentence]
1  that is a foo bar thing      0  [that, is, a, foo, bar, thing]
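
The warm-up trick can also be wrapped in a small helper so that it is never forgotten. The following is a minimal sketch, not part of the original answer: `make_sentence_lemmatizer` and `StubLemmatizer` are hypothetical names, and the stub exists only so the example runs without NLTK or its WordNet data. The helper assumes nothing about its argument except a `lemmatize(word)` method.

```python
class StubLemmatizer:
    """Hypothetical stand-in so the sketch runs without NLTK data."""

    def __init__(self):
        self.calls = 0

    def lemmatize(self, word):
        self.calls += 1
        # Crude plural stripping, for demonstration only.
        return word[:-1] if word.endswith("s") and len(word) > 3 else word


def make_sentence_lemmatizer(lemmatizer, warmup_word="cats"):
    """Warm up `lemmatizer` once, then return a per-sentence function."""
    # This single call forces any lazily loaded resource to load here,
    # before the returned function is handed to parallel workers.
    lemmatizer.lemmatize(warmup_word)

    def lemmatize_sentence(sentence):
        return " ".join(lemmatizer.lemmatize(w) for w in sentence.split())

    return lemmatize_sentence


stub = StubLemmatizer()
lemmatize_sentence = make_sentence_lemmatizer(stub)
result = lemmatize_sentence("cats chase dogs")
print(result)

# With the real libraries (not run here), the helper would be used like this:
#   from nltk.stem import WordNetLemmatizer
#   fn = make_sentence_lemmatizer(WordNetLemmatizer())
#   df['lemmas'] = df['text'].apply(fn, meta=('lemmas', 'object'))
```

Unlike the lambda above, this version returns a joined string, as the function in the question does. The `meta` argument in the commented usage is optional; it only tells Dask the name and dtype of the result so that it does not have to infer them.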

Alternatively, you can try pywsd:

pip install -U pywsd

Then in code:

>>> from pywsd.utils import lemmatize_sentence
Warming up PyWSD (takes ~10 secs)... took 9.131901025772095 secs.

>>> import dask.dataframe as dd

>>> df = dd.read_csv('something.csv')
>>> df.head()
                      text  label
0       this is a sentence      1
1  that is a foo bar thing      0

>>> df['lemmas'] = df['text'].apply(lemmatize_sentence)
>>> df.head()
                      text  label                          lemmas
0       this is a sentence      1         [this, be, a, sentence]
1  that is a foo bar thing      0  [that, be, a, foo, bar, thing]

Source: a similar question on Stack Overflow, https://stackoverflow.com/questions/54969887/
