
python - pickle.PicklingError: args[0] from __newobj__ args has the wrong class with hadoop python

Reposted. Author: 可可西里. Updated: 2023-11-01 14:22:44

I am trying to remove stop words with Spark. The code is as follows:

from nltk.corpus import stopwords
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession

sc = SparkContext('local')
spark = SparkSession(sc)
word_list = ["ourselves", "out", "over", "own", "same", "shan't", "she", "she'd", "what", "the", "fuck", "is", "this", "world", "too", "who", "who's", "whom", "yours", "yourself", "yourselves"]

wordlist = spark.createDataFrame([word_list]).rdd

def stopwords_delete(word_list):
    filtered_words = []
    print(word_list)

    for word in word_list:
        print(word)
        if word not in stopwords.words('english'):
            filtered_words.append(word)

filtered_words = wordlist.map(stopwords_delete)
print(filtered_words)

I get the following error:

pickle.PicklingError: args[0] from __newobj__ args has the wrong class

I don't know why this happens. Can anyone help me?
Thanks in advance.

Best answer

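This has to do with how the stopwords module is shipped to the workers. As a workaround, import the stopwords library inside the function itself. See the similar issue linked below. I had the same problem, and this workaround fixed it.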

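def stopwords_delete(word_list):
    # Import inside the function instead of at module level
    from nltk.corpus import stopwords
    filtered_words = []
    print(word_list)
    # ... (remainder of the function as in the question)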
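For reference, the workaround can be sketched end to end as follows. This is a minimal sketch rather than the answerer's exact code: it assumes pyspark is installed and that the NLTK stopwords corpus is available on every worker, and the `main()` helper, the app name, and the shortened word list are illustrative. The likely reason the deferred import helps is that `nltk.corpus.stopwords` is a lazily loaded object that does not pickle cleanly when Spark serializes a function that refers to it at module level; importing inside the function keeps it out of the serialized closure.

```python
def stopwords_delete(word_list):
    """Return the words in word_list that are not English stop words."""
    # Deferred import: resolved on the executor when the task runs, so the
    # pickled function does not reference the module-level `stopwords` object.
    from nltk.corpus import stopwords

    # Build the lookup set once per call rather than once per word.
    stop_words = set(stopwords.words("english"))
    return [word for word in word_list if word not in stop_words]


def main():
    # Assumption: pyspark is installed and the NLTK stopwords corpus has
    # already been downloaded on every node, e.g. nltk.download("stopwords").
    # Imported here so stopwords_delete can be loaded without Spark present.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local")
        .appName("stopwords-demo")  # illustrative name
        .getOrCreate()
    )
    word_list = ["ourselves", "out", "over", "own", "same",
                 "the", "is", "this", "world"]
    rows = spark.createDataFrame([word_list]).rdd
    # collect() triggers the job; printing the RDD itself only shows its repr.
    print(rows.map(stopwords_delete).collect())
    spark.stop()
```

Calling `main()` runs the job locally. Note that the function now returns the filtered list, which the version in the question never did, and that the result is collected before printing.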

Similar Issue

I would recommend `from pyspark.ml.feature import StopWordsRemover` as a permanent fix.
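A rough sketch of what that could look like is shown here. It assumes pyspark is installed; the helper names `remove_stop_words` and `main`, the column names `words` and `filtered`, and the sample word list are illustrative and not from the original answer. `StopWordsRemover` runs inside Spark and uses its own built-in English stop word list by default, so no NLTK object has to be serialized.

```python
# Illustrative sample data, shortened from the question's word list.
WORD_LIST = ["ourselves", "out", "over", "own", "same", "she", "what",
             "the", "is", "this", "world", "too", "who", "whom", "yours"]


def remove_stop_words(spark, words):
    """Filter one list of words with Spark ML's StopWordsRemover."""
    # Imported here so this module can be loaded without Spark present.
    from pyspark.ml.feature import StopWordsRemover

    # One row with a single array<string> column named "words".
    df = spark.createDataFrame([(words,)], ["words"])
    remover = StopWordsRemover(inputCol="words", outputCol="filtered")
    # transform() appends the "filtered" column; take it from the only row.
    return remover.transform(df).select("filtered").first()["filtered"]


def main():
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local")
        .appName("stopwords-remover-demo")  # illustrative name
        .getOrCreate()
    )
    print(remove_stop_words(spark, WORD_LIST))
    spark.stop()
```

Calling `main()` prints the filtered list. A custom stop word list can be supplied through the `stopWords` parameter of `StopWordsRemover` if the default list does not fit.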

Regarding python - pickle.PicklingError: args[0] from __newobj__ args has the wrong class with hadoop python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44911539/
