python - How do I fix "TypeError: expected string or bytes-like object"?

Reposted. Author: 行者123. Updated: 2023-12-03 08:46:00

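Hi everyone, I have a list of text documents (text_data) that I want to vectorize, but doing so raises TypeError: expected string or bytes-like object. When I call preprocess(text_data) on its own, without tfidfconverter, it works. I can't find the problem. Can anyone help?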

def preprocess(x):
    documents = []
    for sen in range(0, len(x)):

        # Remove all the special characters
        document = re.sub(r'\W', ' ', str(x[sen]))

        # Remove all numbers
        document = re.sub(r'[0-9]', ' ', document)

        # Remove all underscores
        document = re.sub(r'_', ' ', document)

        # remove all single characters
        document = re.sub(r'\s+[a-zA-Z]\s+', ' ', document)

        # Remove single characters from the start
        document = re.sub(r'\^[a-zA-Z]\s+', ' ', document)

        # Substituting multiple spaces with single space
        document = re.sub(r'\s+', ' ', document, flags=re.I)

        # Converting to Lowercase
        document = document.lower()

        # Lemmatization
        document = document.split()

        document = ' '.join([stemmer.stem(word) for word in document])
        documents.append(document)

    x = documents

tfidfconverter = TfidfVectorizer(min_df=10, max_df=0.97, stop_words=text.ENGLISH_STOP_WORDS, preprocessor=preprocess)

Traceback:
Traceback (most recent call last):
  File "C:/Users/Konrad/PycharmProjects/treffen/treffen.py", line 54, in <module>
    tfidf_table = tfidfconverter.fit_transform(text_data).toarray()
  File "C:\Users\Konrad\PycharmProjects\treffen\venv\lib\site-packages\sklearn\feature_extraction\text.py", line 1603, in fit_transform
    X = super(TfidfVectorizer, self).fit_transform(raw_documents)
  File "C:\Users\Konrad\PycharmProjects\treffen\venv\lib\site-packages\sklearn\feature_extraction\text.py", line 1032, in fit_transform
    self.fixed_vocabulary_)
  File "C:\Users\Konrad\PycharmProjects\treffen\venv\lib\site-packages\sklearn\feature_extraction\text.py", line 942, in _count_vocab
    for feature in analyze(doc):
  File "C:\Users\Konrad\PycharmProjects\treffen\venv\lib\site-packages\sklearn\feature_extraction\text.py", line 328, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "C:\Users\Konrad\PycharmProjects\treffen\venv\lib\site-packages\sklearn\feature_extraction\text.py", line 265, in <lambda>
    return lambda doc: token_pattern.findall(doc)
TypeError: expected string or bytes-like object

Process finished with exit code 1

Best answer

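The first problem I see is that the preprocessor is expected to return a string, and yours returns nothing, so the tokenizer receives None. Second, you don't need to rebuild the documents list, because the preprocessor function is called on each string in the list of training documents. You could try something like this: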

import re

from sklearn.feature_extraction import text
from sklearn.feature_extraction.text import TfidfVectorizer

# `stemmer` is the stemmer object from the asker's script; its definition
# is not shown in the question.

def preprocess(x):
    # x is a single document, not the whole list

    # Remove all the special characters
    document = re.sub(r'\W', ' ', str(x))

    # Remove all numbers
    document = re.sub(r'[0-9]', ' ', document)

    # Remove all underscores
    document = re.sub(r'_', ' ', document)

    # Remove all single characters
    document = re.sub(r'\s+[a-zA-Z]\s+', ' ', document)

    # Remove a single character at the start (^ must not be escaped)
    document = re.sub(r'^[a-zA-Z]\s+', ' ', document)

    # Substitute multiple spaces with a single space
    document = re.sub(r'\s+', ' ', document)

    # Convert to lowercase
    document = document.lower()

    # Stemming
    document = document.split()
    document = ' '.join([stemmer.stem(word) for word in document])

    return document


tfidfconverter = TfidfVectorizer(min_df=10, max_df=0.97, stop_words=text.ENGLISH_STOP_WORDS, preprocessor=preprocess)
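
To see why the original function fails, here is a minimal sketch that needs only the standard library. It is not scikit-learn's actual code: `analyze` is a simplified stand-in for the vectorizer's per-document "preprocess, then tokenize" step, and `broken_preprocess`, `fixed_preprocess`, and `corpus` are hypothetical names made up for this illustration. The regular expression is scikit-learn's default `token_pattern`. The cleaning steps are shortened, and stemming is left out so the sketch has no extra dependencies.

```python
import re

# scikit-learn's default token_pattern, copied here so the sketch needs only the stdlib.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def broken_preprocess(docs):
    """Mimics the question's function: builds a list but never returns it."""
    cleaned = [re.sub(r"\W", " ", str(d)).lower() for d in docs]
    docs = cleaned  # rebinding the local name has no effect on the caller


def fixed_preprocess(doc):
    """Cleans ONE document and returns the resulting string."""
    doc = re.sub(r"\W", " ", str(doc))
    doc = re.sub(r"\s+", " ", doc)
    return doc.lower().strip()


def analyze(doc, preprocess):
    """Simplified stand-in (an assumption) for the per-document pipeline."""
    return TOKEN_PATTERN.findall(preprocess(doc))


corpus = ["Hello, World!", "TF-IDF needs strings."]

error_message = None
try:
    analyze(corpus[0], broken_preprocess)
except TypeError as exc:
    # The tokenizer received None instead of a string.
    error_message = str(exc)

tokens = [analyze(doc, fixed_preprocess) for doc in corpus]
print(error_message)
print(tokens)  # [['hello', 'world'], ['tf', 'idf', 'needs', 'strings']]
```

The broken version raises the same TypeError as in the traceback, because a function without a `return` statement yields `None`, and `findall(None)` is not allowed. The fixed version works on one string at a time, which matches how the vectorizer calls the preprocessor.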

Regarding python - How do I fix "TypeError: expected string or bytes-like object", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/54373900/
