
python-3.x - Python list error with CountVectorizer and the fit function

Reposted. Author: 行者123. Updated: 2023-11-30 09:46:23

Please tell me what is going wrong and how to fix it.

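# imports needed by the code below (not shown in the original post)
import pandas
from sklearn import model_selection, preprocessing
from sklearn.feature_extraction.text import CountVectorizer

data = open(r"C:\Users\HS\Desktop\WORK\R\R DATA\g textonly2.txt").read()
labels, texts = [], []
#print(data)
for i, line in enumerate(data.split("\n")):
    content = line.split()
    #print(content)
    if len(content) != 0:
        labels.append(content[0])
        texts.append(content[1:])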


# create a dataframe using texts and labels
trainDF = pandas.DataFrame()
trainDF['text'] = texts
trainDF['label'] = labels

# split the dataset into training and validation datasets
train_x, valid_x, train_y, valid_y = model_selection.train_test_split(trainDF['text'], trainDF['label'])

# label encode the target variable
encoder = preprocessing.LabelEncoder()
train_y = encoder.fit_transform(train_y)
valid_y = encoder.transform(valid_y)  # reuse the mapping fitted on train_y

# create a count vectorizer object
count_vect = CountVectorizer(analyzer='word', token_pattern=r'\w{1,}')
count_vect.fit(trainDF['text'])

The data file contains lines like the following:

0 #\xdaltimahora Es tracta d'un aparell de Germanwings amb 152 passatgers a bord
0 Route map now being shared by http:
0 Pray for #4U9525 http:
0 Airbus A320 #4U9525 crash: \nFlight tracking data here: \nhttp

Error:

Traceback:
"C:\Program Files\Python36\python.exe" "C:/Users/HS/PycharmProjects/R/C/Text classification1.py"
Using TensorFlow backend.
Traceback (most recent call last):
  File "C:/Users/HS/PycharmProjects/R/C/Text classification1.py", line 38, in <module>
    count_vect.fit(trainDF['text'])
  File "C:\Program Files\Python36\lib\site-packages\sklearn\feature_extraction\text.py", line 836, in fit
    self.fit_transform(raw_documents)
  File "C:\Program Files\Python36\lib\site-packages\sklearn\feature_extraction\text.py", line 869, in fit_transform
    self.fixed_vocabulary_)
  File "C:\Program Files\Python36\lib\site-packages\sklearn\feature_extraction\text.py", line 792, in _count_vocab
    for feature in analyze(doc):
  File "C:\Program Files\Python36\lib\site-packages\sklearn\feature_extraction\text.py", line 266, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "C:\Program Files\Python36\lib\site-packages\sklearn\feature_extraction\text.py", line 232, in <lambda>
    return lambda x: strip_accents(x.lower())
AttributeError: 'list' object has no attribute 'lower'

Process finished with exit code 1

Best answer

From the documentation:

fit(raw_documents, y=None)
    Learn a vocabulary dictionary of all tokens in the raw documents.

    Parameters: raw_documents : iterable
        An iterable which yields either str, unicode or file objects.

    Returns: self

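You get the error AttributeError: 'list' object has no attribute 'lower' because you passed it an iterable of lists (in this case a pd.Series whose elements are lists) rather than an iterable of strings. The vectorizer calls .lower() on each document, and a list has no such method.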
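To see the difference in isolation, here is a minimal sketch that needs no scikit-learn. The preprocess function is a hypothetical stand-in for the vectorizer's lowercasing step shown in the traceback, not the library's actual code, and the sample line is taken from the data above.

```python
line = "0 Pray for #4U9525 http:"
content = line.split()

tokens_doc = content[1:]            # what the question appends: a list of tokens
string_doc = " ".join(content[1:])  # what the vectorizer expects: a single str


def preprocess(doc):
    # Hypothetical stand-in for the lowercasing step in the traceback.
    return doc.lower()


try:
    preprocess(tokens_doc)
    error_message = None
except AttributeError as exc:
    # A list has no .lower(), so the same error as in the question is raised.
    error_message = str(exc)

lowered = preprocess(string_doc)

print(error_message)
print(lowered)
```

The list raises the same AttributeError as in the question, while the joined string is lowercased without complaint.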

You should be able to fix this by using texts.append(' '.join(content[1:])) instead of texts.append(content[1:]):

for i, line in enumerate(data.split("\n")):
    content = line.split()
    #print(content)
    if len(content) != 0:
        labels.append(content[0])
        #texts.append(content[1:])
        texts.append(' '.join(content[1:]))
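As a rough end-to-end check, the corrected loop can be run against a small in-memory string instead of the file. The sample data below is a made-up stand-in for the real file (it assumes the same "label, then text" layout), and the CountVectorizer settings are copied from the question.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for the data file: each line is "<label> <text>".
data = "0 Route map now being shared\n0 Pray for #4U9525\n1 Airbus A320 crash"

labels, texts = [], []
for line in data.split("\n"):
    content = line.split()
    if content:  # skip empty lines
        labels.append(content[0])
        # Join the tokens back into one string per document.
        texts.append(" ".join(content[1:]))

count_vect = CountVectorizer(analyzer="word", token_pattern=r"\w{1,}")
count_vect.fit(texts)

# One row per document, one column per vocabulary term.
matrix = count_vect.transform(texts)
print(matrix.shape)
print(sorted(count_vect.vocabulary_))
```

Because every element of texts is now a str, fit runs without the AttributeError. The same list can be assigned to trainDF['text'] as in the question.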

Regarding python-3.x - Python list error with CountVectorizer and the fit function, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52082477/
