python - TfidfVectorizer raises an error but CountVectorizer works

Reposted. Author: 行者123. Updated: 2023-12-05 01:11:36
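I have been working on this all day with no luck.

I managed to narrow the problem down to a single line: the one that creates the TfidfVectorizer.

Here is my working code: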

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
vectorizer.fit(xtrain)

X_train_count = vectorizer.transform(xtrain)
X_test_count = vectorizer.transform(xval)
X_train_count


from keras.models import Sequential
from keras import layers

input_dim = X_train_count.shape[1] # Number of features

model = Sequential()
model.add(layers.Dense(10, input_dim=input_dim, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))


model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()

history = model.fit(X_train_count, ytrain,
                    epochs=10,
                    verbose=False,
                    validation_data=(X_test_count, yval),
                    batch_size=10)

But when I switch to:

from sklearn.feature_extraction.text import TfidfVectorizer

#TF-IDF initializer
vectorizer = TfidfVectorizer(max_df=0.8, max_features=1000)

vectorizer.fit(xtrain)

X_train_count = vectorizer.transform(xtrain)
X_test_count = vectorizer.transform(xval)
X_train_count


from keras.models import Sequential
from keras import layers

input_dim = X_train_count.shape[1] # Number of features

model = Sequential()
model.add(layers.Dense(10, input_dim=input_dim, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))


model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()

history = model.fit(X_train_count, ytrain,
                    epochs=10,
                    verbose=False,
                    validation_data=(X_test_count, yval),
                    batch_size=10)

The only thing that changed is these 2 lines:

from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(max_df=0.8, max_features=1000)

Then I get this error:

InvalidArgumentError: indices[1] = [0,997] is out of order. Many sparse ops require sorted indices.
Use tf.sparse.reorder to create a correctly ordered copy.

[Op:SerializeManySparse]

How can I fix this, and why does it happen?

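Best answer

vectorizer.transform(...) returns a SciPy sparse matrix, which Keras does not handle well here. You just need to convert it to a plain dense array, which is easy to do: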

vectorizer.transform(...).toarray()
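
As a minimal sketch of the conversion above, the snippet below runs TfidfVectorizer on a small made-up corpus (train_texts and val_texts are hypothetical stand-ins for xtrain and xval, not data from the question) and converts both outputs to dense arrays. It also shows one possible alternative that the answer does not mention: sorting the sparse matrix's indices in place with SciPy's sort_indices(), which may address the "out of order" complaint while keeping the data sparse. Whether Keras then accepts the sparse input is an assumption to verify in your own environment.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for xtrain / xval (hypothetical, not from the question).
train_texts = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "birds sing in the morning",
    "the dog barked at the birds",
]
val_texts = ["the cat chased the birds", "dogs sing at night"]

# Same settings as in the question.
vectorizer = TfidfVectorizer(max_df=0.8, max_features=1000)
vectorizer.fit(train_texts)

# transform() returns a SciPy sparse matrix; toarray() makes it a dense ndarray.
X_train_dense = vectorizer.transform(train_texts).toarray()
X_val_dense = vectorizer.transform(val_texts).toarray()

print(type(X_train_dense), X_train_dense.shape)
print(type(X_val_dense), X_val_dense.shape)

# Possible alternative (an assumption, not part of the answer):
# keep the matrix sparse and sort its column indices in place.
X_train_sparse = vectorizer.transform(train_texts)
X_train_sparse.sort_indices()
print(bool(X_train_sparse.has_sorted_indices))
```

The dense arrays can then be passed to model.fit in place of X_train_count and X_test_count. A dense array stores every zero explicitly, so its memory use grows with rows times features; max_features=1000 keeps the second factor bounded.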

Regarding "python - TfidfVectorizer raises an error but CountVectorizer works", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/62871108/
