python - SKlearn SGD partial fit

Reposted. Author: 行者123. Updated: 2023-12-03 15:32:02

What am I doing wrong here? I have a large dataset and I want to fit it incrementally with partial_fit on Scikit-learn's SGDClassifier.

I do the following:

from sklearn.linear_model import SGDClassifier
import pandas as pd

chunksize = 5
clf2 = SGDClassifier(loss='log', penalty="l2")

for train_df in pd.read_csv("train.csv", chunksize=chunksize, iterator=True):
    X = train_df[features_columns]
    Y = train_df["clicked"]
    clf2.partial_fit(X, Y)

I get this error:

Traceback (most recent call last):
  File "/predict.py", line 48, in <module>
    sys.exit(0 if main() else 1)
  File "/predict.py", line 44, in main
    predict()
  File "/predict.py", line 38, in predict
    clf2.partial_fit(X, Y)
  File "/Users/anaconda/lib/python3.5/site-packages/sklearn/linear_model/stochastic_gradient.py", line 512, in partial_fit
    coef_init=None, intercept_init=None)
  File "/Users/anaconda/lib/python3.5/site-packages/sklearn/linear_model/stochastic_gradient.py", line 349, in _partial_fit
    _check_partial_fit_first_call(self, classes)
  File "/Users/anaconda/lib/python3.5/site-packages/sklearn/utils/multiclass.py", line 297, in _check_partial_fit_first_call
    raise ValueError("classes must be passed on the first call "
ValueError: classes must be passed on the first call to partial_fit.

Best answer

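Note that the classifier does not know the set of classes at the start, so on the first call to partial_fit you have to supply it through the classes argument, for example with np.unique(target), where target is the class column. Because you are reading the data in chunks, you need to make sure the first chunk contains every possible value of the class label for this to work. So your code would be: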

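import numpy as np

classes = None  # taken from the first chunk, then reused unchanged
for train_df in pd.read_csv("train.csv", chunksize=chunksize, iterator=True):
    X = train_df[features_columns]
    Y = train_df["clicked"]
    if classes is None:
        classes = np.unique(Y)  # the first chunk must contain every label
    clf2.partial_fit(X, Y, classes=classes)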
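If the full label set is known ahead of time, a more robust variant is to declare it once and pass the same array on every call, so the result no longer depends on what the first chunk happens to contain. partial_fit raises a ValueError when classes on a later call differs from the set given on the first call, which is what can happen if np.unique(Y) is recomputed per chunk. The following is only a sketch under assumptions: the in-memory CSV, the feature names f1 and f2, and the binary labels 0 and 1 are invented stand-ins for train.csv and features_columns, which the question does not show. The loss argument is left at its default because the name of the logistic loss differs between scikit-learn versions ('log' in older releases, 'log_loss' in newer ones).

```python
import io

import numpy as np
import pandas as pd
from sklearn.linear_model import SGDClassifier

# Hypothetical stand-in for train.csv: two features and a binary "clicked" label.
rng = np.random.default_rng(0)
n_rows = 200
demo = pd.DataFrame({
    "f1": rng.normal(size=n_rows),
    "f2": rng.normal(size=n_rows),
})
demo["clicked"] = (demo["f1"] + demo["f2"] > 0).astype(int)
csv_buffer = io.StringIO(demo.to_csv(index=False))

features_columns = ["f1", "f2"]  # assumed feature names
all_classes = np.array([0, 1])   # assumed: every label that can ever appear

clf = SGDClassifier(penalty="l2", random_state=0)

# chunksize=5 as in the question; a chunk this small may hold only one class.
for train_df in pd.read_csv(csv_buffer, chunksize=5):
    X = train_df[features_columns]
    Y = train_df["clicked"]
    # The same full label set is passed on every call, so a chunk
    # that lacks one of the classes does not cause an error.
    clf.partial_fit(X, Y, classes=all_classes)

print(clf.classes_)
```

When the labels are not known in advance, one option is a cheap first pass over only the label column (for example pd.read_csv with usecols=["clicked"]) to collect the unique values before training starts.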

Regarding python - SKlearn SGD partial fit, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42147302/
