
python - What does "inconsistent numbers of samples" mean?

Reposted. Author: 行者123. Updated: 2023-11-30 09:44:23

I'm using scikit-learn's logistic regression, but I keep getting this message:

Found input variables with inconsistent numbers of samples: [90000, 5625]

In the code below, I drop the columns that contain text and then split the data into training and test sets.

import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
from scipy import stats
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split

dataset = pd.read_csv("/Users/An/Desktop/data/telco.csv", na_values = ' ')
dataset = dataset.dropna(axis = 0)

dataset = dataset.replace({'Yes':1, 'Fiber optic': 1, 'DSL':1, 'No':0, 'No phone service':0, 'No internet service':0})
dataset = dataset.drop('Contract', axis =1)
dataset = dataset.drop('PaymentMethod',axis =1)
dataset = dataset.drop('customerID',axis =1)
dataset = dataset.drop('gender',axis =1)

for i in ['tenure', 'MonthlyCharges', 'TotalCharges']:
    sd = np.std(dataset[i])
    mean = np.mean(dataset[i])
    dataset[i] = (dataset[i] - mean) / sd

total = pd.DataFrame(dataset)
data_train, data_test = train_test_split(total, test_size=0.2)
data_train = data_train.values
data_test = data_test.values

from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(C=1e9)
clf = clf.fit(data_train[:,0:16], data_train[:,16])
print(clf.intercept_, clf.coef_)

Can someone explain what this error message means and help me figure out why I'm getting it?

Best answer

In the second-to-last line, data_train.reshape(-1, 1) is what causes your problem. Remove the reshape and the fit should work.
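A minimal sketch of the corrected call, assuming random synthetic data shaped like the question's training array (5625 rows: 16 feature columns followed by a label in column 16). The array and its values are hypothetical stand-ins for data_train; only the shapes matter here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for data_train: 5625 rows, 16 features + 1 label column
rng = np.random.default_rng(0)
data_train = rng.normal(size=(5625, 17))
# Noisy binary label in column 16 (hypothetical, just so there is something to fit)
data_train[:, 16] = (data_train[:, 0] + rng.normal(size=5625) > 0).astype(float)

X = data_train[:, 0:16]  # keep the 2D (n_samples, n_features) shape; no reshape
y = data_train[:, 16]    # 1D array with one label per sample

clf = LogisticRegression(C=1e9)
clf = clf.fit(X, y)
print(clf.intercept_.shape, clf.coef_.shape)  # (1,) (1, 16)

# Putting the reshape back flattens X to (90000, 1) and triggers the error
message = ""
try:
    clf.fit(X.reshape(-1, 1), y)
except ValueError as err:
    message = str(err)
    print(message)
```

With X left two-dimensional, X.shape[0] and y.shape[0] are both 5625, so validation passes.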

Why

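LogisticRegression.fit expects x and y to have the same shape[0], i.e. the same number of samples, but you are reshaping x from (n, m) to (n*m, 1). With 16 feature columns (data_train[:, 0:16]) and 5625 training rows, that turns x into 5625 * 16 = 90000 rows while y still has 5625, which is exactly the [90000, 5625] in the error message.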
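The same check can be triggered without fitting a model. This sketch calls sklearn.utils.validation.check_consistent_length, the scikit-learn helper that compares sample counts and raises this message. It assumes zero-filled arrays with the 5625-row, 16-feature shapes inferred from the error; the data itself is hypothetical.

```python
import numpy as np
from sklearn.utils.validation import check_consistent_length

# Shapes inferred from the error message: 90000 / 16 = 5625 (data is hypothetical)
n_samples, n_features = 5625, 16
X = np.zeros((n_samples, n_features))
y = np.zeros(n_samples)

check_consistent_length(X, y)  # passes: both have shape[0] == 5625

X_flat = X.reshape(-1, 1)      # (n*m, 1): every feature value becomes its own "sample"
print(X_flat.shape)            # (90000, 1)

message = ""
try:
    check_consistent_length(X_flat, y)
except ValueError as err:
    message = str(err)
    print(message)  # Found input variables with inconsistent numbers of samples: [90000, 5625]
```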

Here is a reproduction of the shapes:

import numpy as np

df = np.zeros((2000, 10))
x, y = df[:, 2:9], df[:, 9]
print(x.shape, y.shape)  # << what you should give to `clf.fit`
# (2000, 7) (2000,)

print(x.reshape(-1, 1).shape, y.shape)  # << what you ARE giving to `clf.fit`
# (14000, 1) (2000,)  << which is causing the problem
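For contrast, reshape(-1, 1) is the right tool when you have a single feature stored as a 1D array. It turns n values into an (n, 1) column, so shape[0] still matches y. A minimal sketch with hypothetical synthetic data (the feature and labels below are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
feature = rng.normal(size=200)                             # hypothetical single feature, 1D: (200,)
labels = (feature + rng.normal(size=200) < 0).astype(int)  # hypothetical binary labels: (200,)

X_single = feature.reshape(-1, 1)  # (200, 1): one column, still 200 samples
clf = LogisticRegression().fit(X_single, labels)
print(X_single.shape, clf.coef_.shape)  # (200, 1) (1, 1)
```

A matrix that already has several feature columns, like data_train[:, 0:16], is already 2D and should be passed to fit as-is.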

For the question "python - What does 'inconsistent numbers of samples' mean?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/54401576/

Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号