
python - Scikit-learn's DecisionTreeClassifier fit method raises ValueError: Couldn't broadcast input array from shape (10, 35) into shape (10)

Reposted. Author: 行者123. Updated: 2023-11-30 09:58:30

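I'm trying to build a decision tree. My target is the array [0, 1] (binary 'NO' or 'YES'), and my training input is a three-dimensional array: the first element holds all the 'NO' examples (10 of them), each with 35 features, and the second holds the 'YES' examples in the same layout. But I keep getting this error.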

    file1 = open("file1.txt")  # examples of the 'No' class
    file2 = open("file2.txt")  # examples of the 'Yes' class
    x = vectorizer.fit_transform(file1)
    y = vectorizer.fit_transform(file2)

    x_array = x.toarray()
    y_array = y.toarray()

    x_train, x_test, y_train, y_test = train_test_split(x_array, y_array,
                                                        test_size=0.2)
    target = [0, 1]  # 0 encodes 'No' and 1 encodes 'Yes'
    train = [x_train, y_train]

    decisiontree = DecisionTreeClassifier(random_state=0, max_depth=5)
    decisiontree = decisiontree.fit(train, target)  # this call raises the ValueError

Thanks for your help.

Edit: I'm loading the data from txt files, and it is text data. I tried printing part of the arrays, and this is what I get:

    [[0 0 0 ... 0 0 0]
     [0 0 0 ... 0 0 0]
     [0 0 0 ... 0 0 0]
     [0 0 0 ... 0 0 0]]

Best answer

I think the cause is a misunderstanding of what decisiontree.fit expects as its arguments.

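For decisiontree.fit(X, Y), X is expected to hold the data points and Y the labels. That is, if X has shape N x 32, then Y should have shape N (where N is the number of data points). In the question, train is a list of two 2-D arrays and target has only two entries, so fit does not receive one label per example row.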
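
To make the shape requirement concrete, here is a minimal sketch using random placeholder data. The arrays `X_no` and `X_yes` and their sizes are assumptions standing in for the question's two classes, not the asker's actual data. It shows that a list of two 2-D arrays is rejected and what a valid X/Y pair looks like.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Placeholder data: 10 examples per class, 35 features each (hypothetical values).
X_no = rng.integers(0, 3, size=(10, 35))
X_yes = rng.integers(0, 3, size=(10, 35))

# What the question effectively passes: one "sample" per class, i.e. a 3-D input.
bad_X = np.asarray([X_no, X_yes])
print(bad_X.shape)

try:
    DecisionTreeClassifier(random_state=0).fit(bad_X, [0, 1])
    fit_failed = False
except ValueError as err:
    fit_failed = True
    print("fit rejected the input:", err)

# What fit expects: one row per example and one label per row.
X = np.vstack([X_no, X_yes])                      # 20 rows, 35 columns
y = np.array([0] * len(X_no) + [1] * len(X_yes))  # 20 labels

clf = DecisionTreeClassifier(random_state=0, max_depth=5).fit(X, y)
print(X.shape, y.shape)
```

The exact error text depends on the NumPy and scikit-learn versions installed, so it may differ from the message quoted in the question.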

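You should merge x_array and y_array into a single dataset, split that into training and test sets, and then call fit with the corresponding labels.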

Consider the following:

    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    import numpy as np

    # `vectorizer` is assumed to be defined as in the question
    file1 = open("file1.txt")  # examples of the 'No' class
    file2 = open("file2.txt")  # examples of the 'Yes' class
    x = vectorizer.fit_transform(file1)
    y = vectorizer.fit_transform(file2)

    x_array = x.toarray()
    y_array = y.toarray()

    # ------------------------------------------------------------
    # combine the negative ('No') and positive ('Yes') examples row-wise
    data = np.concatenate([x_array, y_array], axis=0)
    # create corresponding labels (based on the data's length):
    # 0 for every 'No' row, 1 for every 'Yes' row
    labels = np.concatenate([np.zeros(x_array.shape[0]),
                             np.ones(y_array.shape[0])], axis=0)

    # split into train and test set
    train_data, test_data, train_labels, test_labels = train_test_split(
        data, labels, test_size=0.2)

    decisiontree = DecisionTreeClassifier(random_state=0, max_depth=5)
    decisiontree = decisiontree.fit(train_data, train_labels)

    # ------------------------------------------------------------
    # this is how you can test model performance with the test set
    correct_predictions = np.count_nonzero(
        decisiontree.predict(test_data) == test_labels
    )

    print("Correct prediction in test set: {}/{}".format(correct_predictions,
                                                         test_labels.shape[0]))
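
One caveat with the code above: calling fit_transform separately on each file learns a separate vocabulary per file, so the columns of x_array and y_array need not refer to the same words. A possible variation, sketched below, fits a single vectorizer on the combined text. The question never says which vectorizer it uses, so `CountVectorizer` is an assumption here, and the in-memory sentences are made-up placeholders for the lines of file1.txt and file2.txt.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Made-up placeholder documents standing in for the lines of the two files.
no_docs = ["bad service", "not good at all", "would not recommend",
           "poor quality", "very bad experience"]
yes_docs = ["great service", "really good", "would recommend",
            "excellent quality", "very good experience"]

# Fit ONE vectorizer on all documents so both classes share the same columns.
vectorizer = CountVectorizer()  # assumption: the question does not name its vectorizer
data = vectorizer.fit_transform(no_docs + yes_docs).toarray()

# One label per row: 0 for 'No', 1 for 'Yes'.
labels = np.array([0] * len(no_docs) + [1] * len(yes_docs))

# stratify keeps both classes represented in the small test split.
train_data, test_data, train_labels, test_labels = train_test_split(
    data, labels, test_size=0.2, random_state=0, stratify=labels)

clf = DecisionTreeClassifier(random_state=0, max_depth=5)
clf.fit(train_data, train_labels)

# score() returns the mean accuracy on the given data and labels.
print("Test accuracy:", clf.score(test_data, test_labels))
```

Fitting the vocabulary on the full corpus before splitting is done here only for brevity. Fitting on the training split and calling transform on the test split would keep test vocabulary out of training.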

For python - Scikit-learn's DecisionTreeClassifier fit method raises ValueError: Couldn't broadcast input array from shape (10, 35) into shape (10), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60032563/
