
python - Mismatch with the scikit-learn fit() method when plotting a dataset read from a csv instead of a generated one

Reposted. Author: 太空宇宙. Updated: 2023-11-03 18:39:39

Originally I used a scikit-learn snippet to generate my dataset:

# Create a random dataset
rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 3 * (0.5 - rng.rand(16))

Then I switched to a .csv file:

"X","Y"
-0.8,7.2
-0.7,6.9
0.4,6.4
2.5,6
2.9,5.8
3.2,5.8
3.6,5.6
3.9,4.7
4.2,5.8
4.3,5.2
5.4,4.9
6,4.9

So now I figured I should read the csv and plot it:

import csv
import numpy as np

#dataset
# read in the data as rows
with open('my.csv', 'rb') as csvfile:
    _reader = csv.reader(csvfile, delimiter=',', quotechar='"')

    # First row contains feature names
    feature_names = _reader.next()

    # Remaining rows: first column is X, second column is y (still strings here)
    X, y = [], []
    for row in _reader:
        X.append(row[0])
        y.append(row[1])

feature_names = np.array(feature_names)
X = np.array( X)
y = np.array( y)

print type(X)
print type(y)

# Fit regression model
from sklearn.ensemble import RandomForestRegressor
rfr_1 = RandomForestRegressor(n_estimators=10, max_depth=2)
rfr_2 = RandomForestRegressor(n_estimators=10, max_depth=5)
print X
print y

rfr_1.fit(X, y)
rfr_2.fit(X, y)

# Predict
X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
y_1 = rfr_1.predict(X_test)
y_2 = rfr_2.predict(X_test)

# Plot the results
import pylab as pl
pl.figure()
pl.scatter(X, y, c="k", label="data")
pl.plot(X_test, y_1, c="g", label="max_depth=2", linewidth=2)
pl.plot(X_test, y_2, c="r", label="max_depth=5", linewidth=2)
pl.xlabel("X")
pl.ylabel("Y")
pl.title("Regression")
pl.legend()
pl.show()

I expected a plot, but instead I got the following output:

<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
['-0.8' '-0.7' '0.4' '2.5' '2.9' '3.2' '3.6' '3.9' '4.2' '4.3' '5.4' '6'
'6' '6' '6.2' '6.3' '6.9' '7' '7.4' '7.5' '7.5' '7.6' '8' '8.5' '9.1']
['7.2' '6.9' '6.4' '6' '5.8' '5.8' '5.6' '4.7' '5.8' '5.2' '4.9' '4.9'
'4.3' '4.4' '4.5' '4.6' '3.7' '3.9' '4.2' '4' '3.9' '3.5' '4' '3.6' '3.1']
Traceback (most recent call last):
  File "test3.py", line 33, in <module>
    rfr_1.fit(X, y)
  File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.14.1-py2.7-linux-i686.egg/sklearn/ensemble/forest.py", line 260, in fit
    n_samples, self.n_features_ = X.shape
ValueError: need more than 1 value to unpack

What am I doing wrong when reading the .csv in place of the generated dataset?

Thanks!

Best answer

You need to convert X and y to a proper float type, and reshape X to the correct dimensions.

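I noticed that X in the random dataset has shape (80, 1), whereas your printed X is a flat array of 25 strings. Its shape is (25,), so the line `n_samples, self.n_features_ = X.shape` inside fit() has only one value to unpack, which is the ValueError you see.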
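As a minimal sketch of that conversion, assuming the values arrive as a flat list of strings the way the csv module returns them (the names `X_raw`, `y_raw`, and `X_bad` are made up for illustration, and only the first four rows of the file are used):

```python
import numpy as np

# Values as csv.reader yields them: strings, one per row.
X_raw = ['-0.8', '-0.7', '0.4', '2.5']
y_raw = ['7.2', '6.9', '6.4', '6']

# What the question's code builds: a 1-D array of strings.
X_bad = np.array(X_raw)
# X_bad.shape has a single entry, so unpacking it into
# (n_samples, n_features) cannot work.

# Convert to float and make X a column: one row per sample, one feature.
X = np.array(X_raw).astype(float).reshape(-1, 1)
y = np.array(y_raw).astype(float)

print(X.shape)
print(y.shape)
```

Using `reshape(-1, 1)` lets numpy infer the number of samples, so the same line should work for 25 rows or 80.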

Also, I see that you already use numpy in your code, so you can save and load the file with numpy in a more compact way, as shown below, without using the csv module.

* Save the generated data

# Create a random dataset
......
np.savetxt("my.csv", np.column_stack((X,y)), delimiter=",")

* Load the data

# load data
data = np.loadtxt("my.csv", delimiter=",")
X = data[:, 0].reshape(-1, 1)  # column vector; -1 infers the number of samples
y = data[:, 1]
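The file written by np.savetxt above has no header row, while the csv in the question starts with a "X","Y" line. A hedged sketch of loading such a file, assuming the header is exactly one line that can be skipped with the `skiprows` argument of np.loadtxt; an in-memory `io.StringIO` stands in for the real file, and `csv_text` holds only the first few rows:

```python
import io
import numpy as np

# Stand-in for my.csv: header line plus the first rows from the question.
csv_text = u'"X","Y"\n-0.8,7.2\n-0.7,6.9\n0.4,6.4\n2.5,6\n'

# skiprows=1 drops the header so every remaining field parses as a float.
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

X = data[:, 0].reshape(-1, 1)  # 2-D: (n_samples, 1)
y = data[:, 1]                 # 1-D: (n_samples,)

print(X.shape)
print(y.shape)
```

With the real file, passing the path "my.csv" in place of the StringIO object should behave the same way.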

# Fit regression model
......

A similar question about "python - Mismatch with the scikit-learn fit() method when plotting a dataset read from a csv instead of a generated one" can be found on Stack Overflow: https://stackoverflow.com/questions/20793312/
