
python - How do I keep the test size constant in cross-validation when using train_test_split?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 05:28:18

I am working with a matrix X and a vector y that holds the label for each row of that matrix. X is defined as:

import pandas as pd

df = pd.read_csv("./data/svm_matrix_0.csv", sep=',', header=None, encoding="ISO-8859-1")
df2 = df.apply(pd.to_numeric, errors='coerce')  # replaces the deprecated convert_objects(convert_numeric=True)
X = df2.values

y is defined as:

df = pd.read_csv('./data/Step7_final.csv', index_col=False, encoding="ISO-8859-1")  
y = df.iloc[:, 1].values

Then I train an SVM classifier:

from sklearn import svm
from sklearn.model_selection import train_test_split

clf = svm.SVC(kernel='linear', C=1)    # specify the classifier
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)  # random split into training and test data
clf.fit(X_train, y_train)  # train the model

Now I want to vary the size of X_train and compute the training and test error for each size:

test_error = clf.score(X_test, y_test)      # note: score() returns mean accuracy, so the error is 1 - score
train_error = clf.score(X_train, y_train)
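Since `clf.score` returns mean accuracy for a classifier, an error rate has to be derived from it. The following is a minimal sketch on synthetic data; `make_classification` and the `_demo` names are stand-ins introduced here, not part of the question's data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption; the question's CSV files are not available).
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

clf_demo = SVC(kernel="linear", C=1).fit(X_tr, y_tr)

test_accuracy = clf_demo.score(X_te, y_te)  # fraction of correct predictions
test_error = 1.0 - test_accuracy            # misclassification rate
print(test_accuracy, test_error)
```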

X_train should grow in size (for example, over 15 different values), and the results should be stored in a dictionary of the form {X_train size: (test_error, train_error)}.

I tried:

array = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9]
dicto = {}
for i in array:
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=i)
    clf.fit(X_train, y_train)
    test = clf.score(X_test, y_test)
    train = clf.score(X_train, y_train)
    dicto[i] = test, train

print(dicto)

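But this does not work, because X_test changes as well. Does anyone know how to adjust my code so that only the size of X_train varies (so that the errors are computed for an increasing total data set size)?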

Best answer

What you can do is split off the test data first:

X_train_prev, X_test_prev, y_train_prev, y_test_prev = train_test_split(X, y, test_size = 0.2)

Now run the for loop, varying the training size but always testing on the previously held-out test data.

Like this:

array = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9]
dicto = {}
for i in array:
    X_train, _, y_train, _ = train_test_split(X, y, test_size=i)
    clf.fit(X_train, y_train)
    # use the previous test data...
    test = clf.score(X_test_prev, y_test_prev)
    train = clf.score(X_train, y_train)
    dicto[i] = test, train

print(dicto)

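Note, however, that because each split is random and the loop re-splits the full X and y, rows from the held-out test set can end up in a training subset. That contaminates the test data, so the measured test score may be better than what the model would achieve on truly unseen data. To avoid this, split only the training data, so that the test data stays separate.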
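The contamination can be made concrete by counting how many held-out rows reappear in a training subset drawn from the full data. This is only a sketch: the 100-row `X_demo` (each row is its own index) and the `random_state` values are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 rows, each identified by its row index.
row_ids = np.arange(100)
X_demo = row_ids.reshape(-1, 1)
y_demo = row_ids % 2

# Hold out a test set once (20 rows).
_, X_test_prev, _, _ = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)
held_out = set(X_test_prev.ravel())

# Re-split the FULL data, as the loop does: about 90 rows go to training,
# but only 80 rows are outside the held-out set, so some must leak in.
X_train_full, _, _, _ = train_test_split(X_demo, y_demo, test_size=0.1, random_state=1)
leaked = held_out & set(X_train_full.ravel())
print(len(leaked), "held-out rows reappear in the training subset")
```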

For example, the split line inside the for loop becomes:

    X_train, _, y_train, _ = train_test_split(X_train_prev, y_train_prev, test_size=i)
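Putting the pieces together, a self-contained sketch of the whole procedure might look like the following. It makes several assumptions that are not in the original: synthetic data from `make_classification` replaces the CSV files, `train_size` is used instead of `test_size` so the fraction directly means "share of the training pool", fixed `random_state` values make it repeatable, and the dictionary is keyed by the actual number of training rows.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the question's X and y (an assumption).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# 1. Hold out the test set once; it never changes afterwards.
X_train_prev, X_test_prev, y_train_prev, y_test_prev = train_test_split(
    X, y, test_size=0.2, random_state=0
)

clf = SVC(kernel="linear", C=1)
fractions = [0.1, 0.25, 0.5, 0.75, 0.9]  # share of the training pool to use
dicto = {}
for frac in fractions:
    # 2. Subsample only the training pool, so no test row can leak in.
    X_train, _, y_train, _ = train_test_split(
        X_train_prev, y_train_prev, train_size=frac, random_state=0
    )
    clf.fit(X_train, y_train)
    test_error = 1.0 - clf.score(X_test_prev, y_test_prev)
    train_error = 1.0 - clf.score(X_train, y_train)
    # 3. Key the results by the actual number of training rows.
    dicto[len(X_train)] = (test_error, train_error)

print(dicto)
```

Because every iteration scores against the same `X_test_prev`, differences between entries reflect the training size and the particular random subsample, not a changing test set.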

This question and answer about "python - How do I keep the test size constant in cross-validation when using train_test_split?" come from a similar question on Stack Overflow: https://stackoverflow.com/questions/37975184/
