
Supervised machine learning in Python

Repost. Author: 行者123. Updated: 2023-11-28 17:29:35

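I am trying to understand how to do supervised machine learning with scikit, so I put together some data belonging to two different sets: set A and set B. There are 18 elements in set A and 18 elements in set B, and each element has three variables. See below: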

#SetA
Variable1A = [ 3,4,4,5,4,5,5,6,7,7,5,4,5,6,4,9,3,4]
Variable2A = [ 5,4,4,3,4,5,4,5,4,3,4,5,3,4,3,4,4,3]
Variable3A = [ 7,8,4,5,6,7,3,3,3,4,4,9,7,6,8,6,7,8]


#SetB
Variable1B = [ 7,8,11,12,7,9,8,7,8,11,15,9,7,6,9,9,7,11]
Variable2B = [ 1,2,3,3,4,2,4,1,0,1,2,1,3,4,3,1,2,3]
Variable3B = [ 12,18,14,15,16,17,13,13,13,14,14,19,17,16,18,16,17,18]

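How would I use supervised machine learning with scikit so that, when I introduce new setA and setB data, it can try to identify which of the new data belongs to setA and which to setB?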

Apologies that the dataset is small and made up. I simply want to apply the same approach with scikit to other datasets.

Best answer

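I think this is a good question, and don't worry if you feel it was not clear enough. Supervised learning can be used to classify instances (rows of data) into multiple categories (or, in your case, just 2 sets). What your example above is missing is a variable that indicates which set each row belongs to.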
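
To make the idea of a per-row label concrete before the full walkthrough, here is a minimal pure-Python sketch. It uses only the first four values of each list from the question, and the lowercase names (`rows_a`, `labels_a`, and so on) are illustrative choices, not part of the original code. Deriving the labels from the row count, rather than typing them out by hand, is one way to guarantee there is exactly one label per row.

```python
# Shortened, illustrative feature lists (first four values of the question's data).
variable1_a = [3, 4, 4, 5]
variable2_a = [5, 4, 4, 3]
variable3_a = [7, 8, 4, 5]

variable1_b = [7, 8, 11, 12]
variable2_b = [1, 2, 3, 3]
variable3_b = [12, 18, 14, 15]

# Each row is one element: (variable 1, variable 2, variable 3).
rows_a = list(zip(variable1_a, variable2_a, variable3_a))
rows_b = list(zip(variable1_b, variable2_b, variable3_b))

# Derive the labels from the row counts so they cannot be miscounted.
labels_a = [1] * len(rows_a)  # 1 = set A
labels_b = [0] * len(rows_b)  # 0 = set B

rows = rows_a + rows_b
labels = labels_a + labels_b

# A supervised model needs exactly one label per row.
assert len(rows) == len(labels)

for row, label in zip(rows, labels):
    print(row, "->", "A" if label == 1 else "B")
```

The same pairing of rows and labels is what the numpy arrays in the code below provide to the model.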

import numpy as np # numpy helps us concatenate the columns into a 2-dimensional array,
# so instead of having 3 separate arrays, we have 1 array with 3 columns and 18 rows

Variable1A = [ 3,4,4,5,4,5,5,6,7,7,5,4,5,6,4,9,3,4]
Variable2A = [ 5,4,4,3,4,5,4,5,4,3,4,5,3,4,3,4,4,3]
Variable3A = [ 7,8,4,5,6,7,3,3,3,4,4,9,7,6,8,6,7,8]

# our target variable for A: one label per row, 1 = set A

target_variable_A=[1]*18 # set A has 18 rows

Variable1B = [ 7,8,11,12,7,9,8,7,8,11,15,9,7,6,9,9,7,11]
Variable2B = [ 1,2,3,3,4,2,4,1,0,1,2,1,3,4,3,1,2,3]
Variable3B = [ 12,18,14,15,16,17,13,13,13,14,14,19,17,16,18,16,17,18]

# target variable for B: one label per row, 0 = set B
target_variable_B=[0]*18 # set B has 18 rows

# let's create a dataset C with only 4 rows; for each row we want to predict whether it belongs to dataset A (label 1) or dataset B (label 0)

Variable1C = [ 7,4,4,12]
Variable2C = [ 1,4,4,3]
Variable3C = [ 12,8,4,15]

# make the objects 2-dimensional arrays (1 array with X rows and 3 columns, one column per variable)
Dataset_A=np.column_stack((Variable1A,Variable2A,Variable3A))
Dataset_B=np.column_stack((Variable1B,Variable2B,Variable3B))
Dataset_C=np.column_stack((Variable1C,Variable2C,Variable3C))

print(" dataset A rows ", Dataset_A.shape[0]," dataset A columns ", Dataset_A.shape[1] )
print(" dataset B rows ", Dataset_B.shape[0]," dataset B columns ", Dataset_B.shape[1] )
print(" dataset C rows ", Dataset_C.shape[0]," dataset C columns ", Dataset_C.shape[1] )

########## Prints (Python 3) ##########
# dataset A rows  18  dataset A columns  3
# dataset B rows  18  dataset B columns  3
# dataset C rows  4  dataset C columns  3

# since we now have a label that tells us whether a row belongs to A or B (1 or 0), we can append the two sets together
Dataset_AB=np.concatenate((Dataset_A,Dataset_B),axis=0) # this creates a set with 36 rows and 3 columns
target_variable_AB=np.concatenate((target_variable_A,target_variable_B),axis=0)

print(" dataset AB rows ", Dataset_AB.shape[0]," dataset AB columns ", Dataset_AB.shape[1] )
print(" target Variable rows ", target_variable_AB.shape[0])

########## Prints (Python 3) ##########
# dataset AB rows  36  dataset AB columns  3
# target Variable rows  36

# now we select a commonly used supervised scikit-learn model for two-class problems: Logistic Regression
from sklearn.linear_model import LogisticRegression
model=LogisticRegression() # we create an instance of the model

model.fit(Dataset_AB,target_variable_AB) # the model learns to distinguish between A and B (1 or 0)

#now we make predictions for the new dataset C

predictions_for_C=model.predict(Dataset_C)
print(predictions_for_C)
# this will print
#[0 1 1 0]
# 1 means set A and 0 means set B, so the first case belongs to set B, the second to A, the third to A and the fourth to B
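
Beyond the hard 0/1 predictions, it can help to see how confident the model is about each new row. The following is a minimal sketch, assuming the same data as above and a reasonably recent scikit-learn; the names `set_a`, `set_b`, `new_rows` and the `max_iter=1000` setting are illustrative assumptions, not part of the original answer. It uses `predict_proba`, whose columns follow the order of `model.classes_`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Data from the question: one column per variable, one row per element.
set_a = np.column_stack((
    [3, 4, 4, 5, 4, 5, 5, 6, 7, 7, 5, 4, 5, 6, 4, 9, 3, 4],
    [5, 4, 4, 3, 4, 5, 4, 5, 4, 3, 4, 5, 3, 4, 3, 4, 4, 3],
    [7, 8, 4, 5, 6, 7, 3, 3, 3, 4, 4, 9, 7, 6, 8, 6, 7, 8],
))
set_b = np.column_stack((
    [7, 8, 11, 12, 7, 9, 8, 7, 8, 11, 15, 9, 7, 6, 9, 9, 7, 11],
    [1, 2, 3, 3, 4, 2, 4, 1, 0, 1, 2, 1, 3, 4, 3, 1, 2, 3],
    [12, 18, 14, 15, 16, 17, 13, 13, 13, 14, 14, 19, 17, 16, 18, 16, 17, 18],
))

# Stack the features and build labels sized from the data: 1 = set A, 0 = set B.
X = np.vstack((set_a, set_b))
y = np.concatenate((np.ones(len(set_a), dtype=int),
                    np.zeros(len(set_b), dtype=int)))

# max_iter is raised only as a precaution against convergence warnings.
model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# The four new rows (dataset C in the answer).
new_rows = np.column_stack(([7, 4, 4, 12], [1, 4, 4, 3], [12, 8, 4, 15]))

predictions = model.predict(new_rows)
probabilities = model.predict_proba(new_rows)

# Columns of predict_proba follow model.classes_, which is [0, 1] here,
# so column 1 holds the estimated probability of set A.
for row, label, proba in zip(new_rows, predictions, probabilities):
    name = "A" if label == 1 else "B"
    print(row, "-> set", name, "| P(set A) = %.3f" % proba[1])
```

With so few rows these probabilities are only a rough indication; on a real dataset it would be more informative to hold out part of the labelled data and measure accuracy on it.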

Regarding supervised machine learning in Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35438540/
