
python - Multi-label classification

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:38:59

I have a dataset that looks like this:

        A          B          C          D     sex       weight
 0.955136   0.802256   0.317182  -0.708615  female       normal
 0.463615  -0.860053  -0.136408  -0.892888    male        obese
-0.855532  -0.181905  -1.175605   1.396793  female   overweight
-1.236216  -1.329982   0.531241   2.064822    male  underweight
-0.970420  -0.481791  -0.995313   0.672131    male        obese

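Given features X = [A, B, C, D] and labels y = [sex, weight], I want to train a machine learning model that can predict a person's sex and weight category from features A, B, C and D. How can this be done? Can you recommend any libraries or reading material that would help me achieve this? To make testing easier, the dataset can be generated artificially with the following code: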

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))
df['sex'] = [np.random.choice(['male', 'female']) for x in range(len(df))]
df['weight'] = [np.random.choice(['underweight',
'normal', 'overweight', 'obese']) for x in range(len(df)) ]

Best answer

You need fixed labels that map the string values to integers:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))
# fixed labels: integer codes instead of strings
df['sex'] = [np.random.choice([0, 1]) for x in range(len(df))]
df['weight'] = [np.random.choice(list(range(4))) for x in range(len(df))]
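
The snippet above draws integer labels at random. If the data still holds the string labels from the question, one way to get fixed integer labels is an explicit mapping. This is a minimal sketch, assuming the category names from the question; the dictionaries `sex_codes` and `weight_codes` and the helper `encode_labels` are illustrative names, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed mappings from string labels to integer codes
sex_codes = {'female': 0, 'male': 1}
weight_codes = {'underweight': 0, 'normal': 1, 'overweight': 2, 'obese': 3}

def encode_labels(frame):
    """Return a copy of frame with 'sex' and 'weight' replaced by integer codes."""
    out = frame.copy()
    out['sex'] = out['sex'].map(sex_codes)
    out['weight'] = out['weight'].map(weight_codes)
    return out

# Same kind of synthetic data as in the question, with string labels
rng = np.random.default_rng(0)
raw = pd.DataFrame(rng.standard_normal((100, 4)), columns=list('ABCD'))
raw['sex'] = rng.choice(['male', 'female'], size=len(raw))
raw['weight'] = rng.choice(['underweight', 'normal', 'overweight', 'obese'],
                           size=len(raw))

encoded = encode_labels(raw)
print(encoded.head())
```

Keeping the mapping in a dictionary means the same codes can be reused on new data and inverted later to turn predictions back into readable labels.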

# In a Jupyter notebook, uncomment the next line to show plots inline
# %matplotlib inline
from pandas import DataFrame
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
# sklearn.cross_validation was removed; train_test_split lives in model_selection
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

trg = df[['sex', 'weight']]               # targets
trn = df.drop(['sex', 'weight'], axis=1)  # features A, B, C, D
# list of different models
models = [LinearRegression(),
          RandomForestRegressor(n_estimators=100, max_features='sqrt'),
          SVR(kernel='linear'),
          LogisticRegression()]

Xtrn, Xtest, Ytrn, Ytest = train_test_split(trn, trg, test_size=0.4)
rows = []
# for each model in the list
for model in models:
    # get the model name
    tmp = {'Model': type(model).__name__}
    # for each target column
    for i in range(Ytrn.shape[1]):
        # fit the model on this target
        model.fit(Xtrn, Ytrn.iloc[:, i])
        # coefficient of determination against the matching test column
        tmp['R2_Y%s' % (i + 1)] = r2_score(Ytest.iloc[:, i], model.predict(Xtest))
    # collect the results for this model
    rows.append(tmp)
# build the final DataFrame, indexed by model name
TestModels = DataFrame(rows).set_index('Model')

fig, axes = plt.subplots(ncols=2, figsize=(10, 4))
TestModels.R2_Y1.plot(ax=axes[0], kind='bar', title='R2_Y1')
TestModels.R2_Y2.plot(ax=axes[1], kind='bar', color='green', title='R2_Y2')
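
Because both targets are categories rather than continuous values, the task can also be treated as multi-output classification and scored with accuracy instead of R2. The following sketch departs from the answer above rather than reconstructing it: it assumes scikit-learn's `MultiOutputClassifier` wrapped around a `RandomForestClassifier`, and names such as `clf` and `accuracy` are illustrative. With random labels like these, accuracy should stay close to chance.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier

# Synthetic data with integer-coded labels (assumed encoding: 2 sexes, 4 weight classes)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 4)), columns=list('ABCD'))
df['sex'] = rng.integers(0, 2, size=len(df))
df['weight'] = rng.integers(0, 4, size=len(df))

X = df[['A', 'B', 'C', 'D']]
Y = df[['sex', 'weight']]
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.4, random_state=0)

# Fits one independent classifier per target column
clf = MultiOutputClassifier(RandomForestClassifier(n_estimators=100, random_state=0))
clf.fit(X_train, Y_train)

# predict returns an array of shape (n_samples, n_targets)
pred = clf.predict(X_test)

# Per-target accuracy on the held-out rows
accuracy = {col: accuracy_score(Y_test[col], pred[:, j])
            for j, col in enumerate(Y.columns)}
print(accuracy)
```

A single model object then predicts both labels in one call, which keeps training and prediction code the same regardless of how many target columns there are.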

Regarding python - Multi-label classification, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53030262/
