
Python/Sklearn - ValueError: could not convert string to float

Reposted. Author: 行者123. Updated: 2023-11-28 22:23:01

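I am trying to run a kNN classifier on my dataset using 10-fold cross-validation. I have some experience with models in WEKA, but I am struggling to carry this over to Sklearn.

Here is my code: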

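import pandas
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

filename = 'train4.csv'
names = ['attribute names are here']

df = pandas.read_csv(filename, names=names)

num_folds = 10
# Recent scikit-learn versions require shuffle=True when random_state is set
kfold = KFold(n_splits=num_folds, shuffle=True, random_state=7)
model = KNeighborsClassifier()
results = cross_val_score(model, df.drop('mix1_instrument', axis=1), df['mix1_instrument'], cv=kfold)
print(results.mean())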

I get this error:

 ValueError: could not convert string to float: ''

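How do I convert this attribute? It contains information that is useful for classifying my instances, so will converting it affect that?

I think there are two "object" attributes that need converting, named "class1" and "class2".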
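
One way to confirm which attributes are non-numeric is to ask pandas directly. The following is a minimal sketch, not part of the original post: the small frame is a stand-in built from a few values in the sample data, and `non_numeric` is a name chosen here for illustration.

```python
import pandas as pd

# Tiny stand-in for the real data (values copied from the sample data).
df = pd.DataFrame({
    "Energy": [-2.96148, -3.522993, -3.409359],
    "class1": ["aerophone", "aerophone", "chordophone"],
    "class2": ["aero_single-reed", "aero_lip-vibrated", "chrd_simple"],
    "mix1_instrument": ["Saxophone", "Trumpet", "Piano"],
})

# Any column that is not numeric is one scikit-learn cannot turn into floats.
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)
```

On the real file, the same call would list every column that still holds strings, which may turn out to be more than the two expected ones.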

Sample data below...

{
'temporalCentroid': {
0: 'temporalCentroid',
1: '1.67324',
2: '1.330722',
3: '0.786984',
4: '1.850129'
},
'LogSpecCentroid': {
0: 'LogSpecCentroid',
1: '-1.043802',
2: '-0.82943',
3: '-2.441297',
4: '-0.837145'
},
'LogSpecSpread': {
0: 'LogSpecSpread',
1: '0.747558',
2: '1.378373',
3: '0.667634',
4: '1.238404'
},
'MFCC1': {
0: 'MFCC1',
1: '3.502117',
2: '6.697601',
3: '4.011488',
4: '0.823614'
},
'MFCC2': {
0: 'MFCC2',
1: '-9.208897',
2: '-9.741549',
3: '15.27665',
4: '-15.22256'
},
'MFCC3': {
0: 'MFCC3',
1: '-2.334097',
2: '-9.868089',
3: '0.802509',
4: '-4.978688'
},
'MFCC4': {
0: 'MFCC4',
1: '-9.013086',
2: '0.609091',
3: '2.50685',
4: '-2.489553'
},
'MFCC5': {
0: 'MFCC5',
1: '4.847481',
2: '1.733307',
3: '0.10459',
4: '1.066615'
},
'MFCC6': {
0: 'MFCC6',
1: '-4.770421',
2: '-5.381835',
3: '-0.260118',
4: '-1.020861'
},
'MFCC7': {
0: 'MFCC7',
1: '-3.362488',
2: '-1.261088',
3: '0.593255',
4: '-2.007349'
},
'MFCC8': {
0: 'MFCC8',
1: '-9.527529',
2: '-3.809237',
3: '-0.362287',
4: '-8.938164'
},
'MFCC9': {
0: 'MFCC9',
1: '-9.629579',
2: '1.486923',
3: '-2.957592',
4: '-2.324424'
},
'MFCC10': {
0: 'MFCC10',
1: '1.848685',
2: '-3.938455',
3: '-1.884439',
4: '-2.535579'
},
'MFCC11': {
0: 'MFCC11',
1: '-2.311295',
2: '-2.159865',
3: '-0.827179',
4: '0.638553'
},
'MFCC12': {
0: 'MFCC12',
1: '-7.696675',
2: '-3.138412',
3: '-0.605056',
4: '-1.116259'
},
'MFCC13': {
0: 'MFCC13',
1: '10.35572',
2: '9.095669',
3: '6.426399',
4: '15.04535'
},
'MFCCMin': {
0: 'MFCCMin',
1: '-9.629579',
2: '-9.868089',
3: '-2.957592',
4: '-15.22256'
},
'MFCCMax': {
0: 'MFCCMax',
1: '10.35572',
2: '9.095669',
3: '15.27665',
4: '15.04535'
},
'MFCCSum': {
0: 'MFCCSum',
1: '-37.300064',
2: '-19.675939',
3: '22.82507',
4: '-23.059305'
},
'MFCCAvg': {
0: 'MFCCAvg',
1: '-2.869235692',
2: '-1.513533769',
3: '1.755774615',
4: '-1.773792692'
},
'MFCCStd': {
0: 'MFCCStd',
1: '6.409842944',
2: '5.558499123',
3: '4.756836281',
4: '6.76039911'
},
'Energy': {
0: 'Energy',
1: '-2.96148',
2: '-3.522993',
3: '-3.409359',
4: '-2.235853'
},
'ZeroCrossings': {
0: 'ZeroCrossings',
1: '128',
2: '188',
3: '43',
4: '288'
},
'SpecCentroid': {
0: 'SpecCentroid',
1: '284.0513',
2: '414.8489',
3: '102.2096',
4: '405.1262'
},
'SpecSpread': {
0: 'SpecSpread',
1: '207.5526',
2: '350.7937',
3: '53.52178',
4: '360.0353'
},
'Rolloff': {
0: 'Rolloff',
1: '263.7817',
2: '783.2703',
3: '129.1992',
4: '912.4695'
},
'Flux': {
0: 'Flux',
1: '0',
2: '0',
3: '0',
4: '0'
},
'bandsCoefMin': {
0: 'bandsCoefMin',
1: '-0.224957',
2: '-0.247903',
3: '-0.22283',
4: '-0.232534'
},
'bandsCoefMax': {
0: 'bandsCoefMax',
1: '-0.074945',
2: '-0.113654',
3: '-0.062254',
4: '-0.080883'
},
'bandsCoefSum1': {
0: 'bandsCoefSum1',
1: '-5.575428',
2: '-5.524777',
3: '-5.511125',
4: '-5.532536'
},
'bandsCoefAvg': {
0: 'bandsCoefAvg',
1: '-0.168952364',
2: '-0.167417485',
3: '-0.167003788',
4: '-0.167652606'
},
'bandsCoefStd': {
0: 'bandsCoefStd',
1: '0.042580181',
2: '0.048429973',
3: '0.049881374',
4: '0.0475839'
},
'bandsCoefSum': {
0: 'bandsCoefSum',
1: '382.5963',
2: '360.9232',
3: '384.3541',
4: '368.9903'
},
'prjmin': {
0: 'prjmin',
1: '-0.999362',
2: '-0.999719',
3: '-0.988315',
4: '-0.999421'
},
'prjmax': {
0: 'prjmax',
1: '0.023797',
2: '0.009596',
3: '0.028112',
4: '0.024612'
},
'prjSum': {
0: 'prjSum',
1: '-0.99911',
2: '-1.006792',
3: '-1.084054',
4: '-1.002478'
},
'prjAvg': {
0: 'prjAvg',
1: '-0.030276061',
2: '-0.030508848',
3: '-0.032850121',
4: '-0.030378121'
},
'prjStd': {
0: 'prjStd',
1: '0.174082468',
2: '0.174040569',
3: '0.173600498',
4: '0.174064118'
},
'LogAttackTime': {
0: 'LogAttackTime',
1: '0.365883',
2: '-0.35427',
3: '-0.669283',
4: '-0.026181'
},
'HamoPkMin': {
0: 'HamoPkMin',
1: '0',
2: '0',
3: '0',
4: '0'
},
'HamoPkMax': {
0: 'HamoPkMax',
1: '1.025473',
2: '1.05761',
3: '0.986766',
4: '0.957316'
},
'HamoPkSum': {
0: 'HamoPkSum',
1: '14.391206',
2: '20.306125',
3: '9.727358',
4: '14.772449'
},
'HamoPkAvg': {
0: 'HamoPkAvg',
1: '0.513971643',
2: '0.72521875',
3: '0.347405643',
4: '0.527587464'
},
'HamoPkStd': {
0: 'HamoPkStd',
1: '0.376622124',
2: '0.325929503',
3: '0.388971641',
4: '0.381693476'
},
'class1': {
0: 'class1',
1: 'aerophone',
2: 'aerophone',
3: 'chordophone',
4: 'aerophone'
},
'class2': {
0: 'class2',
1: 'aero_single-reed',
2: 'aero_lip-vibrated',
3: 'chrd_simple',
4: 'aero_single-reed'
},
'mix1_instrument': {
0: 'mix1_instrument',
1: 'Saxophone',
2: 'Trumpet',
3: 'Piano',
4: 'Clarinet'
}
}
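
In the sample above, index 0 of every column repeats the column name and all values are strings. That pattern is what `read_csv` produces when `names=` is passed for a file that already has a header line. Assuming that is what happened here (the file itself is not shown), a sketch of the difference looks like this; `csv_text` is a hypothetical two-column excerpt, not the real `train4.csv`.

```python
import io
import pandas as pd

# Hypothetical two-column excerpt of the CSV, including its header line.
csv_text = "Energy,mix1_instrument\n-2.96148,Saxophone\n-3.522993,Trumpet\n"
names = ["Energy", "mix1_instrument"]

# With names= alone, pandas treats the header line as an ordinary data row,
# so the numeric column is read as strings.
bad = pd.read_csv(io.StringIO(csv_text), names=names)

# header=0 tells pandas to drop the file's own header line
# and use the supplied names instead.
good = pd.read_csv(io.StringIO(csv_text), names=names, header=0)

print(bad.shape, good.shape)
print(good.dtypes)
```

If the real file has a header line, reading it with `header=0` (or without `names=` at all) should leave the numeric attributes as floats, so only the genuinely textual columns need encoding.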

Thanks

Best answer

Here is a small demo:

Source DF:

In [43]: df
Out[43]:
     Energy  HamoPkStd       class1             class2 mix1_instrument
0 -2.961480  14.391206    aerophone   aero_single-reed       Saxophone
1 -3.522993  20.306125  chordophone  aero_lip-vibrated         Trumpet
2 -3.409359   9.727358    aerophone        chrd_simple           Piano

Label encoding:

In [44]: %paste
from sklearn.preprocessing import LabelEncoder

# Select the text columns by name: anything containing "class" or "instrument"
str_cols = df.columns[df.columns.str.contains('(?:class|instrument)')]
# Keep one encoder per column so each can be inverted later
clfs = {c: LabelEncoder() for c in str_cols}

for col, clf in clfs.items():
    df[col] = clf.fit_transform(df[col])
## -- End pasted text --

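Result: all text/string columns have been converted to numbers, so the frame can now be fed to a neural network (or any other estimator that expects numeric input):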

In [45]: df
Out[45]:
     Energy  HamoPkStd  class1  class2  mix1_instrument
0 -2.961480  14.391206       0       1                1
1 -3.522993  20.306125       1       0                2
2 -3.409359   9.727358       0       2                0
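
On the question of whether the conversion affects classification: `LabelEncoder` numbers the categories in sorted order, so the integers imply distances between categories that a distance-based model such as kNN will take literally. A common alternative for feature columns is one-hot encoding. The following is a minimal sketch of that swapped-in technique, not the method used in the demo above. It assumes a reduced frame built from four rows of the question's sample data, and `n_neighbors=1` is chosen only because the toy frame is so small.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Four rows from the question's sample data, reduced to a few columns.
df = pd.DataFrame({
    "Energy": [-2.96148, -3.522993, -3.409359, -2.235853],
    "ZeroCrossings": [128, 188, 43, 288],
    "class1": ["aerophone", "aerophone", "chordophone", "aerophone"],
    "class2": ["aero_single-reed", "aero_lip-vibrated",
               "chrd_simple", "aero_single-reed"],
    "mix1_instrument": ["Saxophone", "Trumpet", "Piano", "Clarinet"],
})
X = df.drop("mix1_instrument", axis=1)
y = df["mix1_instrument"]  # string class labels are accepted as-is

# One-hot encode the two string features; numeric columns pass through.
encode = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["class1", "class2"])],
    remainder="passthrough",
)

# n_neighbors=1 only because this toy frame has four rows.
pipeline = make_pipeline(encode, KNeighborsClassifier(n_neighbors=1))
pipeline.fit(X, y)
pred = pipeline.predict(X)
print(pred)
```

Because the encoder lives inside the pipeline, the pipeline can be passed to `cross_val_score` in place of the bare model, and the encoding is then fitted on each training fold separately. The target column does not need encoding at all, since scikit-learn classifiers accept string labels.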

Inverse transformation:

In [48]: clfs['class1'].inverse_transform(df['class1'])
Out[48]: array(['aerophone', 'chordophone', 'aerophone'], dtype=object)

In [49]: clfs['mix1_instrument'].inverse_transform(df['mix1_instrument'])
Out[49]: array(['Saxophone', 'Trumpet', 'Piano'], dtype=object)

For more on Python/Sklearn - ValueError: could not convert string to float, see the similar question on Stack Overflow: https://stackoverflow.com/questions/47312695/
