python - Python : ValueError: could not convert string to float: 'Isolated' when reading input file for applying random forest

Reposted. Author: 行者123. Updated: 2023-12-02 10:56:26

I am trying to apply a random forest to the following input file:

gold,Program,Requirement,MethodType,Top,Side,CallersT,CallersN,CallersU,CallersCallersT,CallersCallersN,CallersCallersU,CalleesT,CalleesN,CalleesU,CalleesCalleesT,CalleesCalleesN,CalleesCalleesU
T,chess,1,Inner,T,T,Low,-1,-1,Low,-1,-1,High,-1,-1,-1,-1,Low,
N,chess,2,Inner,N,N,-1,Low,-1,-1,Low,-1,-1,High,-1,-1,-1,Low,
N,chess,3,Inner,N,N,-1,Low,-1,-1,Low,-1,-1,High,-1,-1,-1,Low,
N,chess,4,Root,N,N,-1,Low,-1,-1,Low,-1,-1,High,-1,-1,-1,Low,
N,chess,5,Inner,N,N,-1,Low,-1,-1,Low,-1,-1,High,-1,-1,-1,Low,
N,chess,6,Root,N,N,-1,Low,-1,-1,Low,-1,-1,High,-1,-1,-1,Low,
N,chess,7,Inner,N,N,-1,Low,-1,-1,Low,-1,-1,High,-1,-1,-1,Low,
N,chess,8,Inner,N,N,-1,Low,-1,-1,Low,-1,-1,High,-1,-1,-1,Low,
N,chess,1,Leaf,NU,N,-1,Low,-1,-1,Low,-1,-1,Medium,Medium,-1,High,High,
N,chess,2,Leaf,NU,N,-1,Low,-1,-1,Low,-1,-1,Medium,Medium,-1,High,High,
N,chess,3,Leaf,NU,N,-1,Low,-1,-1,Low,-1,-1,Medium,Medium,-1,High,High,
N,chess,4,Root,NU,N,-1,Low,-1,-1,Low,-1,-1,Medium,Medium,-1,High,High,
N,chess,5,Isolated,NU,N,-1,Low,-1,-1,Low,-1,-1,Medium,Medium,-1,High,High,
T,chess,6,Inner,TU,T,Low,-1,-1,Low,-1,-1,Medium,-1,Medium,High,-1,High,
T,chess,7,Isolated,TU,T,Low,-1,-1,Low,-1,-1,Medium,-1,Medium,High,-1,High,
N,chess,8,Inner,NU,N,-1,Low,-1,-1,Low,-1,-1,Medium,Medium,-1,High,High,
N,chess,1,Inner,TNU,N,-1,Low,-1,-1,-1,-1,Low,Low,High,Medium,-1,Medium,
N,chess,2,Inner,NU,N,-1,Low,-1,-1,-1,-1,-1,Medium,High,Low,Low,Medium,
N,chess,3,Inner,NU,N,-1,Low,-1,-1,-1,-1,-1,Medium,High,-1,Medium,Medium,
T,chess,4,Inner,NU,N,-1,Low,-1,-1,-1,-1,-1,Medium,High,Low,Low,Medium,
N,chess,5,Leaf,NU,N,-1,Low,-1,-1,-1,-1,-1,Medium,High,-1,Medium,Medium,
This is the code I use to apply the random forest:
import pandas as pd
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
# Feature Scaling
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

X_train={}
X_test={}
y_train={}
y_test={}
dataset = pd.read_csv( 'dataExtended2.txt', sep= ',')
#convert T into 1 and N into 0
dataset['gold'] = dataset['gold'].astype('category').cat.codes
dataset['Program'] = dataset['Program'].astype('category').cat.codes
dataset['MethodType'] = dataset['MethodType'].astype('category').cat.codes
dataset['Top'] = dataset['Top'].astype('category').cat.codes
dataset['Side'] = dataset['Side'].astype('category').cat.codes
dataset['CallersT'] = dataset['CallersT'].astype('category').cat.codes
dataset['CallersN'] = dataset['CallersN'].astype('category').cat.codes
dataset['CallersU'] = dataset['CallersU'].astype('category').cat.codes
dataset['CallersCallersT'] = dataset['CallersCallersT'].astype('category').cat.codes
dataset['CallersCallersN'] = dataset['CallersCallersN'].astype('category').cat.codes
dataset['CallersCallersU'] = dataset['CallersCallersU'].astype('category').cat.codes
dataset['CalleesT'] = dataset['CalleesT'].astype('category').cat.codes
dataset['CalleesN'] = dataset['CalleesN'].astype('category').cat.codes
dataset['CalleesU'] = dataset['CalleesU'].astype('category').cat.codes
dataset['CalleesCalleesT'] = dataset['CalleesCalleesT'].astype('category').cat.codes
dataset['CalleesCalleesN'] = dataset['CalleesCalleesN'].astype('category').cat.codes
dataset['CalleesCalleesU'] = dataset['CalleesCalleesU'].astype('category').cat.codes
pd.set_option('display.max_columns', None)

print(dataset.head())
row_count, column_count = dataset.shape

X = dataset.iloc[:, 1:column_count].values
y = dataset.iloc[:, 0].values
Xcol = dataset.iloc[:, 1:column_count]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
I get an error (ValueError: could not convert string to float: 'Isolated') when the last line of my code runs: X_train = sc.fit_transform(X_train). This happens even though I use the line dataset['MethodType'] = dataset['MethodType'].astype('category').cat.codes to convert MethodType from strings to numeric codes. How can I fix this?
Here is the error traceback:
Traceback (most recent call last):

File "<ipython-input-38-d7fe5c294c10>", line 1, in <module>
runfile('C:/Users/mouna/ownCloud/Mouna Hammoudi/dumps/Python/RandomForestSimplified.py', wdir='C:/Users/mouna/ownCloud/Mouna Hammoudi/dumps/Python')

File "C:\Users\mouna\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 668, in runfile
execfile(filename, namespace)

File "C:\Users\mouna\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "C:/Users/mouna/ownCloud/Mouna Hammoudi/dumps/Python/RandomForestSimplified.py", line 43, in <module>
X_train = sc.fit_transform(X_train)

File "C:\Users\mouna\Anaconda3\lib\site-packages\sklearn\base.py", line 517, in fit_transform
return self.fit(X, **fit_params).transform(X)

File "C:\Users\mouna\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py", line 590, in fit
return self.partial_fit(X, y)

File "C:\Users\mouna\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py", line 612, in partial_fit
warn_on_dtype=True, estimator=self, dtype=FLOAT_DTYPES)

File "C:\Users\mouna\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 433, in check_array
array = np.array(array, dtype=dtype, order=order, copy=copy)

ValueError: could not convert string to float: 'Isolated'

Best answer

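Look at the output of your code (print(dataset.head())): the T/N values of gold appear as the row index, each header label sits over the data of the column that follows it in the file, and the column labeled Requirement still holds the MethodType strings. Your code never encodes Requirement, so those strings reach StandardScaler unchanged. This happens because pandas uses the first column as the index: every data row ends with a trailing comma, so it has one more field than the header has names.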

   gold  Program  Requirement  MethodType  Top  Side  CallersT  CallersN  \
T     0        0        Inner           2    1     1         0         0
N     0        1        Inner           0    0     0         1         0
N     0        2        Inner           0    0     0         1         0
N     0        3         Root           0    0     0         1         0
N     0        4        Inner           0    0     0         1         0

   CallersU  CallersCallersT  CallersCallersN  CallersCallersU  CalleesT  \
T         1                0                0                1         0
N         0                1                0                0         1
N         0                1                0                0         1
N         0                1                0                0         1
N         0                1                0                0         1

   CalleesN  CalleesU  CalleesCalleesT  CalleesCalleesN  CalleesCalleesU
T         0         0                0                1               -1
N         0         0                0                1               -1
N         0         0                0                1               -1
N         0         0                0                1               -1
N         0         0                0                1               -1
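One way to confirm this kind of header/data mismatch before loading a file is to compare field counts with the standard library. The following is a minimal sketch, not part of the original answer: the three-column `sample` string and the helper name `field_counts` are made up for illustration, and the sample only imitates the trailing commas in the question's file.

```python
import csv
import io

# Hypothetical three-column sample that imitates the trailing commas
# at the end of every data row in the question's file.
sample = """gold,Program,MethodType
T,chess,Inner,
N,chess,Isolated,
"""

def field_counts(text):
    """Return the header field count and the set of data-row field counts."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    return len(header), {len(row) for row in data}

header_n, data_ns = field_counts(sample)
print("header fields:", header_n, "data-row fields:", sorted(data_ns))

# One extra field in every data row is the situation in which pandas
# takes the first column as the index by default.
if data_ns == {header_n + 1}:
    print("Each data row has one field more than the header.")
```

If the check reports one extra field per row, the column labels in the resulting DataFrame should be expected to sit one position off from the data.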
Solution:
dataset = pd.read_csv( 'dataExtended2.txt', sep= ',', index_col=False)
The output will then be:
   gold  Program  Requirement  MethodType  Top  Side  CallersT  CallersN  \
0     1        0            1           0    2     1         1         0
1     0        0            2           0    0     0         0         1
2     0        0            3           0    0     0         0         1
3     0        0            4           3    0     0         0         1
4     0        0            5           0    0     0         0         1

   CallersU  CallersCallersT  CallersCallersN  CallersCallersU  CalleesT  \
0         0                1                0                0         1
1         0                0                1                0         0
2         0                0                1                0         0
3         0                0                1                0         0
4         0                0                1                0         0

   CalleesN  CalleesU  CalleesCalleesT  CalleesCalleesN  CalleesCalleesU
0         0         0                0                0                1
1         1         0                0                0                1
2         1         0                0                0                1
3         1         0                0                0                1
4         1         0                0                0                1
See the pandas read_csv documentation for more details: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
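The effect of index_col=False can be checked on a small inline sample. This is a sketch under stated assumptions rather than the asker's actual data: the five-column `sample` string is a hypothetical miniature of the file, and the variable names `shifted`, `fixed`, and `encoded` are illustrative only.

```python
import io
import pandas as pd

# Hypothetical miniature of the question's file: five header names,
# six fields per data row because of the trailing comma.
sample = """gold,Program,Requirement,MethodType,Top
T,chess,1,Inner,T,
N,chess,4,Root,N,
N,chess,5,Isolated,NU,
"""

# Default parsing: the extra field makes pandas use the first column as the index,
# so the MethodType strings end up under the "Requirement" label.
shifted = pd.read_csv(io.StringIO(sample), sep=",")

# index_col=False: no index column is inferred and the empty trailing field is dropped,
# so every label lines up with its own data.
fixed = pd.read_csv(io.StringIO(sample), sep=",", index_col=False)

print(shifted)
print(fixed)

# With the labels aligned, encoding the text columns leaves only numeric data.
encoded = fixed.copy()
for name in ["gold", "Program", "MethodType", "Top"]:
    encoded[name] = encoded[name].astype("category").cat.codes

print(encoded.dtypes)
```

With the labels aligned, the `astype('category').cat.codes` lines from the question act on the columns they name, and no string is left for StandardScaler to reject.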

Regarding python - Python : ValueError: could not convert string to float: 'Isolated' when reading input file for applying random forest, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/62559766/
