gpt4 book ai didi

python - 深度学习: Multiclass Classification with same amount of labels between the training dataset and test dataset

转载 作者:行者123 更新时间:2023-11-30 08:40:18 24 4
gpt4 key购买 nike

我正在编写用于进行多类分类的代码。我有 7 列(6 个特征和 1 个标签)的自定义数据集,训练数据集有 2 种类型的标签(1 和 2),测试数据集有 3 种类型的标签(1、2 和 3)。该模型的目的是查看模型预测标签“3”的效果如何。目前,我正在尝试MLP算法,代码如下:

import tensorflow as tf
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers.embeddings import Embedding
from keras import optimizers
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import MinMaxScaler
import pandas as pd
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.utils.multiclass import unique_labels
from keras.models import load_model
from sklearn.externals import joblib
from joblib import dump, load
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
#from keras.layers import Dense, Embedding, LSTM, GRU
#from keras.layers.embeddings import Embedding


#Load the test dataset
df1 = pd.read_csv("/home/user/Desktop/FinalTestSet.csv")
test = df1

le = LabelEncoder()

test['Average_packets_per_flow'] = le.fit_transform(test['Average_packets_per_flow'])
test['Average_PktSize_per_flow'] = le.fit_transform(test['Average_PktSize_per_flow'])
test['Avg_pkts_per_sec'] = le.fit_transform(test['Avg_pkts_per_sec'])
test['Avg_bytes_per_sec'] = le.fit_transform(test['Avg_bytes_per_sec'])
test['N_pkts_per_flow'] = le.fit_transform(test['N_pkts_per_flow'])
test['N_pkts_size_per_flow'] = le.fit_transform(test['N_pkts_size_per_flow'])

#Select the x and y columns from dataset
xtest_Val = test.iloc[:,0:6].values
Ytest = test.iloc[:,6].values
#print Ytest

#MinMax Scaler
scaler = MinMaxScaler(feature_range=(-1, 1))
Xtest = scaler.fit_transform(xtest_Val)

#print Xtest

#Load the train dataset
df2 = pd.read_csv("/home/user/Desktop/FinalTrainingSet.csv")
train = df2

le = LabelEncoder()

test['Average_packets_per_flow'] = le.fit_transform(test['Average_packets_per_flow'])
test['Average_PktSize_per_flow'] = le.fit_transform(test['Average_PktSize_per_flow'])
test['Avg_pkts_per_sec'] = le.fit_transform(test['Avg_pkts_per_sec'])
test['Avg_bytes_per_sec'] = le.fit_transform(test['Avg_bytes_per_sec'])
test['N_pkts_per_flow'] = le.fit_transform(test['N_pkts_per_flow'])
test['N_pkts_size_per_flow'] = le.fit_transform(test['N_pkts_size_per_flow'])

#Select the x and y columns from dataset
xtrain_Val = train.iloc[:,0:6].values
Ytrain = train.iloc[:,6].values
#print Ytrain

#MinMax Scaler
scaler = MinMaxScaler(feature_range=(-1, 1))

# Fit the model
Xtrain = scaler.fit_transform(xtrain_Val)


#Reshape data for CNN
Xtrain = Xtrain.reshape((Xtrain.shape[0], 1, 6, 1))
print(Xtrain)
#Xtest = Xtest.reshape((Xtest.shape[0], 1, 6, 1))
#print Xtrain.shape

max_length=70
EMBEDDING_DIM=100
vocab_size=100
num_labels=2

#Define model
def init_model():
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=Xtrain.shape[0]))
model.add(Flatten())
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Flatten())
model.add(Dropout(0.5))
model.add(Dense(64, activation='softmax'))
model.add(Flatten())

#adam optimizer
adam = optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)

model.compile(optimizer = adam, loss='categorical_crossentropy', metrics=['accuracy'])
return model

print('Train...')
model = init_model()

#To avoid overfitting
callbacks = [EarlyStopping('val_loss', patience=3)]
hist = model.fit(Xtrain, Ytrain, epochs=50, batch_size=50, validation_split=0.20, callbacks=callbacks, verbose=1)

#Evaluate model and print results
score, acc = model.evaluate(Xtest, Ytest, batch_size=50)
print('Test score:', score)
print('Test accuracy:', acc)

但是,我收到以下错误:

ValueError: Input 0 is incompatible with layer flatten_1: expected min_ndim=3, found ndim=2

我尝试删除展平层,但收到不同的错误:

ValueError: Error when checking input: expected dense_1_input to have shape (424686,) but got array with shape (6,)

424686 是数据集中的行数,6 是特征数。

我很感激任何建议。谢谢。

根据 Omarfoq 的建议,现在我对训练和测试数据集使用了三个标签。代码和错误保持不变。

有人可以建议我解决方案吗?谢谢。

最佳答案

我想说你所尝试的不合逻辑,如果训练集中不存在你的模型将永远不会预测类“3”。你正在尝试的事情没有任何意义。尝试重新表述您的问题。

关于python - 深度学习: Multiclass Classification with same amount of labels between the training dataset and test dataset,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/59719766/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com