python - Can't figure out how to define my y_test

Reposted by: 行者123 Updated: 2023-11-30 09:14:22


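I am very new to Python and sklearn, so any help would be appreciated. My only previous experience is with MNIST, and I am not sure how to define y_test when the data comes from csv files.

I have tried a few other iterations, but nothing has worked so far. I have not included the imports and utilities.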

dataDir = '/content/drive/My Drive/Colab Notebooks/Final/dataQ2/' # Directory with input files
trainFile = 'q2train.csv' # Training examples
labelFile = 'q2label.csv' # Test label
validFile = 'q2valid.csv' # Valid Files

train = pd.read_csv(dataDir+trainFile) # Read training data
valid = pd.read_csv(dataDir+validFile) # Read test data
label = pd.read_csv(dataDir+labelFile) # Unlabeled file data

x_train = train[list(train)[1:]].values
x_test = valid[list(train)[1:]].values

# Specify output directories
modelDir = 'model/' # directory for saved models
outputDir = 'output/' #directory for output files

# Create Directories if needed:
os.makedirs(os.path.dirname(modelDir), exist_ok=True)
os.makedirs(os.path.dirname(outputDir), exist_ok=True)

#Display directory names
print('Models saved in %s' %modelDir)
print('Outputs saved in %s' %outputDir)

models = {} #dictionary of SciKit-Learn classifiers with non-default parameters
models['NB'] = MultinomialNB()
models['DT'] = DecisionTreeClassifier()
models['RF'] = RandomForestClassifier(n_estimators=100)
models['KNN'] = KNeighborsClassifier(n_neighbors=10, algorithm='brute')
models['SVM'] = SVC(kernel='poly', gamma='auto')
models['LRM'] = LogisticRegression()

#Define function to evaluate classification accuracy
def evaluatePredictions(modelName, actual, predicted):
  """Returns classification accuracy
  -Saves confusion matrix in outputDir
  -Displays classification report
  -Saves predicted classes in pandas data frame 'predictedDF'"""
  acc = accuracy_score(actual, predicted) # accuracy
  print("Accuracy with test data: %4.2f%%\n" %(100*acc))
  print("CONFUSION MATRIX (Rows correspond to True Values):\n")
  cm = confusion_matrix(actual, predicted) #confusion_matrix
  cm = pd.DataFrame(cm) #convert to pandas data frame
  print(cm) # print confusion matrix
  cm.to_csv(outputDir+modelName+'confusionMatrix.csv') # save confusion matrix
  print("\nCLASSIFICATION REPORT:\n")
  print(classification_report(actual, predicted)) #classification report
  return acc #returns accuracy

def displayDigits(images, labels, nCols=10):
  """Displays images with labels (nCols per row)
  -images: list of vectors with 784 (28x28) grayscale values
  -labels: list of labels for images"""
  nRows = np.ceil(len(labels)/nCols).astype('int') # number of rows
  plt.figure(figsize=(2*nCols,2*nRows)) #figure size
  for i in range(len(labels)):
    plt.subplot(nRows,nCols,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(images[i].reshape(28, 28), interpolation='nearest')
    plt.xlabel(str(labels[i]), fontsize=14)
  plt.show()
  return


def get_data(trainFile, test_prop=0.2, seed=2019): #I am pretty sure this line is my issue.
  """returns data for training, testing, and data characteristics"""
  data = data_sets[data_set_name]
  X, y = data.data, data.target
  X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                      test_size=test_prop, 
                                                      random_state=seed)
  nF = X.shape[1] # number of features
  nC = len(np.unique(y)) # number of classes
  nTrain, nTest = len(y_train), len(y_test)
  print("\nData set: %s" %data_set_name)
  print("\tNumber of features %d" %nF)
  print("\tNumber of output classes = %d" %(nC))
  print("\tNumber of training examples = %d" %(nTrain))
  print("\tNumber of testing examples = %d" %(nTest))
  return X_train, X_test, y_train, y_test, nF, nC, nTrain, nTest

#Train and test Scikit-Learn models
result = [] #stores accuracy and time for training models
predictedTest = pd.DataFrame()
predictedTest['label'] = y_test

for m in models_used:
  model = models[m]
  print("Training classifier:\n%s\n" %model)

  #train model
  st = time.time()
  model.fit(x_train, y_train)
  tTrain = time.time() - st
  print("Time to train classifier: %4.2f seconds\n" %(tTrain))

  #predict test examples with trained model
  st = time.time() # start time for prediction
  predicted = model.predict(x_test) #predict test labels with trained model
  tTest = time.time() - st #time to predict test examples
  print("Time to test classifier: %4.2f seconds\n" %(tTest))

  #Save trained model
  modelFile = modelDir + m + '.sav' #name for saved Scikit-Learn model file
  pickle.dump(model, open(modelFile, 'wb')) #save model
  print('Trained model saved as %s\n' %modelFile)

  #evaluate prediction accuracy on test examples
  acc = evaluatePredictions(m, y_test, predicted) # evaluate prediction accuracy

  result.append([m, acc, tTrain, tTest]) #record results
  predictedTest[m] = predicted #save predicted class
  print(60*'='+'\n') #end training and testing for model

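Thank you in advance.

Best answer

I'm not sure I understood you correctly: once you have the results from predict, you can put them into a pandas DataFrame and then write that out as a csv file, like this: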

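# `predict` stands for the array of predictions, e.g. the result of model.predict(x_test)
y_csv = {'answer': predict}
y_csv = pd.DataFrame(data=y_csv)
y_csv.to_csv('Filename', index=False)  # replace 'Filename' with the output path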
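
The question itself asks where y_test should come from, and nothing in the posted code ever assigns y_train or y_test. One common approach is to hold out part of the labeled rows with train_test_split, which the question's get_data function already tries to do. The sketch below is only an illustration under assumptions: it uses a small synthetic DataFrame in place of q2train.csv, and it assumes the first column holds the label and the remaining columns hold features, which the question does not confirm. The column names and the single DecisionTreeClassifier are likewise placeholders.

```python
import io

import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for q2train.csv. ASSUMPTION: column 0 is the label
# and the remaining columns are features (the real layout is not shown).
rng = np.random.default_rng(2019)
features = rng.integers(0, 10, size=(100, 4))
labels = (features[:, 0] > 4).astype(int)  # toy label derived from the first feature
train = pd.DataFrame(features, columns=["f0", "f1", "f2", "f3"])
train.insert(0, "label", labels)

# Separate features from labels, mirroring train[list(train)[1:]] in the question.
X = train[list(train)[1:]].values
y = train[list(train)[0]].values

# Hold out 20% of the labeled rows; this split is what defines y_test.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2019
)

# Train one classifier and score it against the held-out labels.
model = DecisionTreeClassifier(random_state=2019)
model.fit(x_train, y_train)
predicted = model.predict(x_test)
acc = accuracy_score(y_test, predicted)
print("Accuracy on held-out rows: %4.2f%%" % (100 * acc))

# Collect true and predicted labels side by side, as the question's loop does,
# then write them out as csv (to an in-memory buffer here instead of a file).
predictedTest = pd.DataFrame({"label": y_test, "DT": predicted})
buffer = io.StringIO()
predictedTest.to_csv(buffer, index=False)
```

If the labels actually live in a separate file such as q2label.csv, the same idea applies: y has to be an array with one label per row of X, read from whichever file and column contains them, before the split is made.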

Regarding "python - Can't figure out how to define my y_test", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/59132167/
