python - Multi-output, multi-class machine learning in Python

Reposted. Author: 行者123. Updated: 2023-11-30 08:49:31

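I have been researching and struggling to find the best way to approach this problem. I have a training dataset and a test dataset. The test dataset is missing two feature columns that the training dataset has (channel and sector, each consisting of 4 classes).

I have already built a decision tree on the data, but I can only train it on either channel or sector, when I need to be able to train on both.

Can anyone suggest a way to implement multi-class, multi-output machine learning in Python?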

import os
import subprocess

import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_graphviz

def getPath(thisFile):
    # Load the CSV into a DataFrame; returns None if the file does not exist.
    if os.path.exists(thisFile):
        df = pd.read_csv(thisFile, header=0)
    else:
        return
    return df

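def visualize_tree(tree, feature_names):
    # Export the fitted tree to Graphviz .dot format, then render it to PNG.
    with open("dt.dot", 'w') as f:
        export_graphviz(tree, out_file=f,
                        feature_names=feature_names)

    command = ["dot", "-Tpng", "dt.dot", "-o", "dt.png"]
    try:
        subprocess.check_call(command)
    except (OSError, subprocess.CalledProcessError):
        # OSError: dot could not be started; CalledProcessError: dot exited non-zero.
        raise SystemExit("Could not run dot, ie graphviz, to "
                         "produce visualization")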

data = np.loadtxt("newTrain2.csv", delimiter=',')
X = data[:, 1:4]  # feature columns: product, quantity, revenue

# Bin quantity into 4 ordinal classes. The elif chain puts every value in
# exactly one bin, including the boundary value 250.
quantity = data[:, 2]
for i in range(len(quantity)):
    if quantity[i] < 30:
        quantity[i] = 1
    elif quantity[i] < 75:
        quantity[i] = 2
    elif quantity[i] < 250:
        quantity[i] = 3
    else:
        quantity[i] = 4

# Bin revenue into 4 ordinal classes in the same way.
revenue = data[:, 3]
for i in range(len(revenue)):
    if revenue[i] < 1000:
        revenue[i] = 1
    elif revenue[i] < 4000:
        revenue[i] = 2
    elif revenue[i] < 10000:
        revenue[i] = 3
    else:
        revenue[i] = 4

# quantity and revenue are views into data, so X already holds the binned
# values; the assignments below are kept only to make that explicit.
X[:, 1] = quantity
X[:, 2] = revenue



targets = data[:,4]

thisTree = DecisionTreeClassifier(min_samples_split=30, random_state=99)
thisTree.fit(X, targets)
visualize_tree(thisTree, ["product", "quantity", "revenue"])

Best answer

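One approach is to merge the two missing columns into a single column by combining their classes. If each column has 4 distinct classes, the merged column will have 4 * 4 = 16 classes.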
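A minimal sketch of that idea follows, under stated assumptions: both columns are integer-coded as 0..3 (the asker's actual coding is not shown), and the data is synthetic rather than the asker's newTrain2.csv. The helper names `combine_labels` and `split_labels` are hypothetical and only illustrate the encode/decode step around an ordinary single-output classifier.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

N_CLASSES = 4  # classes per column, as stated in the question


def combine_labels(channel, sector, n_classes=N_CLASSES):
    """Map each (channel, sector) pair to one label in 0..n_classes**2 - 1."""
    return np.asarray(channel) * n_classes + np.asarray(sector)


def split_labels(combined, n_classes=N_CLASSES):
    """Invert combine_labels: recover (channel, sector) from the merged label."""
    combined = np.asarray(combined)
    return combined // n_classes, combined % n_classes


# Synthetic stand-in data (assumption: not the asker's real dataset).
rng = np.random.default_rng(0)
X_train = rng.integers(1, 5, size=(200, 3))        # product, quantity bin, revenue bin
channel = rng.integers(0, N_CLASSES, size=200)     # assumed coded 0..3
sector = rng.integers(0, N_CLASSES, size=200)      # assumed coded 0..3

# Train one ordinary multi-class tree on the merged 16-class target.
y_combined = combine_labels(channel, sector)
clf = DecisionTreeClassifier(min_samples_split=30, random_state=99)
clf.fit(X_train, y_combined)

# Predict the merged label, then decode it back into the two columns.
X_test = rng.integers(1, 5, size=(5, 3))
pred_channel, pred_sector = split_labels(clf.predict(X_test))
```

If the real labels are coded 1..4 instead, subtracting 1 before combining and adding 1 after splitting would keep the mapping consistent. As a separate technique from the one in this answer, scikit-learn's tree classifiers also accept a two-column `y` in `fit`, which predicts both columns without merging them.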

Regarding python - Multi-output, multi-class machine learning in Python, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/42955421/
