
python - Making predictions with a model generated by a decision tree algorithm

Reposted. Author: 行者123. Updated: 2023-11-30 09:03:40

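I have been trying to make predictions on a DataFrame using a model I built with a decision tree algorithm.

The model scores 0.96 on the test set. I then tried to use it to predict on a DataFrame of the employees who have stayed, but I got an error. The goal is to predict, from the DataFrame of people still at the company, who will leave in the future.

How can I achieve this?

Here is what I did: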

  1. Read the DataFrame from my GitHub and split it into people who left and people who did not
    import pandas as pd

    df = pd.read_csv('https://raw.githubusercontent.com/bhaskoro-muthohar/DataScienceLearning/master/HR_comma_sep.csv')

    leftdf = df[df['left']==1]
    notleftdf = df[df['left']==0]
  2. Prepare the data for model building
    df.salary = df.salary.map({'low':0,'medium':1,'high':2})
    df.salary
    X = df.drop(['left','sales'],axis=1)
    y = df['left']
  3. Split the training and test sets
    import numpy as np
    from sklearn.model_selection import train_test_split

    #splitting the train and test sets
    X_train, X_test, y_train, y_test= train_test_split(X,y,random_state=0, stratify=y)
  4. Train it
    from sklearn import tree
    clftree = tree.DecisionTreeClassifier(max_depth=3)
    clftree.fit(X_train,y_train)
  5. Evaluate the model
    y_pred = clftree.predict(X_test)
    print("Test set prediction:\n {}".format(y_pred))
    print("Test set score: {:.2f}".format(clftree.score(X_test, y_test)))

    The result is

    Test set score: 0.96

  6. Then I tried to predict using the DataFrame of people who have not left the company yet
    X_new = notleftdf.drop(['left','sales'],axis=1)

    #Map salary to 0,1,2
    X_new.salary = X_new.salary.map({'low':0,'medium':1,'high':2})
    X_new.salary
    prediction_will_left = clftree.predict(X_new)
    print("Prediction: {}".format(prediction_will_left))
    print("Predicted target name: {}".format(
    notleftdf['left'][prediction_will_left]
    ))

    The error I get is:

    KeyError: "None of [Int64Index([0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n            ...\n            0, 0, 0, 0, 0, 0, 1, 0, 0, 0],\n           dtype='int64', length=11428)] are in the [index]"

How do I fix this?

PS: the full script is linked here

Best answer

Maybe you are looking for something like this. (It is a self-contained script once you download the data file to the same directory.)

    from sklearn import tree
    from sklearn.model_selection import train_test_split
    import numpy as np
    import pandas as pd


    def process_df_for_ml(df):
        """
        Process a dataframe for model training/prediction use.

        Returns X/y tensors.
        """

        # Work on a copy so the caller's dataframe is left untouched.
        df = df.copy()
        # Map salary to 0,1,2
        df.salary = df.salary.map({"low": 0, "medium": 1, "high": 2})
        # X: every column except the target "left" and the "sales" column; y: the "left" target
        X = df.drop(["left", "sales"], axis=1)
        y = df["left"]
        return (X, y)

    # Read and reindex CSV.
    df = pd.read_csv("HR_comma_sep.csv")
    df = df.reindex()

    # Train a decision tree.
    X, y = process_df_for_ml(df)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)
    clftree = tree.DecisionTreeClassifier(max_depth=3)
    clftree.fit(X_train, y_train)

    # Test the decision tree on people who haven't left yet.
    notleftdf = df[df["left"] == 0].copy()
    X, y = process_df_for_ml(notleftdf)
    # Plug in a new column with ones and zeroes from the prediction.
    notleftdf["will_leave"] = clftree.predict(X)
    # Print those with the will-leave flag on.
    print(notleftdf[notleftdf["will_leave"] == 1])
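
The KeyError in the question appears to come from passing the 0/1 predictions directly as an indexer: pandas looks those integers up as index labels, and the filtered DataFrame no longer contains rows labelled 0 or 1. A minimal sketch of the difference, using a made-up four-row frame (the name `staying`, its values, and the `predictions` array are assumptions for illustration, not part of the original data):

```python
import numpy as np
import pandas as pd

# Toy stand-in for notleftdf: the index keeps the original row labels
# (10, 11, 12, 13), as it would after filtering a larger DataFrame.
staying = pd.DataFrame(
    {"satisfaction_level": [0.9, 0.1, 0.8, 0.2], "left": [0, 0, 0, 0]},
    index=[10, 11, 12, 13],
)

# Hypothetical output of clftree.predict(X_new): one 0/1 value per row.
predictions = np.array([0, 1, 0, 1])

# Indexing with the raw integers looks them up as index labels; the
# labels 0 and 1 do not exist here, so pandas raises KeyError.
try:
    staying["left"][predictions]
    raised = False
except KeyError:
    raised = True

# Comparing to 1 yields a boolean mask, which selects rows by position.
will_leave = staying[predictions == 1]

print(raised)
print(will_leave.index.tolist())
```

Storing the predictions as a new column, as the script above does with `will_leave`, amounts to the same boolean filter and keeps each prediction next to the row it belongs to.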

Regarding python - Making predictions with a model generated by a decision tree algorithm, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/58570355/
