python - Alternative to repopulating a dataframe with classification model predictions after testing is complete?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 09:44:38

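Once you are satisfied with the results of a classification model, what alternative would you recommend for mapping the predicted values back to their text form? The classification model is built with scikit-learn.

What I have been doing is simply inverting the dictionary and then remapping, see below.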

import pandas as pd

d = {'Reported Harms': ['Fell', 'Constipation', 'Surgical Delay'],
     'Complaint Description': ['Patient Fell on face', 'Could not use bathroom', 'Medical rep was late']}
df = pd.DataFrame(data=d)

# Map each distinct label to an integer code
harms = df["Reported Harms"].unique()
harms_dict = {value: index for index, value in enumerate(harms)}
results = df["Reported Harms"].map(harms_dict)

# Stand-in for the model's predictions
df['prediction'] = [0, 1, 2]

# Invert the dictionary and map the codes back to text
inv_map = {v: k for k, v in harms_dict.items()}
df["prediction"] = df["prediction"].map(inv_map)

Thanks.

Since someone asked to see the model, here it is:

import matplotlib.pyplot as plt
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
import seaborn as sns

from sklearn.feature_extraction.text import CountVectorizer

df=pd.read_excel('Test_data.xlsx',sheet_name='Test')
dff=pd.read_excel('Data_input.xlsx',sheet_name='Complaints')

corpus=df["Complaint Description"]
vectorizer=CountVectorizer(min_df=1)
X=vectorizer.fit_transform(corpus).toarray()
print(X.shape)

harms=df["Reported Harms"].unique()
harms_dict={value:index for index, value in enumerate(harms)}
results=df["Reported Harms"].map(harms_dict)

# test_size=1 is an integer, so exactly one sample is held out for testing
x_train, x_test, y_train, y_test = train_test_split(X, results, test_size=1, random_state=1)

clf=MultinomialNB()
clf.fit(x_train,y_train)
print(clf.score(x_test, y_test))


vec_text=vectorizer.transform(dff["Complaint Description"]).toarray()
ids=dff["Complaint Description"]
dff['prediction']=clf.predict(vec_text)

inv_map={v:k for k, v in harms_dict.items()}
dff["prediction"]=dff["prediction"].map(inv_map)
s=dff['prediction'].value_counts()
sns.barplot(x=s.index,y=s.values)

# Use the writer as a context manager so the file is saved and closed on exit
with pd.ExcelWriter('Legacy_list.xlsx') as writer:
    dff.to_excel(writer, sheet_name='Complaints edit', index=False)

Best answer

A common approach is to use sklearn.preprocessing.LabelEncoder:

In [15]: from sklearn.preprocessing import LabelEncoder

In [18]: le = LabelEncoder()

In [19]: df['harms'] = le.fit_transform(df['Reported Harms'])

In [20]: df
Out[20]:
    Complaint Description  Reported Harms  harms
0    Patient Fell on face            Fell      1
1  Could not use bathroom    Constipation      0
2    Medical rep was late  Surgical Delay      2

In [21]: df['decoded'] = le.inverse_transform(df['harms'])
C:\Users\Max\Anaconda3_5.0\envs\py36\lib\site-packages\sklearn\preprocessing\label.py:151: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.
  if diff:

In [22]: df
Out[22]:
    Complaint Description  Reported Harms  harms         decoded
0    Patient Fell on face            Fell      1            Fell
1  Could not use bathroom    Constipation      0    Constipation
2    Medical rep was late  Surgical Delay      2  Surgical Delay
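
The session above encodes and decodes a column that already holds the true labels. As a minimal sketch of how the same encoder might be applied to model output, the example below fits a LabelEncoder on the training labels, trains the MultinomialNB classifier from the question on the integer codes, and decodes the predictions with inverse_transform. The `new` dataframe and its two complaint texts are hypothetical stand-ins for the question's Data_input.xlsx sheet, not data from the original post.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import LabelEncoder

# Toy training data taken from the question.
train = pd.DataFrame({
    "Reported Harms": ["Fell", "Constipation", "Surgical Delay"],
    "Complaint Description": [
        "Patient Fell on face",
        "Could not use bathroom",
        "Medical rep was late",
    ],
})

# Hypothetical new complaints to label (an assumption for illustration).
new = pd.DataFrame({
    "Complaint Description": [
        "Patient Fell on face",
        "Medical rep was late",
    ],
})

# Turn the complaint text into word-count features.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train["Complaint Description"])

# Fit the encoder once on the training labels and keep it for decoding.
le = LabelEncoder()
y_train = le.fit_transform(train["Reported Harms"])

clf = MultinomialNB()
clf.fit(X_train, y_train)

# Predict integer codes, then decode them with the same fitted encoder.
codes = clf.predict(vectorizer.transform(new["Complaint Description"]))
new["prediction"] = le.inverse_transform(codes)
print(new)
```

Keeping the fitted `le` object next to the model means no hand-built inverse dictionary is needed. scikit-learn classifiers also accept string labels directly, so passing `train["Reported Harms"]` to `fit` unchanged would make `predict` return the text labels without any mapping step.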

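Regarding "python - Alternative to repopulating a dataframe with classification model predictions after testing is complete?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50282280/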
