
python - How to get the percentage per row and visualize categorical data

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:38:11

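I am doing exploratory data analysis on a loan prediction dataset (a pandas DataFrame). The dataframe has two columns: Property_Area, which takes three values (Rural, Urban, Semiurban), and Loan_Status, which takes two values (Y, N). I want to plot a chart with Property_Area along the X axis and, for each of the three area types, the percentage of loans accepted or rejected along the Y axis. How can I do this?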

Here is a sample of my data:

data = pd.DataFrame({'Loan_Status':['N','Y','Y','Y','Y','N','N','Y','N','Y','N'], 
'Property_Area': ['Rural', 'Urban','Urban','Urban','Urban','Urban',
'Semiurban','Urban','Semiurban','Rural','Semiurban']})

I tried this:

status = data['Loan_Status']
index = data['Property_Area']
df = pd.DataFrame({'Loan Status' : status}, index=index)
ax = df.plot.bar(rot=0)

data is the dataframe for the original dataset

Output: (image)

Edit: I was able to do what I wanted, but I had to write very long code to get there:

new_data = data[['Property_Area', 'Loan_Status']].copy()
count_rural_y = new_data[(new_data.Property_Area == 'Rural') & (data.Loan_Status == 'Y') ].count()
count_rural = new_data[(new_data.Property_Area == 'Rural')].count()
#print(count_rural[0])
#print(count_rural_y[0])
rural_y_percent = (count_rural_y[0]/count_rural[0])*100
#print(rural_y_percent)

#print("-"*50)

count_urban_y = new_data[(new_data.Property_Area == 'Urban') & (data.Loan_Status == 'Y') ].count()
count_urban = new_data[(new_data.Property_Area == 'Urban')].count()
#print(count_urban[0])
#print(count_urban_y[0])
urban_y_percent = (count_urban_y[0]/count_urban[0])*100
#print(urban_y_percent)

#print("-"*50)

count_semiurban_y = new_data[(new_data.Property_Area == 'Semiurban') & (data.Loan_Status == 'Y') ].count()
count_semiurban = new_data[(new_data.Property_Area == 'Semiurban')].count()
#print(count_semiurban[0])
#print(count_semiurban_y[0])
semiurban_y_percent = (count_semiurban_y[0]/count_semiurban[0])*100
#print(semiurban_y_percent)

#print("-"*50)

objects = ('Rural', 'Urban', 'Semiurban')
y_pos = np.arange(len(objects))
performance = [rural_y_percent,urban_y_percent,semiurban_y_percent]
plt.bar(y_pos, performance, align='center', alpha=0.5)
plt.xticks(y_pos, objects)
plt.ylabel('Loan Approval Percentage')
plt.title('Area Wise Loan Approval Percentage')

plt.show()

Output: (image)

Could you suggest a simpler way, if possible?

Best answer

Pandas crosstab with normalize will make this easy

A simple way to take two or more columns of a pandas DataFrame and get the percentage per row is to use the pandas crosstab function with normalize='index'.


Here is how the crosstab call looks:

# Crosstab with "normalize = 'index'". 
df_percent = pd.crosstab(data.Property_Area,data.Loan_Status,
normalize = 'index').rename_axis(None)

# Multiply all percentages by 100 for graphing.
df_percent *= 100

This outputs df_percent as follows:

Loan_Status          N          Y
Rural        50.000000  50.000000
Semiurban    66.666667  33.333333
Urban        16.666667  83.333333
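As a cross-check on what normalize='index' does, here is a minimal self-contained sketch using the answer's sample data. It confirms that each row sums to 100 and reproduces the same table with groupby and value_counts(normalize=True). The names row_pct, row_sums, and alt_pct are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data from the answer (the last Loan_Status value is 'Y').
data = pd.DataFrame({
    'Loan_Status': ['N', 'Y', 'Y', 'Y', 'Y', 'N', 'N', 'Y', 'N', 'Y', 'Y'],
    'Property_Area': ['Rural', 'Urban', 'Urban', 'Urban', 'Urban', 'Urban',
                      'Semiurban', 'Urban', 'Semiurban', 'Rural', 'Semiurban'],
})

# normalize='index' divides every count by its row total, so each
# Property_Area row holds the share of N and Y within that area.
row_pct = pd.crosstab(data['Property_Area'], data['Loan_Status'],
                      normalize='index') * 100

# Each row should therefore add up to 100 percent.
row_sums = row_pct.sum(axis=1)

# An alternative route to the same numbers: per-group relative
# frequencies, pivoted so the statuses become columns.
alt_pct = (data.groupby('Property_Area')['Loan_Status']
               .value_counts(normalize=True)
               .unstack(fill_value=0) * 100)

print(row_pct)
print(row_sums)
```

The crosstab form is shorter and fills missing combinations with 0 on its own; the groupby form can be handier when the percentages are one step in a longer chain.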

You can then easily plot it as a bar chart:

# Plot only approvals as bar graph. 
plt.bar(df_percent.index, df_percent.Y, align='center', alpha=0.5)
plt.ylabel('Loan Approval Percentage')
plt.title('Area Wise Loan Approval Percentage')

plt.show()

And get the resulting chart:

(image: Matplotlib bar plot from pandas crosstab)
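The question asks for the percentage of loans accepted or rejected, so one possible variation is to show both statuses at once as a stacked bar chart. The sketch below uses the answer's sample data and pandas' own DataFrame.plot.bar. The variable names and the axis labels are illustrative choices, not taken from the original answer.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample data from the answer (the last Loan_Status value is 'Y').
data = pd.DataFrame({
    'Loan_Status': ['N', 'Y', 'Y', 'Y', 'Y', 'N', 'N', 'Y', 'N', 'Y', 'Y'],
    'Property_Area': ['Rural', 'Urban', 'Urban', 'Urban', 'Urban', 'Urban',
                      'Semiurban', 'Urban', 'Semiurban', 'Rural', 'Semiurban'],
})

# Row-wise percentages: one row per area, one column per loan status.
pct = pd.crosstab(data['Property_Area'], data['Loan_Status'],
                  normalize='index') * 100

# Stack N and Y on top of each other so every bar reaches 100 percent.
ax = pct.plot.bar(stacked=True, rot=0, alpha=0.5)
ax.set_xlabel('Property Area')
ax.set_ylabel('Share of loans (%)')
ax.set_title('Loan Status by Property Area')
ax.legend(title='Loan_Status')
```

Call plt.show() afterwards to display the figure in a script, or ax.figure.savefig(...) to write it to a file. Dropping stacked=True gives grouped bars instead, with one bar per status side by side for each area.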

Here you can see the code working in Google Colab


Here is the sample dataframe I generated for this answer:

data = pd.DataFrame({'Loan_Status':['N','Y','Y','Y','Y','N','N','Y','N','Y','Y'
], 'Property_Area': ['Rural', 'Urban','Urban','Urban','Urban','Urban',
'Semiurban','Urban','Semiurban','Rural','Semiurban']})

This creates the following sample dataframe:

   Loan_Status Property_Area
0            N         Rural
1            Y         Urban
2            Y         Urban
3            Y         Urban
4            Y         Urban
5            N         Urban
6            N     Semiurban
7            Y         Urban
8            N     Semiurban
9            Y         Rural
10           Y     Semiurban

Regarding python - How to get the percentage per row and visualize categorical data, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53108063/
