gpt4 book ai didi

python - 从多索引 DataFrame 中搜索和处理数据

转载 作者:行者123 更新时间:2023-12-04 15:23:48 24 4
gpt4 key购买 nike

我有两个数据框 df2,其中包含付款统计信息(客户支付一定债务的可能性)和 df3,其中包含新客户数据。

import pandas as pd

d = {'City': ['Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo','Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo'],
'Card': ['Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card','Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card'],
'Colateral':['Yes','No','Yes','No','No','No','No','Yes','Yes','No','Yes','Yes','No','Yes','No','No','No','Yes','Yes','No','No','No'],
'Client Number':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22],
'DebtPaid':[0.8,0.1,0.5,0.30,0,0.2,0.4,1,0.60,1,0.5,0.2,0,0.3,0,0,0.2,0,0.1,0.70,0.5,0.1]}

df = pd.DataFrame(data=d)

df2=df.groupby(['City','Card','Colateral'])['DebtPaid'].\
value_counts(bins=[-0.001,0,0.25,0.5,0.75,1,1.001,2],normalize=True)
d = {'City': ['Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo','Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo'], 
'Card': ['Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card','Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card'],
'Colateral':['Yes','No','Yes','No','No','No','No','Yes','Yes','No','Yes','Yes','No','Yes','No','No','No','Yes','Yes','No','No','No'],
'Client Number':[23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44],
'Total Debt':[100,240,200,1000,50,20,345,10,600,40,50,20,100,30,100,600,200,200,150,700,50,120]}

df3 = pd.DataFrame(data=d)

我想计算客户支付的预估金额。前任:如果客户来自里斯本,有 Visa 和抵押品,它有 0.333333 支付 0% 债务的变化,0.3333% 支付变化 ]0-25%] 债务和 0,3333% 支付 ]0,75-1] % 的债务。所以如果这个客户有 100 的债务,那么期望值将来自

[(0,33 * 0 * 100)+(0 * 0 * 100)+(0,33 * 0,75 * 100] 到 [(0,33 * 0 * 100 + 0,33 * 0, 25 * 100+0,33 * 1 * 100).

所以这个客户将支付 24.75 欧元到 41.25 欧元。

然后为所有其他客户计算一下。

关于如何解决这个问题的任何想法?

最佳答案

你可以这样做:

#First merge dataframes
df_out = df2.rename('Prob').reset_index().merge(df3, on=['City', 'Card', 'Colateral'])

#Use the right and left attributes of pd.Interval
df_out['lower'] = [x.left for x in df_out['DebtPaid']]
df_out['upper'] = [x.right for x in df_out['DebtPaid']]

#Calculate lower and upper partial prices
df_out['l_partial'] = df_out[['lower', 'Prob', 'Total Debt']].prod(axis=1)
df_out['u_partial'] = df_out[['upper', 'Prob', 'Total Debt']].prod(axis=1)

#Sum partial prices to get lower and upper price grouped on Client Number
df_out.groupby('Client Number')[['l_partial', 'u_partial']]\
.agg(lower_price=('l_partial', 'sum'),
upper_price=('u_partial', 'sum')).clip(0,np.inf)

输出:

              lower_price  upper_price
Client Number
23 37.500000 62.500000
24 0.000000 15.000000
25 50.000000 100.000000
26 187.000000 375.000000
27 0.000000 3.125000
28 0.000000 5.000000
29 143.750000 230.000000
30 2.493333 4.166667
31 225.000000 375.000000
32 16.666667 26.666667
33 18.750000 31.250000
34 7.500000 12.500000
35 0.000000 6.250000
36 7.500000 15.000000
37 18.700000 37.500000
38 0.000000 37.500000
39 0.000000 50.000000
40 49.866667 83.333333
41 37.400000 62.500000
42 130.900000 262.500000
43 20.833333 33.333333
44 22.440000 45.000000

关于python - 从多索引 DataFrame 中搜索和处理数据,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/62721923/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com