gpt4 book ai didi

python - 附加不存在的行数据时,来自 pd.Crosstab 的 Multiindex 的问题

转载 作者:行者123 更新时间:2023-12-04 08:28:31 24 4
gpt4 key购买 nike

大家早,
星期五快乐。我有一些 excel 输出显示 Client , 他们的 Sector以及他们的 Result 的计数通过交叉表命令。这很有效:

dfAll_Clients = {'All_Client': ['AAA','BBB','CCC','DDD','EEE','FFF'],
'City': ['SY','LN','NY','TO','TK','LA']}
dfAll_Clients = pd.DataFrame.from_dict(dfAll_Clients)
df = { 'Client': ['AAA', 'AAA', 'AAA',
'BBB', 'BBB', 'BBB', 'BBB','BBB','BBB','BBB',
'CCC',
'DDD','DDD','DDD','DDD','DDD','DDD','DDD','DDD','DDD','DDD'],
'Sector': ['GOV', 'GOV', 'CORP',
'GOV', 'GOV', 'GOV', 'GOV','CORP','CORP','CORP',
'GOV',
'GOV','GOV','GOV','GOV','GOV','GOV','GOV','GOV','GOV','CORP'],
'Result': ['Covered', 'Customer Reject', 'Customer Timeout',
'Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','No RFQ',
'No RFQ',
'Covered','Customer Reject','Customer Timeout','Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','No RFQ']
}
df = pd.DataFrame.from_dict(df)
# print(df)

vals = ['Covered',
'Customer Reject',
'Customer Timeout',
'Dealer Reject',
'Dealer Timeout',
'Done',
'No RFQ',
'Tied Covered',
'Tied Done',
'Tied Traded Away',
'Traded Away']

df = (pd.crosstab([df.Client,
df.Sector],
df.Result,
margins=True,
margins_name='Total_Result_Per_Client')
.drop('Total_Result_Per_Client')
.reindex(vals + ['Total_Result_Per_Client'], axis=1, fill_value=0))
# Total Priced Back = (All RFQ's - Dealer Reject - Dealer_Timeout) / All RFQ's
df['Total_Priced_Back'] = (df['Total_Result_Per_Client']- df['Dealer Reject'] - df['Dealer Timeout']) / (df['Total_Result_Per_Client'])
# Hit_Rate = (Done + Tied Done) / Total RFQ's less Customer Reject and Customer Timeout
df['Hit_Rate'] = (df['Done'] + df['Tied Done']) / (df['Total_Result_Per_Client']- df['Customer Reject'] - df['Customer Timeout'])
# Populate any nulls due to 0/0
df = df.fillna(0)
# Format Pct cols
decimals2 = 2
df['Total_Priced_Back'] = df['Total_Priced_Back'].apply(lambda x: round(x * 100, decimals2)).astype(str) + '%'
df['Hit_Rate'] = df['Hit_Rate'].apply(lambda x: round(x * 100, decimals2)).astype(str) + '%'
print (df)
df.to_excel('C:\Temp\Out_Data_EOM_Key_Clients_Corp.xlsx')
excel提取符合要求。
enter image description here
另一个请求是添加所有其他可能的客户,这些客户不存在于当前月份数据中,但可能存在于 future 月份中。
每个 Client在没有数据的情况下在交叉表中添加一行,并为每个字段插入; #NA .最终输出将是:
enter image description here
我希望通过以下方式添加这些行:
# Get list of all possible clients
dfAll_Clients = pd.DataFrame.from_dict(dfAll_Clients.All_Client)
new_index = tuple(list(dfAll_Clients.All_Client))
print(new_index)
# Append clients not present in current row entries
dfTemp = df.reindex(new_index, fill_value=0)
print(dfTemp)
问题是交叉表的结果是多索引的。我尝试使用 df = df.stack([0]).reset_index() 来展平交叉表输出但这完全改变了结构并且完全偏离了最终输出。我现在得到 TypeError: Expected tuple, got str任何帮助,将不胜感激。

最佳答案

你可以试试reindex

#here E and F (l) you can get it by 

cond = dfAll_Clients.All_Client.isin(df.index.get_level_values(0))
l = dfAll_Clients.loc[~cond,'All_Client'].unique().tolist()
l = [(x, None)for x in l]
df = df.reindex(pd.MultiIndex.from_tuples(df.index.tolist()+l))

关于python - 附加不存在的行数据时,来自 pd.Crosstab 的 Multiindex 的问题,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/65136088/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com