gpt4 book ai didi

python - Pandas 像嵌套的 countifs 一样出色

转载 作者:太空宇宙 更新时间:2023-11-04 04:30:29 24 4
gpt4 key购买 nike

我在这里遇到了障碍。我必须翻译这个excel公式

=IF(COUNTIFS(advisor!$C:$C,$A3)=0,"0 disclosed",
IF(COUNTIFS(advisor!$C:$C,$A3,advisor!$E:$E,2)>0,"Dependent",
IF(IF(COUNTIFS(advisor!$C:$C,$A3,advisor!$B:$B,"auditor")>0,1,0)+IF(COUNTIFS(advisor!$C:$C,$A3,advisor!$B:$B,"compensation")>0,1,0)=2,"Independent","1 disclosed")))

到目前为止,这是我的 python-pandas 解决方案:

df['auditor_compensation'] = np.where(df['id'].isin(df_advisor['company_id']).count() == 0,
'0 disclosed',
np.where(df_advisor['dependent'] == 2, 'dependent',
np.where((np.where(df_advisor['type']=='auditor', 1, 0)+np.where(df_advisor['type']=='compensation', 1, 0)) == 2, 'independent', '1 disclosed')))

我不断收到 ValueError: Length of values does not match length of index

df样本数据:公司数据

id      ticker       iq_id     company              auditor_compensation
48299 ENXTAM:AALB IQ881736 Aalberts Industries ?
48752 ENXTAM:ABN IQ1090191 ABN AMRO Group ?
48865 ENXTAM:ACCEL IQ4492981 Accell Group ?
49226 ENXTAM:AGN IQ247906 AEGON ?
49503 ENXTAM:AD IQ373545 Koninklijke ?

下面是df_advisor样本数据

id    type          company_id  advisor_company_id  dependent
1 auditor 4829 6091 1
17 auditor 4875 16512 1
6359 auditor 4886 7360 1
37 auditor 4922 8187 1
4415 compensation 4922 9025 1
53 auditor 4950 8187 1

非常感谢任何帮助。

最佳答案

您的 numpy.where 函数不会生成与原始数据帧长度相同的数组或序列。这是因为它试图结合不一致的条件,例如df['id']df_advisor['dependent'] 将具有不同的长度。

虽然使用 GroupBy 将 Excel 公式转换为 Pandas/NumPy 很诱人, pd.DataFrame.mergenp.select可能会更有效率和可读性。

第 1 步:分组映射数据框

df_advisor_grouped = df_advisor.groupby('company_id')\
.agg({'type': '|'.join, 'dependent': 'sum'})\
.reset_index()

print(df_advisor_grouped)

company_id type dependent
0 4829 auditor 1
1 4875 auditor 1
2 4886 auditor 1
3 4922 auditor|compensation 2
4 4950 auditor 1

第 2 步:与主数据框合并

# merge dataframes based on key column
res = df.merge(df_advisor_grouped, left_on='id', right_on='company_id', how='left')

第 3 步:应用条件逻辑

# define 3 conditions
conds = [res['company_id'].isnull(), res['dependent'].eq(2),
res['type'].str.contains('auditor') & res['type'].str.contains('compensation')]

# define 3 choices
choices = ['0 disclosed', 'dependent', 'independent']

# apply np.select logic, including default argument if 3 conditions are not met
res['auditor_compensation'] = np.select(conds, choices, '1 disclosed')

关于python - Pandas 像嵌套的 countifs 一样出色,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/52722794/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com