
Python/ Pandas : How to consolidate repeated rows with NaN in different columns?

Reposted. Author: 行者123. Updated: 2023-12-01 03:12:27

There has to be a better way to do this. Please help me.

Here is an excerpt of some data I have to clean. It contains several kinds of "duplicate" rows (not every row is duplicated):

df=

LoanID | CustomerID | LoanStatus | CreditScore | AnnualIncome | ...
-------+------------+------------+-------------+--------------+-----
100 | ABC | Paid | NaN | 34200 |
100 | ABC | Paid | 724 | 34200 |
200 | DEF | Write Off | 611 | 9800 |
200 | DEF | Write Off | 611 | NaN |
300 | GHI | Paid | NaN | 247112 |
300 | GHI | Paid | 799 | NaN |
400 | JKL | Paid | NaN | NaN |
500 | MNO | Paid | 444 | NaN |

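So I have the following kinds of duplicates:

  1. NaN and a valid value in the CreditScore column (LoanID = 100)
  2. NaN and a valid value in the AnnualIncome column (LoanID = 200)
  3. NaN and a valid value in the CreditScore column AND NaN and a valid value in the AnnualIncome column (LoanID = 300)
  4. LoanIDs 400 and 500 are "normal" cases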
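To see which of the cases above each repeated LoanID falls into, a small sketch (assuming the sample rows shown above, with the elided `...` columns left out) lists every row whose LoanID occurs more than once and counts the missing values per column:

```python
import numpy as np
import pandas as pd

# Sample rows from the question; the trailing "..." columns are omitted.
df = pd.DataFrame({
    "LoanID": [100, 100, 200, 200, 300, 300, 400, 500],
    "CustomerID": ["ABC", "ABC", "DEF", "DEF", "GHI", "GHI", "JKL", "MNO"],
    "LoanStatus": ["Paid", "Paid", "Write Off", "Write Off",
                   "Paid", "Paid", "Paid", "Paid"],
    "CreditScore": [np.nan, 724, 611, 611, np.nan, 799, np.nan, 444],
    "AnnualIncome": [34200, 34200, 9800, np.nan, 247112, np.nan, np.nan, np.nan],
})

# keep=False marks every row whose LoanID occurs more than once,
# not only the second and later occurrences.
repeated = df[df.duplicated("LoanID", keep=False)]

# Count missing values per column for each repeated LoanID, which shows
# whether it is case 1, 2 or 3 from the list above.
missing_per_loan = (
    repeated.drop(columns="LoanID").isna().groupby(repeated["LoanID"]).sum()
)
print(missing_per_loan)
```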

So, obviously, what I want is a dataframe without duplicates, like this:

LoanID | CustomerID | LoanStatus | CreditScore | AnnualIncome | ...
-------+------------+------------+-------------+--------------+-----
100 | ABC | Paid | 724 | 34200 |
200 | DEF | Write Off | 611 | 9800 |
300 | GHI | Paid | 799 | 247112 |
400 | JKL | Paid | NaN | NaN |
500 | MNO | Paid | 444 | NaN |
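Collapsing the duplicates into one row is only unambiguous if, within each LoanID, every column holds at most one distinct non-missing value. A quick check of that assumption on the sample rows (again leaving out the `...` columns):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "LoanID": [100, 100, 200, 200, 300, 300, 400, 500],
    "CustomerID": ["ABC", "ABC", "DEF", "DEF", "GHI", "GHI", "JKL", "MNO"],
    "LoanStatus": ["Paid", "Paid", "Write Off", "Write Off",
                   "Paid", "Paid", "Paid", "Paid"],
    "CreditScore": [np.nan, 724, 611, 611, np.nan, 799, np.nan, 444],
    "AnnualIncome": [34200, 34200, 9800, np.nan, 247112, np.nan, np.nan, np.nan],
})

# nunique() skips NaN by default, so a count above 1 means a LoanID has
# two *different* non-missing values in the same column.
distinct = df.groupby("LoanID").nunique()
conflicts = distinct[(distinct > 1).any(axis=1)]
print(conflicts.empty)
```

If `conflicts` is not empty, those loans need a rule for which value wins before any of the approaches in this thread can be trusted.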

So, here is how I solved it:

# Get the repeated keys (LoanIDs that occur more than once):
rep = df['LoanID'].value_counts()
rep = rep[rep > 1]

# Now we get the valid number (we overwrite the NaNs).
# max() skips NaN, so it returns the single non-missing value of each group.
for i in rep.keys():
    df.loc[df['LoanID'] == i, 'CreditScore'] = df[df['LoanID'] == i]['CreditScore'].max()
    df.loc[df['LoanID'] == i, 'AnnualIncome'] = df[df['LoanID'] == i]['AnnualIncome'].max()

# Drop duplicates (rows of the same loan are now identical)
df.drop_duplicates(inplace=True)

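This works and does exactly what I need. The problem is that this dataframe has several hundred thousand records, so this method takes "forever". There must be a better way to do this, right?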
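The loop above builds several full-length boolean masks for every repeated LoanID, which is a likely reason it slows down on several hundred thousand rows. A sketch of the same max-based idea as a single grouped operation, assuming (as in the sample) that `CreditScore` and `AnnualIncome` are the only columns with gaps and that each loan has at most one distinct non-missing value per column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "LoanID": [100, 100, 200, 200, 300, 300, 400, 500],
    "CustomerID": ["ABC", "ABC", "DEF", "DEF", "GHI", "GHI", "JKL", "MNO"],
    "LoanStatus": ["Paid", "Paid", "Write Off", "Write Off",
                   "Paid", "Paid", "Paid", "Paid"],
    "CreditScore": [np.nan, 724, 611, 611, np.nan, 799, np.nan, 444],
    "AnnualIncome": [34200, 34200, 9800, np.nan, 247112, np.nan, np.nan, np.nan],
})

value_cols = ["CreditScore", "AnnualIncome"]

# transform("max") computes the per-LoanID maximum (NaN is skipped) and
# broadcasts it back onto every row of that LoanID, replacing the loop.
df[value_cols] = df.groupby("LoanID")[value_cols].transform("max")

# Rows of the same loan are now identical, so drop_duplicates collapses them.
result = df.drop_duplicates().reset_index(drop=True)
print(result)
```

`transform` returns a result aligned with the original rows, which is why it can be assigned straight back into `df`; loans that are not repeated (400, 500) pass through unchanged.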

Best answer

Grouping by LoanID, filling in missing values from the rows above and below, and then dropping duplicates seems to work:

# ffill()/bfill() are the non-deprecated equivalents of
# fillna(method='ffill') and fillna(method='bfill').
df.groupby('LoanID').apply(lambda x: \
    x.ffill().\
      bfill().\
      drop_duplicates()).\
    reset_index(drop=True).\
    set_index('LoanID')
#        CustomerID LoanStatus  CreditScore  AnnualIncome
# LoanID
# 100           ABC       Paid        724.0       34200.0
# 200           DEF  Write Off        611.0        9800.0
# 300           GHI       Paid        799.0      247112.0
# 400           JKL       Paid          NaN           NaN
# 500           MNO       Paid        444.0           NaN
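A different technique from the answer's `apply`-based one (this is not part of the original answer): `GroupBy.first()` takes, for each column separately, the first non-missing value within each group. That yields the same consolidated table without calling a Python function once per group, which is typically faster on large frames. A sketch on the sample rows, assuming rows of the same loan never disagree on a non-missing value (otherwise `first()` silently keeps the earlier one):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "LoanID": [100, 100, 200, 200, 300, 300, 400, 500],
    "CustomerID": ["ABC", "ABC", "DEF", "DEF", "GHI", "GHI", "JKL", "MNO"],
    "LoanStatus": ["Paid", "Paid", "Write Off", "Write Off",
                   "Paid", "Paid", "Paid", "Paid"],
    "CreditScore": [np.nan, 724, 611, 611, np.nan, 799, np.nan, 444],
    "AnnualIncome": [34200, 34200, 9800, np.nan, 247112, np.nan, np.nan, np.nan],
})

# For every column, first() skips NaN and keeps the first real value in each
# LoanID group; a group with no real value (LoanID 400) stays NaN.
result = df.groupby("LoanID").first()
print(result)
```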

Regarding "Python/ Pandas : How to consolidate repeated rows with NaN in different columns?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42744408/
