
python - Add a new column to a DataFrame based on conditions

Reposted. Author: 行者123. Updated: 2023-12-01 01:43:17

I have a DataFrame like this:

+------------+---------------+-------------+---------------------+-------------------+
| SK_ID_CURR | CREDIT_ACTIVE | DAYS_CREDIT | DAYS_CREDIT_ENDDATE | DAYS_ENDDATE_FACT |
+------------+---------------+-------------+---------------------+-------------------+
| 436084 | Sold | -2835 | -2094.0 | -2436.0 |
| 436084 | Active | -987 | -438.0 | NaN |
| 436084 | Sold | -1875 | -1494.0 | -1494.0 |
| 436084 | Active | -1135 | -951.0 | NaN |
| 436084 | Bad debt | -986 | NaN | NaN |
| 436084 | Active | -968 | -845.0 | NaN |
| 436084 | Active | -987 | -803.0 | NaN |
+------------+---------------+-------------+---------------------+-------------------+

I want to create a new column, CREDIT_LENGTH_IN_DAYS, using the following rules:

def func(x):
    if x[x['CREDIT_ACTIVE'] == 'Active']:
        return x['DAYS_CREDIT_ENDDATE'] - x['DAYS_CREDIT']
    elif x[x['CREDIT_ACTIVE'] == 'Closed'] | x[x['CREDIT_ACTIVE'] == 'Sold']:
        return x['DAYS_ENDDATE_FACT'] - x['DAYS_CREDIT']
    elif x[x['CREDIT_ACTIVE'] == 'Bad debt']:
        return x['DAYS_CREDIT']

Then I apply it with:

df_bureau['CREDIT_LENGTH_IN_DAYS'] = df_bureau.apply(func, axis=1)

However, in the x[x['CREDIT_ACTIVE'] == 'Bad debt'] case I get odd values instead of each row's actual x['DAYS_CREDIT'] value.

Best answer

Use numpy.select:

import numpy as np

# Boolean masks, one per rule; numpy.select uses the first mask that matches
m1 = df_bureau['CREDIT_ACTIVE'] == 'Active'
m2 = df_bureau['CREDIT_ACTIVE'].isin(['Closed','Sold'])
m3 = df_bureau['CREDIT_ACTIVE'] == 'Bad debt'

# Values to use where the corresponding mask is True
v1 = df_bureau['DAYS_CREDIT_ENDDATE'] - df_bureau['DAYS_CREDIT']
v2 = df_bureau['DAYS_ENDDATE_FACT'] - df_bureau['DAYS_CREDIT']
v3 = df_bureau['DAYS_CREDIT']

# Rows matching none of the masks get NaN
df_bureau['CREDIT_LENGTH_IN_DAYS'] = np.select([m1, m2, m3], [v1, v2, v3], np.nan)
print(df_bureau)
   SK_ID_CURR CREDIT_ACTIVE  DAYS_CREDIT  DAYS_CREDIT_ENDDATE  \
0      436084          Sold        -2835              -2094.0
1      436084        Active         -987               -438.0
2      436084          Sold        -1875              -1494.0
3      436084        Active        -1135               -951.0
4      436084      Bad debt         -986                  NaN
5      436084        Active         -968               -845.0
6      436084        Active         -987               -803.0

   DAYS_ENDDATE_FACT  CREDIT_LENGTH_IN_DAYS
0            -2436.0                  399.0
1                NaN                  549.0
2            -1494.0                  381.0
3                NaN                  184.0
4                NaN                 -986.0
5                NaN                  123.0
6                NaN                  184.0
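
For anyone who wants to reproduce this without the original dataset, here is a minimal self-contained sketch. It rebuilds df_bureau from the seven rows shown in the question (an assumption, since the real data is much larger) and applies the same numpy.select logic. The names status, conditions and choices are illustrative and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the table in the question (not the full dataset).
df_bureau = pd.DataFrame({
    "SK_ID_CURR": [436084] * 7,
    "CREDIT_ACTIVE": ["Sold", "Active", "Sold", "Active",
                      "Bad debt", "Active", "Active"],
    "DAYS_CREDIT": [-2835, -987, -1875, -1135, -986, -968, -987],
    "DAYS_CREDIT_ENDDATE": [-2094.0, -438.0, -1494.0, -951.0,
                            np.nan, -845.0, -803.0],
    "DAYS_ENDDATE_FACT": [-2436.0, np.nan, -1494.0, np.nan,
                          np.nan, np.nan, np.nan],
})

status = df_bureau["CREDIT_ACTIVE"]

# One boolean mask per rule; the first mask that is True for a row wins.
conditions = [
    status == "Active",
    status.isin(["Closed", "Sold"]),
    status == "Bad debt",
]

# One candidate column per mask, in the same order as the masks.
choices = [
    df_bureau["DAYS_CREDIT_ENDDATE"] - df_bureau["DAYS_CREDIT"],
    df_bureau["DAYS_ENDDATE_FACT"] - df_bureau["DAYS_CREDIT"],
    df_bureau["DAYS_CREDIT"],
]

# Rows that match no mask fall back to the default (NaN here).
df_bureau["CREDIT_LENGTH_IN_DAYS"] = np.select(conditions, choices, default=np.nan)
print(df_bureau["CREDIT_LENGTH_IN_DAYS"].tolist())
```

Because every column is computed in one vectorized pass, this is usually much faster than a row-wise apply on a large DataFrame.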

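Your solution processes each row separately, so no filtering is needed. Inside func, x is a single row and x['CREDIT_ACTIVE'] is a scalar, so compare it directly and change | to or: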

def func(x):
    if x['CREDIT_ACTIVE'] == 'Active':
        return x['DAYS_CREDIT_ENDDATE'] - x['DAYS_CREDIT']
    elif (x['CREDIT_ACTIVE'] == 'Closed') or (x['CREDIT_ACTIVE'] == 'Sold'):
        return x['DAYS_ENDDATE_FACT'] - x['DAYS_CREDIT']
    elif x['CREDIT_ACTIVE'] == 'Bad debt':
        return x['DAYS_CREDIT']

df_bureau['CREDIT_LENGTH_IN_DAYS'] = df_bureau.apply(func, axis=1)
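
The row-wise version can be checked the same way. The sketch below is a hedged illustration, assuming a three-row subset of the question's table (one row per status that appears in it). It shows that with axis=1 each x is a Series holding one row, so x['CREDIT_ACTIVE'] is a plain string and an ordinary if/elif works. The variables first_row and result are illustrative names.

```python
import numpy as np
import pandas as pd

# Three rows taken from the question's table, one per status seen there.
df_bureau = pd.DataFrame({
    "CREDIT_ACTIVE": ["Sold", "Active", "Bad debt"],
    "DAYS_CREDIT": [-2835, -987, -986],
    "DAYS_CREDIT_ENDDATE": [-2094.0, -438.0, np.nan],
    "DAYS_ENDDATE_FACT": [-2436.0, np.nan, np.nan],
})

def func(x):
    # x is one row; each lookup returns a scalar, not a column.
    if x["CREDIT_ACTIVE"] == "Active":
        return x["DAYS_CREDIT_ENDDATE"] - x["DAYS_CREDIT"]
    elif (x["CREDIT_ACTIVE"] == "Closed") or (x["CREDIT_ACTIVE"] == "Sold"):
        return x["DAYS_ENDDATE_FACT"] - x["DAYS_CREDIT"]
    elif x["CREDIT_ACTIVE"] == "Bad debt":
        return x["DAYS_CREDIT"]
    # Any other status falls through and returns None, which pandas stores as NaN.

first_row = df_bureau.iloc[0]
print(type(first_row["CREDIT_ACTIVE"]))  # a plain Python str, so `or` is safe

result = df_bureau.apply(func, axis=1)
print(result.tolist())
```

The | operator is meant for element-wise combination of boolean arrays or Series, which is why it belongs in the numpy.select masks but not in a scalar if test.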

Regarding python - Add a new column to a DataFrame based on conditions, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51653878/
