
python - Group by and apply the string from one entry to the whole group

Reposted. Author: 行者123. Updated: 2023-12-01 01:44:48

I need to apply a string to every row of a group, taken from the one non-null value in that group. An example:

ID    name    surname  prsn_id
A john smith prsn_01
A john smith NaN
A john smith NaN
A john smith NaN
B mary jane prsn_02
B mary jane NaN
B mary jane NaN
B mary jane NaN
B mary jane NaN
B mary jane NaN
B mary jane NaN
C Barry willis prsn_03
C Barry willis NaN
C Barry willis NaN
C Barry willis NaN
C Barry willis NaN

The output should be:

ID    name    surname  prsn_id
A john smith prsn_01
A john smith prsn_01
A john smith prsn_01
A john smith prsn_01
B mary jane prsn_02
B mary jane prsn_02
B mary jane prsn_02
B mary jane prsn_02
B mary jane prsn_02
B mary jane prsn_02
B mary jane prsn_02
C Barry willis prsn_03
C Barry willis prsn_03
C Barry willis prsn_03
C Barry willis prsn_03
C Barry willis prsn_03

Or:

ID    name    surname  prsn_id    prsn_id_2
A john smith prsn_01 NaN
A john smith NaN prsn_01
A john smith NaN prsn_01
A john smith NaN prsn_01
B mary jane prsn_02 NaN
B mary jane NaN prsn_02
B mary jane NaN prsn_02
B mary jane NaN prsn_02
B mary jane NaN prsn_02
B mary jane NaN prsn_02
B mary jane NaN prsn_02
C Barry willis prsn_03 NaN
C Barry willis NaN prsn_03
C Barry willis NaN prsn_03
C Barry willis NaN prsn_03
C Barry willis NaN prsn_03

I have tried:

df['prsn_id_2'] = (df
                   .groupby(['ID', 'name', 'surname'])['prsn_id']
                   .fillna(method='ffill'))

This may work, but it is slow, so it will not be practical going forward. I need another solution that is vectorized and reasonably fast.

Best answer

Use dropna to remove the rows where prsn_id is NaN, then merge the result back with a left join:

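# Keep only the rows that carry a prsn_id (one per group in the example)
df1 = df.dropna(subset=['prsn_id'])
# If a group may contain several non-null rows, keep one per group instead:
# df1 = df.dropna(subset=['prsn_id']).drop_duplicates(['ID', 'name', 'surname'])
# Drop the sparse column, then left-join the per-group value onto every row
df = df.drop('prsn_id', axis=1).merge(df1, on=['ID', 'name', 'surname'], how='left')
print(df)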
ID name surname prsn_id
0 A john smith prsn_01
1 A john smith prsn_01
2 A john smith prsn_01
3 A john smith prsn_01
4 B mary jane prsn_02
5 B mary jane prsn_02
6 B mary jane prsn_02
7 B mary jane prsn_02
8 B mary jane prsn_02
9 B mary jane prsn_02
10 B mary jane prsn_02
11 C Barry willis prsn_03
12 C Barry willis prsn_03
13 C Barry willis prsn_03
14 C Barry willis prsn_03
15 C Barry willis prsn_03
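
For anyone who wants to run the approach above end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the table in the question, and the names keys, lookup, and filled are chosen here for illustration and do not appear in the original answer. It uses the drop_duplicates variant, assuming a group could contain more than one non-null row; without it, each extra match would multiply rows in the left join.

```python
import numpy as np
import pandas as pd

# Sample data modelled on the question: one non-null prsn_id per group.
df = pd.DataFrame({
    "ID": ["A"] * 4 + ["B"] * 7 + ["C"] * 5,
    "name": ["john"] * 4 + ["mary"] * 7 + ["Barry"] * 5,
    "surname": ["smith"] * 4 + ["jane"] * 7 + ["willis"] * 5,
    "prsn_id": (["prsn_01"] + [np.nan] * 3
                + ["prsn_02"] + [np.nan] * 6
                + ["prsn_03"] + [np.nan] * 4),
})

keys = ["ID", "name", "surname"]  # illustrative name for the grouping columns

# Lookup table: one row per group holding its non-null prsn_id.
# drop_duplicates keeps a single row per key so the join cannot add rows.
lookup = df.dropna(subset=["prsn_id"]).drop_duplicates(keys)

# Drop the sparse column and left-join the per-group value onto every row.
filled = df.drop(columns="prsn_id").merge(lookup, on=keys, how="left")

print(filled)
```

A left join keeps every row of the left frame, so a group with no non-null prsn_id at all would simply stay NaN rather than being dropped.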

Details:

print(df1)
ID name surname prsn_id
0 A john smith prsn_01
4 B mary jane prsn_02
11 C Barry willis prsn_03
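
Since df1 is effectively a one-row-per-group lookup, the same broadcast can also be written without a merge. The sketch below is an alternative that is not part of the original answer: it relies on groupby with transform("first"), where "first" picks the first non-null value in each group. The small frame is made up for illustration, with the non-null value of group B deliberately placed in the middle to show that, unlike a forward fill, the result does not depend on row order. No timing claim is made here; whether it is faster than the merge should be measured on the real data.

```python
import numpy as np
import pandas as pd

# Illustrative data: in group B the non-null value is not in the first row.
df = pd.DataFrame({
    "ID": ["A", "A", "B", "B", "B"],
    "name": ["john", "john", "mary", "mary", "mary"],
    "surname": ["smith", "smith", "jane", "jane", "jane"],
    "prsn_id": ["prsn_01", np.nan, np.nan, "prsn_02", np.nan],
})

# "first" yields the first non-null prsn_id of each group, and transform
# broadcasts that value back to every row of the group.
df["prsn_id_2"] = (
    df.groupby(["ID", "name", "surname"])["prsn_id"].transform("first")
)

print(df)
```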

Regarding "python - Group by and apply the string from one entry to the whole group", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51496429/
