
python - Drop duplicates and append values in Pandas

Reposted. Author: 太空宇宙. Updated: 2023-11-03 13:32:37

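I have the data frame below. I want to drop the duplicate rows, but append the column E values of the dropped duplicates to the row that is kept.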

import pandas as pd
import numpy as np

# np.nan is used instead of np.NaN, which was removed in NumPy 2.0
dfp = pd.DataFrame({
    'A': [np.nan, np.nan, 3, 4, 5, 5, 3, 1, 6, 7],
    'B': [1, 1, 3, 5, 0, 0, np.nan, 9, 0, 0],
    'C': ['AA1233445', 'AA1233445', 'rmacy', 'Idaho Rx', 'Ab123455',
          'TV192837', 'RX', 'Ohio Drugs', 'RX12345', 'USA Pharma'],
    'D': [123456, 123456, 1234567, 12345678, 12345,
          12345, 12345678, 123456789, 1234567, np.nan],
    'E': ['Assign', 'Allign', 'Hello', 'Ugly', 'Appreciate',
          'Undo', 'Testing', 'Unicycle', 'Pharma', 'Unicorn'],
})
print(dfp)

I am grabbing all of the duplicates:

df2 = dfp.loc[(dfp['A'].duplicated(keep=False))].copy()

     A    B          C           D           E
0  NaN  1.0  AA1233445    123456.0      Assign
1  NaN  1.0  AA1233445    123456.0      Allign
2  3.0  3.0      rmacy   1234567.0       Hello
4  5.0  0.0   Ab123455     12345.0  Appreciate
5  5.0  0.0   TV192837     12345.0        Undo
6  3.0  NaN         RX  12345678.0     Testing

and I want my result to be:

     A    B          C          D                E
0  NaN  1.0  AA1233445   123456.0    Assign Allign
2  3.0  3.0      rmacy  1234567.0    Hello Testing
4  5.0  0.0   Ab123455    12345.0  Appreciate Undo

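I know I need something like dfp.loc[(dfp['A'].duplicated(keep='last'))].copy() to get the first occurrences, but I have not managed to set column E so that it also contains the values from the other duplicates.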
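As a quick illustration of what the three `keep` options flag, here is a minimal sketch on a cut-down copy of column A. The Series `s` and the variable names are made up for this example and are not part of the original question. Note that selecting with `keep='last'` only returns the first occurrences when every duplicated value appears exactly twice; combining `keep=False` with the default `keep='first'` avoids that assumption.

```python
import pandas as pd

nan = float("nan")
# Hypothetical cut-down version of column A from the question
s = pd.Series([nan, nan, 3, 4, 5, 5, 3], dtype="float64")

# keep=False flags every member of a duplicated group
all_dupes = s.duplicated(keep=False)
# keep='first' (the default) flags every occurrence except the first
not_first = s.duplicated(keep="first")
# keep='last' flags every occurrence except the last
not_last = s.duplicated(keep="last")

# First occurrence of each duplicated value, regardless of group size
first_of_each = all_dupes & ~not_first

print(pd.DataFrame({
    "value": s,
    "keep=False": all_dupes,
    "keep='first'": not_first,
    "keep='last'": not_last,
    "first_of_each": first_of_each,
}))
```

On this data, `not_last` and `first_of_each` select the same rows (index 0, 2 and 4), because each duplicated value occurs exactly twice.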

I was thinking I needed to try something like this:

df3 = dfp.loc[(dfp['A'].duplicated(keep='last'))].copy()
df3['E'] = df3['E'] + dfp.loc[(dfp['A'].duplicated(keep=False).copy()),'E']

But my output is:

     A    B          C          D                     E
0  NaN  1.0  AA1233445   123456.0          AssignAssign
2  3.0  3.0      rmacy  1234567.0            HelloHello
4  5.0  0.0   Ab123455    12345.0  AppreciateAppreciate

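I am stumped. Am I overcomplicating this? How can I get the output I am looking for, so that I can later drop all duplicates except the first while "saving" the dropped values in column E?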

Best answer

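Define the functions to pass to agg, then apply them through groupby. To make groupby work with NaN, I convert column A to strings first and then convert it back to float afterwards.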

f = {c: ' '.join if c == 'E' else 'first' for c in ['B', 'C', 'D', 'E']}

dfp.groupby(
    dfp.A.astype(str), sort=False
).agg(f).reset_index().eval(
    'A = @pd.to_numeric(A, "coerce").values',
    inplace=False
)

     A    B           C            D                E
0  NaN  1.0   AA1233445     123456.0    Assign Allign
1  3.0  3.0       rmacy    1234567.0    Hello Testing
2  4.0  5.0    Idaho Rx   12345678.0             Ugly
3  5.0  0.0    Ab123455      12345.0  Appreciate Undo
4  1.0  9.0  Ohio Drugs  123456789.0         Unicycle
5  6.0  0.0     RX12345    1234567.0           Pharma
6  7.0  0.0  USA Pharma          NaN          Unicorn
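As an aside, on pandas 1.1 or later `groupby` accepts `dropna=False`, which keeps NaN as its own group and may remove the need for the string round trip. This is only a sketch on a trimmed copy of the data; the frame `df` and the result `out` are names chosen for this example. Where the NaN group lands in the output can differ between pandas versions, so do not rely on the row order.

```python
import numpy as np
import pandas as pd

# Hypothetical trimmed copy of the question's data (columns A, C, E only)
df = pd.DataFrame({
    "A": [np.nan, np.nan, 3, 4, 5, 5, 3],
    "C": ["AA1233445", "AA1233445", "rmacy", "Idaho Rx",
          "Ab123455", "TV192837", "RX"],
    "E": ["Assign", "Allign", "Hello", "Ugly",
          "Appreciate", "Undo", "Testing"],
})

out = (
    # dropna=False keeps the NaN keys as a group (assumes pandas >= 1.1)
    df.groupby("A", sort=False, dropna=False)
      # keep the first C per group, join all E values with a space
      .agg({"C": "first", "E": " ".join})
      .reset_index()
)
print(out)
```

Column A stays numeric throughout, so no conversion back to float is needed at the end.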

To limit it to just the duplicated rows:

f = {c: ' '.join if c == 'E' else 'first' for c in ['B', 'C', 'D', 'E']}
d1 = dfp[dfp.duplicated('A', keep=False)]
d2 = d1.groupby(d1.A.astype(str), sort=False).agg(f).reset_index()
d2.A = d2.A.astype(float)

d2

     A    B          C          D                E
0  NaN  1.0  AA1233445   123456.0    Assign Allign
1  3.0  3.0      rmacy  1234567.0    Hello Testing
2  5.0  0.0   Ab123455    12345.0  Appreciate Undo
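As for why the attempt in the question produced doubled strings such as AssignAssign: adding two Series aligns them on their index labels, not on the duplicate groups, so each kept row is paired with itself. The sketch below reproduces that on two small hand-made Series; `first` and `dupes` are illustrative names, not objects from the question.

```python
import pandas as pd

# Rows kept as "first occurrences" (index labels 0 and 2)
first = pd.Series(["Assign", "Hello"], index=[0, 2])
# All duplicated rows, including the ones already in `first`
dupes = pd.Series(["Assign", "Allign", "Hello", "Testing"], index=[0, 1, 2, 6])

# Addition matches label 0 with label 0 and label 2 with label 2;
# labels 1 and 6 have no partner in `first`, so they come out as NaN
summed = first + dupes
print(summed)
```

Because the partner rows sit under different labels (1 and 6 here), they never meet the rows they duplicate. That is why a group-wise aggregation such as the `' '.join` used with groupby above is needed rather than plain addition.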

Regarding "python - Drop duplicates and append values in Pandas", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44397210/
