
python - Merging pandas DataFrames with duplicate keys

Reposted. Author: 行者123. Updated: 2023-12-02 09:19:45

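I have two DataFrames, each with a key column that may contain duplicates, though the two frames mostly share the same duplicated keys. I want to merge them on that key in such a way that, when both have the same duplicates, those duplicates are merged one by one. In addition, if one DataFrame has more duplicates of a key than the other, I want the missing values filled with NaN. For example: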

import pandas as pd

df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K2', 'K2', 'K3'],
                    'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']},
                   columns=['key', 'A'])
df2 = pd.DataFrame({'B': ['B0', 'B1', 'B2', 'B3', 'B4', 'B5', 'B6'],
                    'key': ['K0', 'K1', 'K2', 'K2', 'K3', 'K3', 'K4']},
                   columns=['key', 'B'])

  key   A
0  K0  A0
1  K1  A1
2  K2  A2
3  K2  A3
4  K2  A4
5  K3  A5

  key   B
0  K0  B0
1  K1  B1
2  K2  B2
3  K2  B3
4  K3  B4
5  K3  B5
6  K4  B6

I am trying to get the following output:

   key    A    B
0   K0   A0   B0
1   K1   A1   B1
2   K2   A2   B2
3   K2   A3   B3
6   K2   A4  NaN
8   K3   A5   B4
9   K3  NaN   B5
10  K4  NaN   B6

So basically, I want to treat the duplicated K2 keys as K2_1, K2_2, ... and then perform a how='outer' merge on the DataFrames. Any ideas on how I can do this?

Best answer

Faster again

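%%cython
# using cython in jupyter notebook
# in another cell run `%load_ext Cython`
from collections import defaultdict
import numpy as np

def cg(x):
    # yield, for each element, how many times that value has been seen so far
    cnt = defaultdict(int)

    for j in x.tolist():
        cnt[j] += 1
        yield cnt[j]


def fastcount(x):
    return list(cg(x))

# in a regular (non-Cython) cell: number each occurrence of a key
df1['cc'] = fastcount(df1.key.values)
df2['cc'] = fastcount(df2.key.values)

# merges on the shared columns 'key' and 'cc'; drop the helper column afterwards
df1.merge(df2, how='outer').drop(columns='cc')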
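For readers without Cython available, the same running counter can be sketched in plain Python. This is only an illustration of the idea used above, not part of the original answer; the name `running_count` is made up for this example.

```python
from collections import defaultdict

def running_count(keys):
    """Return, for each key, its 1-based occurrence number so far."""
    seen = defaultdict(int)
    out = []
    for k in keys:
        seen[k] += 1          # this is the n-th time we meet k
        out.append(seen[k])
    return out

counts = running_count(['K0', 'K1', 'K2', 'K2', 'K2', 'K3'])
print(counts)  # [1, 1, 1, 2, 3, 1]
```

The counter starts at 1 here, while `groupby(...).cumcount()` starts at 0. Either works for the merge, as long as both DataFrames use the same convention.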

Faster answer; not scalable

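import numpy as np

def fastcount(x):
    unq, inv = np.unique(x, return_inverse=True)
    # boolean mask of shape (n_unique, n): row i marks the positions of unq[i]
    m = np.arange(len(unq))[:, None] == inv
    # running count along each row, kept only where the key occurs
    return (m.cumsum(1) * m).sum(0)

df1['cc'] = fastcount(df1.key.values)
df2['cc'] = fastcount(df2.key.values)

df1.merge(df2, how='outer').drop(columns='cc')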
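The "not scalable" label presumably refers to the intermediate mask, which has one row per unique key and one column per input row. A small sketch on the question's `df1` keys makes the shape visible; the variable names here are chosen for illustration only.

```python
import numpy as np

keys = np.array(['K0', 'K1', 'K2', 'K2', 'K2', 'K3'])

unq, inv = np.unique(keys, return_inverse=True)
# one row per unique key, one column per input element
mask = np.arange(len(unq))[:, None] == inv
print(mask.shape)  # (4, 6)

# cumulative count within each row, masked back to the key's own positions
counts = (mask.cumsum(axis=1) * mask).sum(axis=0)
print(counts.tolist())  # [1, 1, 1, 2, 3, 1]
```

With many distinct keys the mask grows as (number of unique keys) x (number of rows), so memory use can become the limiting factor well before run time does.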

Old answer

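# number each occurrence of a key within its group: 0, 1, 2, ...
df1['cc'] = df1.groupby('key').cumcount()
df2['cc'] = df2.groupby('key').cumcount()

# merges on the shared columns 'key' and 'cc'; drop the helper column afterwards
df1.merge(df2, how='outer').drop(columns='cc')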
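Putting the `cumcount` approach together on the question's data, a minimal end-to-end sketch might look like the following. The helper name `merge_duplicates` and the temporary column `_occ` are assumptions made for this example; the explicit `on=` and the final sort are additions for readability rather than part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K2', 'K2', 'K3'],
                    'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K2', 'K3', 'K3', 'K4'],
                    'B': ['B0', 'B1', 'B2', 'B3', 'B4', 'B5', 'B6']})

def merge_duplicates(left, right, key='key'):
    """Outer-merge two frames, pairing the n-th duplicate of a key with the n-th."""
    # assign() returns copies, so the inputs are left unmodified
    left = left.assign(_occ=left.groupby(key).cumcount())
    right = right.assign(_occ=right.groupby(key).cumcount())
    merged = left.merge(right, on=[key, '_occ'], how='outer')
    # sort so each key's occurrences appear together, then drop the helper
    return (merged.sort_values([key, '_occ'])
                  .drop(columns='_occ')
                  .reset_index(drop=True))

result = merge_duplicates(df1, df2)
print(result)
```

The result has eight rows: the third K2 row has NaN in B, the second K3 row has NaN in A, and K4 has NaN in A, which matches the pairing requested in the question (the index is renumbered from 0).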


Regarding "python - Merging pandas DataFrames with duplicate keys", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40575486/
