
python - Reindexing a DataFrame on exact text match

Reposted. Author: 行者123. Updated: 2023-12-04 08:22:36

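I want to create a mapping from one DataFrame's index to another DataFrame's index wherever the given text column matches. The two DataFrames have the same length, and every row always has an exact match in the other.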

import pandas as pd

df_original = pd.DataFrame(dict(text=['The cat sat on the table', 'There is a kind of hush', 'The boy kicked the ball', 'He shot the elephant', 'I want to eat right now!']))
df = pd.DataFrame(dict(text=['He shot the elephant', 'The boy kicked the ball', 'The cat sat on the table', 'I want to eat right now!', 'There is a kind of hush']))
df_original looks like:
0    The cat sat on the table
1    There is a kind of hush
2    The boy kicked the ball
3    He shot the elephant
4    I want to eat right now!
df looks like:
0    He shot the elephant
1    The boy kicked the ball
2    The cat sat on the table
3    I want to eat right now!
4    There is a kind of hush
I want to get a dictionary mapping like this:
d = {2: 0, 4: 1, 1: 2, 0: 3, 3: 4}
For example, index 2 of df matches index 0 of df_original, so the two must be mapped together, and so on for the other rows.
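To make the direction of the mapping concrete (keys are df's index labels, values are df_original's), here is a small check against the example data. It is only a sketch for illustration; the name `matches` is hypothetical and not part of the question.

```python
import pandas as pd

df_original = pd.DataFrame(dict(text=[
    'The cat sat on the table', 'There is a kind of hush',
    'The boy kicked the ball', 'He shot the elephant',
    'I want to eat right now!']))
df = pd.DataFrame(dict(text=[
    'He shot the elephant', 'The boy kicked the ball',
    'The cat sat on the table', 'I want to eat right now!',
    'There is a kind of hush']))

# Desired mapping: {index in df: index in df_original}
d = {2: 0, 4: 1, 1: 2, 0: 3, 3: 4}

# Every pair should point at rows holding identical text.
matches = all(df.loc[k, 'text'] == df_original.loc[v, 'text']
              for k, v in d.items())
print(matches)
```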
I would prefer a vectorized operation if possible, and that is what I am looking for.
I tried this:
d = {}
# Compare every row of df_original with every row of df (quadratic time).
for i1, r1 in df_original.iterrows():
    for i2, r2 in df.iterrows():
        if r1['text'] == r2['text']:
            d[i2] = i1
print(d)
# {2: 0, 4: 1, 1: 2, 0: 3, 3: 4}
But this is very slow, because my DataFrames have millions of rows.

Best answer

Try merge:

(df_original.reset_index()
    .merge(df.reset_index(), on='text')
    .set_index('index_y')['index_x'].to_dict()
)
Output:
{2: 0, 4: 1, 1: 2, 0: 3, 3: 4}
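In the answer, `index_x` and `index_y` are the names merge assigns by default: after `reset_index()` both frames carry a column called `index`, so the left one gets the `_x` suffix and the right one `_y`. The sketch below reruns the merge as a self-contained script and adds `validate='one_to_one'`, which is not in the answer; it is included on the assumption that each text appears exactly once per frame, and it makes merge raise an error if that assumption fails. It also shows a `Series.map` lookup as an alternative that is not from the answer either. The names `lookup` and `d_map` are hypothetical.

```python
import pandas as pd

df_original = pd.DataFrame(dict(text=[
    'The cat sat on the table', 'There is a kind of hush',
    'The boy kicked the ball', 'He shot the elephant',
    'I want to eat right now!']))
df = pd.DataFrame(dict(text=[
    'He shot the elephant', 'The boy kicked the ball',
    'The cat sat on the table', 'I want to eat right now!',
    'There is a kind of hush']))

# Merge approach from the answer, with a uniqueness check added.
merged = df_original.reset_index().merge(
    df.reset_index(), on='text', validate='one_to_one')
d = merged.set_index('index_y')['index_x'].to_dict()

# Alternative sketch: build a text -> original-index lookup Series,
# then map df's text column through it. Requires unique texts.
lookup = pd.Series(df_original.index, index=df_original['text'])
d_map = df['text'].map(lookup).to_dict()

print(d)
print(d == d_map)
```

Both variants avoid the Python-level nested loop, which is the part that scales poorly with millions of rows. Which one is faster on a given dataset is worth measuring rather than assuming.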

Regarding "python - Reindexing a DataFrame on exact text match", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/65426849/
