
python - Matching two dataframes by substring in Python

Reposted. Author: 行者123. Updated: 2023-12-04 03:25:38

I have two large dataframes (1000 rows) that I need to match by substring, for example:

df1:

Id    Title
1     The house of pump
2     Where is Andijan
3     The Joker
4     Good bars in Andijan
5     What a beautiful house

df2:

Keyword
house
andijan
joker

The expected output is:

Id    Title                    Keyword
1     The house of pump        house
2     Where is Andijan         andijan
3     The Joker                joker
4     Good bars in Andijan     andijan
5     What a beautiful house   house

Right now I have written a very inefficient way to match them, and at the real size of the dataframes it runs for an extremely long time:

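# One full scan of df1 per keyword; assumes numpy is imported as np and df1['Keyword'] already exists
for keyword in df2.to_dict(orient='records'):
    df1['Keyword'] = np.where(df1['Title'].str.contains(keyword['Keyword'], case=False), keyword['Keyword'], df1['Keyword'])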

I'm sure there is a more pandas-friendly and more efficient way to do the same thing that runs in a reasonable amount of time.

Best answer

Let's try findall, which matches every keyword in a single pass using one combined regular expression:

import re
df1['new'] = df1.Title.str.findall('|'.join(df2.Keyword.tolist()), flags=re.IGNORECASE).str[0]
df1
   Id                   Title      new
0   1       The house of pump    house
1   2        Where is Andijan  Andijan
2   3               The Joker    Joker
3   4    Good bars in Andijan  Andijan
4   5  What a beautiful house    house
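
Note that the result above holds the text as it appears in the title (Andijan, Joker), while the expected output in the question shows the keyword as spelled in df2 (andijan, joker). A minimal sketch of one way to close that gap follows. It is a variation on the accepted approach, not the answerer's code: it swaps findall for str.extract, escapes the keywords with re.escape, and maps each match back through a lookup dict. The names pattern, matched, and canonical are illustrative, and the sketch assumes the keywords are unique when lowercased.

```python
import re

import pandas as pd

# Sample frames rebuilt from the question.
df1 = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "Title": [
        "The house of pump",
        "Where is Andijan",
        "The Joker",
        "Good bars in Andijan",
        "What a beautiful house",
    ],
})
df2 = pd.DataFrame({"Keyword": ["house", "andijan", "joker"]})

# Escape each keyword so regex metacharacters in it match literally,
# then join everything into one capturing alternation.
pattern = "(" + "|".join(re.escape(k) for k in df2["Keyword"]) + ")"

# First match per title, case-insensitive; NaN where no keyword occurs.
matched = df1["Title"].str.extract(pattern, flags=re.IGNORECASE, expand=False)

# Map the matched text back to the spelling used in df2
# (assumption: keywords are unique once lowercased).
canonical = {k.lower(): k for k in df2["Keyword"]}
df1["Keyword"] = matched.str.lower().map(canonical)

print(df1)
```

As with findall(...).str[0], only the first keyword found in each title is kept. If one keyword is a prefix of another, sorting the keywords by length in descending order before joining may be needed so the longer one is tried first.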

Regarding "python - Matching two dataframes by substring in Python", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/67661239/
