python - One-to-many merge of two dataframes where one column's string is contained in another's

Reposted. Author: 行者123. Updated: 2023-12-04 14:03:23

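I have two dataframes that I want to merge whenever a value in the words column of df1 contains a value in the keywords column of df2. I have been trying str.extract, but so far I have not managed to get the expected result. An example follows: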

df1:

[{'id': 1, 'words': 'chellomedia', 'languages': nan},
{'id': 2, 'words': 'Moien Welt!', 'languages': 'Luxemburgish'},
{'id': 3, 'words': 'Ahoj světe!', 'languages': 'Czech'},
{'id': 4, 'words': 'hello world', 'languages': nan},
{'id': 5, 'words': '¡Hola Mundo!', 'languages': 'Spanish'},
{'id': 6, 'words': 'hello kitty', 'languages': 'English'},
{'id': 7, 'words': 'Ciao mondo!', 'languages': 'Italian'},
{'id': 8, 'words': 'hola world', 'languages': nan}]

df2:

[{'code': 1, 'keywords': 'Hello'},
{'code': 2, 'keywords': 'hola'},
{'code': 3, 'keywords': 'world'}]

My attempt:

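import re
import pandas as pd

# Lowercase both sides so the match is case-insensitive
df1['words'] = df1['words'].str.lower()
df2['keywords'] = df2['keywords'].str.lower()

# Build one alternation pattern from the escaped keywords
pat = '|'.join([re.escape(x) for x in df2.keywords])
df1.insert(0, 'keywords', df1['words'].str.extract('(' + pat + ')', expand=False))

pd.merge(df1, df2, on='keywords', how='left')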

Output:

  keywords  id         words     languages  code
0    hello   1   chellomedia           NaN   1.0
1      NaN   2   moien welt!  Luxemburgish   NaN
2      NaN   3   ahoj světe!         Czech   NaN
3    hello   4   hello world           NaN   1.0
4     hola   5  ¡hola mundo!       Spanish   2.0
5    hello   6   hello kitty       English   1.0
6      NaN   7   ciao mondo!       Italian   NaN
7     hola   8    hola world           NaN   2.0

But the desired result should look like this:

  keywords  id         words     languages  code
0    hello   1   chellomedia           NaN   1.0
1      NaN   2   moien welt!  Luxemburgish   NaN
2      NaN   3   ahoj světe!         Czech   NaN
3    hello   4   hello world           NaN   1.0
4    world   4   hello world           NaN   3.0  ---> should be generated in df
5     hola   5  ¡hola mundo!       Spanish   2.0
6    hello   6   hello kitty       English   1.0
7      NaN   7   ciao mondo!       Italian   NaN
8     hola   8    hola world           NaN   2.0
9    world   8    hola world           NaN   3.0  ---> should be generated in df

How can I produce the expected result? Thanks.

Best answer

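You have to use findall and explode instead of extract. str.extract returns only the first match in each row, whereas str.findall returns every match as a list, and explode then gives each list element its own row. For example: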

# Use this in place of the extract line (df1 must not already have a 'keywords' column)
df1.insert(0, 'keywords', df1['words'].str.findall('(' + pat + ')'))
print(pd.merge(df1.explode('keywords'), df2, on='keywords', how='left')
      .sort_values('id').reset_index(drop=True))

Output:

  keywords  id         words     languages  code
0    hello   1   chellomedia           NaN   1.0
1      NaN   2   moien welt!  Luxemburgish   NaN
2      NaN   3   ahoj světe!         Czech   NaN
3    hello   4   hello world           NaN   1.0
4    world   4   hello world           NaN   3.0
5     hola   5  ¡hola mundo!       Spanish   2.0
6    hello   6   hello kitty       English   1.0
7      NaN   7   ciao mondo!       Italian   NaN
8    world   8    hola world           NaN   3.0
9     hola   8    hola world           NaN   2.0

Exactly what you need :)
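
For anyone who wants to run the whole thing from scratch, here is a minimal self-contained sketch of the same findall + explode + merge idea, rebuilt from the sample data in the question. It is only an illustration under a few assumptions: the names matches, exploded and result are mine, I use assign rather than insert (so the keywords column ends up last instead of first), and I skip the sort so the rows keep the order of df1.

```python
import re

import numpy as np
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame([
    {"id": 1, "words": "chellomedia", "languages": np.nan},
    {"id": 2, "words": "Moien Welt!", "languages": "Luxemburgish"},
    {"id": 3, "words": "Ahoj světe!", "languages": "Czech"},
    {"id": 4, "words": "hello world", "languages": np.nan},
    {"id": 5, "words": "¡Hola Mundo!", "languages": "Spanish"},
    {"id": 6, "words": "hello kitty", "languages": "English"},
    {"id": 7, "words": "Ciao mondo!", "languages": "Italian"},
    {"id": 8, "words": "hola world", "languages": np.nan},
])
df2 = pd.DataFrame([
    {"code": 1, "keywords": "Hello"},
    {"code": 2, "keywords": "hola"},
    {"code": 3, "keywords": "world"},
])

# Lowercase both sides so the match is case-insensitive
df1["words"] = df1["words"].str.lower()
df2["keywords"] = df2["keywords"].str.lower()

# One alternation of escaped keywords; without a capture group,
# findall returns a flat list of the matched strings for each row
pat = "|".join(re.escape(k) for k in df2["keywords"])

# Every keyword found in each row (an empty list when nothing matches)
matches = df1["words"].str.findall(pat)

# explode gives each list element its own row; empty lists become NaN
exploded = df1.assign(keywords=matches).explode("keywords")

# A left merge keeps the unmatched rows, with NaN in the code column
result = exploded.merge(df2, on="keywords", how="left").reset_index(drop=True)
print(result)
```

Note that the pattern matches substrings, which is why chellomedia is paired with hello. If you only want whole words, wrapping the alternation in word boundaries (r"\b(?:" + pat + r")\b") is one option, but that would change the result the question asks for.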

Regarding "python - One-to-many merge of two dataframes where one column's string is contained in another's", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/69277298/
