
python - How to match rows when one row contains a string from another row?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 11:12:05

My goal is to find each City that matches the text in its row's general_text column, but the match must be exact (whole words only).

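I tried searching with in, but it did not give me the expected results, so I tried str.contains instead, but the way I did it raises an error. Any hints on how to do this correctly or efficiently?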

I tried code based on "Filtering out rows that have a string field contained in one of the rows of another column of strings":

df['matched'] = df.apply(lambda x: x.City in x.general_text, axis=1)

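but it does not give the expected result, because in is a substring test and also matches partial words (for example 'spring' inside 'springs'). Here is my sample data and the str.contains attempt: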

import pandas as pd

data = [['palm springs john smith', 'spring'],
        ['palm springs john smith', 'palm springs'],
        ['palm springs john smith', 'smith'],
        ['hamptons amagansett', 'amagansett'],
        ['hamptons amagansett', 'hampton'],
        ['hamptons amagansett', 'gans'],
        ['edward riverwoods lake', 'wood'],
        ['edward riverwoods lake', 'riverwoods']]

df = pd.DataFrame(data, columns=['general_text', 'City'])

# Raises AttributeError: 'str' object has no attribute 'str'
# (inside apply with axis=1, x['general_text'] is a plain Python str)
df['match'] = df.apply(lambda x: x['general_text'].str.contains(
    x['City']), axis=1)

What I would like the code above to return is a match only for these rows:

data = [['palm springs john smith', 'palm springs'],
        ['hamptons amagansett', 'amagansett'],
        ['edward riverwoods lake', 'riverwoods']]
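The two problems in the question can be reproduced in a few lines. This is a minimal sketch on a two-row subset of the sample data (the subset itself is chosen only for illustration): the in operator is a plain substring test, so partial words count as matches, and inside df.apply(..., axis=1) each cell is a plain Python str, which has no .str accessor (that accessor exists only on pandas Series).

```python
import pandas as pd

# Two rows taken from the sample data in the question
df = pd.DataFrame(
    [["palm springs john smith", "spring"],
     ["hamptons amagansett", "gans"]],
    columns=["general_text", "City"],
)

# `in` is a substring test: 'spring' is found inside 'springs',
# and 'gans' is found inside 'amagansett', so both rows "match".
substring = df.apply(lambda x: x["City"] in x["general_text"], axis=1)
print(substring.tolist())  # [True, True]

# With axis=1 each x is one row; x["general_text"] is a str, not a Series,
# so calling .str.contains on it fails before any matching happens.
try:
    df.apply(lambda x: x["general_text"].str.contains(x["City"]), axis=1)
except AttributeError as exc:
    print("AttributeError:", exc)  # 'str' object has no attribute 'str'
```

Both rows are false positives for a whole-word match, which is why a pattern with explicit word boundaries is needed.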

Best answer

You can put the word boundary \b on both sides of the city name to get an exact whole-word match:

import re

f = lambda x: bool(re.search(r'\b{}\b'.format(x['City']), x['general_text']))

Or:

f = lambda x: bool(re.findall(r'\b{}\b'.format(x['City']), x['general_text']))

df['match'] = df.apply(f, axis=1)
print(df)
              general_text          City  match
0  palm springs john smith        spring  False
1  palm springs john smith  palm springs   True
2  palm springs john smith         smith   True
3      hamptons amagansett    amagansett   True
4      hamptons amagansett       hampton  False
5      hamptons amagansett          gans  False
6   edward riverwoods lake          wood  False
7   edward riverwoods lake    riverwoods   True
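A possible variation on the answer, sketched under two assumptions that go beyond it: the city name is passed through re.escape so that characters such as '.' in a name are matched literally rather than as regex syntax, and the row-wise apply is replaced by a list comprehension over zip, which avoids building a Series object for every row. The helper name whole_word_match is hypothetical. Note that \b only behaves as expected when the city name starts and ends with a word character.

```python
import re

import pandas as pd

data = [["palm springs john smith", "spring"],
        ["palm springs john smith", "palm springs"],
        ["palm springs john smith", "smith"],
        ["hamptons amagansett", "amagansett"],
        ["hamptons amagansett", "hampton"],
        ["hamptons amagansett", "gans"],
        ["edward riverwoods lake", "wood"],
        ["edward riverwoods lake", "riverwoods"]]
df = pd.DataFrame(data, columns=["general_text", "City"])


def whole_word_match(city: str, text: str) -> bool:
    """Hypothetical helper: True if `city` occurs in `text` as whole word(s)."""
    # re.escape makes regex metacharacters in the city name (e.g. '.') literal
    pattern = r"\b" + re.escape(city) + r"\b"
    return re.search(pattern, text) is not None


# Pair each City with its own general_text instead of calling apply(axis=1)
df["match"] = [whole_word_match(city, text)
               for city, text in zip(df["City"], df["general_text"])]
print(df["match"].tolist())
# [False, True, True, True, False, False, False, True]
```

The result agrees with the answer's output above; keeping only the rows where match is True (df[df["match"]]) gives the palm springs, smith, amagansett and riverwoods rows.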

A similar question, "python - How to match rows when one row contains a string from another row?", can be found on Stack Overflow: https://stackoverflow.com/questions/57950732/
