
python - Find a regular expression in a Python dataframe

Reposted. Author: 行者123. Updated: 2023-12-01 06:49:26

I have a problem.

I have a dataframe called employer that looks like this:

employer
------------
wings brand activation i pvt ltd
hofincons infotech &industrial services pvt .ltd
bharat fritz werner bangalore
kludi rak indpvt ltd.

Another dataframe (call it pincode) maps employer names to categories and looks like this:

Index   Name                                    FINAL_CATEGORY
68781   central board of excise and customs     cat b
68782   c a g hotels pvt ltd                    cat b
68783   avaneetha textiles pvt ltd              cat a
68784   trendy wheels pvt ltd                   cat a+
68785   wings brand activations india pvt ltd   cat b

Now I want to reproduce something like the following:

pincode[pincode['Name'].str.contains('wings brand activation i pvt ltd')]

Name    FINAL_CATEGORY
____________________________________

pincode[pincode['Name'].str.contains('wings brand activation i pvt')]

Name    FINAL_CATEGORY
____________________________________

pincode[pincode['Name'].str.contains('wings brand activation i')]

Name    FINAL_CATEGORY
____________________________________

pincode[pincode['Name'].str.contains('wings brand activation')]

       Name                                    FINAL_CATEGORY
68785  wings brand activations india pvt ltd   cat b

As you can see, for each string I shorten it from the end, one word at a time (back to the last space), and search again.
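The shortening step described above (drop the last word, then search again) can be sketched as a small generator. This is a minimal illustration, not code from the question; `word_prefixes` is a hypothetical helper name, and it assumes words are separated by whitespace.

```python
def word_prefixes(name):
    """Yield the word-level prefixes of `name`, longest first.

    Hypothetical helper: each step drops the last whitespace-separated word,
    mirroring the manual searches shown above.
    """
    words = name.split()
    for length in range(len(words), 0, -1):
        yield " ".join(words[:length])


for prefix in word_prefixes("wings brand activation i pvt ltd"):
    print(prefix)
```

Each yielded prefix is what would be passed to `str.contains` in turn, stopping at the first one that matches.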

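The above needs to go into a loop (I think with regular expressions). For each entry in the employer table, it should search the whole pincode frame and find the closest match, returning NaN if there is none.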

Thanks in advance. This problem is a bit hard to put into words, so please ask if anything needs clarifying.

Best answer

You can use an iterative approach, as follows:

def find_substr(employer, pincode):
    employer = employer.set_index("employer")
    for name in employer.index:
        words = name.split()
        length = len(words)
        found = False
        # Drop one trailing word at a time until some Name contains the prefix
        while length > 0 and not found:
            substr = ' '.join(words[:length])
            # regex=False matches the prefix literally, so characters such as
            # '(' or '.' need no escaping
            mask = pincode.Name.str.contains(substr, regex=False)
            if mask.any():
                # Keep the category of the first matching row
                employer.loc[name, 'cat'] = pincode.loc[mask, 'FINAL_CATEGORY'].values[0]
                found = True
            length -= 1
    employer = employer.reset_index()
    return employer

employer = find_substr(employer, pincode)
print(employer)
                                           employer    cat
0                  wings brand activation i pvt ltd  cat b
1  hofincons infotech &industrial services pvt .ltd    NaN
2                     bharat fritz werner bangalore    NaN
3                              kludi rak indpvt ltd    NaN
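To try the longest-prefix lookup end to end, here is a minimal sketch that rebuilds the sample frames from the question (only the rows shown there) and applies the same idea row by row with `Series.map`. The helper name `lookup_category` is hypothetical, and this is a variant of the answer's loop rather than the answer itself.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question (only the rows shown there).
employer = pd.DataFrame({"employer": [
    "wings brand activation i pvt ltd",
    "hofincons infotech &industrial services pvt .ltd",
    "bharat fritz werner bangalore",
    "kludi rak indpvt ltd.",
]})
pincode = pd.DataFrame({
    "Name": [
        "central board of excise and customs",
        "c a g hotels pvt ltd",
        "avaneetha textiles pvt ltd",
        "trendy wheels pvt ltd",
        "wings brand activations india pvt ltd",
    ],
    "FINAL_CATEGORY": ["cat b", "cat b", "cat a", "cat a+", "cat b"],
}, index=range(68781, 68786))


def lookup_category(name, pincode):
    """Return FINAL_CATEGORY of the first Name containing the longest
    word-prefix of `name`, or NaN if no prefix matches (hypothetical helper)."""
    words = name.split()
    for length in range(len(words), 0, -1):
        prefix = " ".join(words[:length])
        # Literal substring match; no regex escaping needed
        mask = pincode["Name"].str.contains(prefix, regex=False)
        if mask.any():
            return pincode.loc[mask, "FINAL_CATEGORY"].iloc[0]
    return np.nan


employer["cat"] = employer["employer"].map(lambda n: lookup_category(n, pincode))
print(employer)
```

Because the helper returns NaN explicitly, the `cat` column exists even when no employer matches at all; like the answer, it keeps only the first matching row when several names contain the prefix.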

Regarding "python - Find a regular expression in a Python dataframe", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59087039/
