python - How to search for a string in one pandas DataFrame column as a substring of another DataFrame column

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:19:01

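I have two pandas DataFrames, df1 and df2. I need to create a new column in df1 by searching df2['B'] to see whether df1['A'] is a substring of df2['B']. If there is a match, the new column df1['B'] should hold the corresponding value of df2['A'].

Sample DataFrames are below: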

df1

      A                  B           
9.female.ceo.,ceo, ?
9.female.ned.,ned,
9.female.ned.,chair,
2.female.ed.,ned,
2.female.ned.,ed,
9.female.chair.,ceo,
2.female.chair.,chair,

df2

     A                B
,ceo,ned, 2.male.chair.,ceo,ned,
,chair,ned, 2.male.ned.,chair,ned,
,ned, 2.female.ed.,ned,
,ceo,chair, 6.female.ed.,ceo,chair,
,ed,ceo, 6.male.chair.,ed,ceo,
,ceo,chair, 9.female.ed.,ceo,chair,
,ceo,ned, 9.female.chair.,ceo,ned,
,chair,(in ft10), 9.male.ceo.,chair,(in ft10),

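A plain merge does not work here, because the values in df1['A'] are only substrings of the values in df2['B'], not exact matches.

Any help pointing me in the right direction would be greatly appreciated.

Expected result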

df1

      A                    B           
9.female.ceo.,ceo,
9.female.ned.,ned,
9.female.ned.,chair,
2.female.ed.,ned, ,ned,
2.female.ned.,ed,
9.female.chair.,ceo, ,ceo,ned,
2.female.chair.,chair,

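Best answer

The idea is to split on ',' to build sets and then match with issubset:

# Map each df2['A'] label to the set of comma-separated tokens in its df2['B'] string
# (duplicate labels in df2['A'] keep only their last row)
d = {k: set(v.split(',')) for k, v in df2.set_index('A')['B'].items()}
# For each df1['A'] value, take the first label whose token set contains all its tokens
df1['B'] = [next(iter([k for k, v in d.items() if set(x.split(',')).issubset(v)]), '')
            for x in df1['A']]
print(df1)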
A B
0 9.female.ceo.,ceo,
1 9.female.ned.,ned,
2 9.female.ned.,chair,
3 2.female.ed.,ned, ,ned,
4 2.female.ned.,ed,
5 9.female.chair.,ceo, ,ceo,ned,
6 2.female.chair.,chair,
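
To make the set-based idea easy to try, here is a minimal self-contained sketch that rebuilds the sample data from the question. It is not the answerer's exact code: the helper name first_subset_match and the candidates list are assumptions made for illustration, and a list of pairs is used instead of a dict so that rows of df2 sharing the same 'A' label are all kept.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({'A': ['9.female.ceo.,ceo,', '9.female.ned.,ned,',
                          '9.female.ned.,chair,', '2.female.ed.,ned,',
                          '2.female.ned.,ed,', '9.female.chair.,ceo,',
                          '2.female.chair.,chair,']})
df2 = pd.DataFrame({
    'A': [',ceo,ned,', ',chair,ned,', ',ned,', ',ceo,chair,',
          ',ed,ceo,', ',ceo,chair,', ',ceo,ned,', ',chair,(in ft10),'],
    'B': ['2.male.chair.,ceo,ned,', '2.male.ned.,chair,ned,',
          '2.female.ed.,ned,', '6.female.ed.,ceo,chair,',
          '6.male.chair.,ed,ceo,', '9.female.ed.,ceo,chair,',
          '9.female.chair.,ceo,ned,', '9.male.ceo.,chair,(in ft10),'],
})

# (label, token set) pairs, one per df2 row. A list keeps rows whose
# 'A' label is repeated; a dict keyed by label would overwrite them.
candidates = [(label, set(text.split(',')))
              for label, text in zip(df2['A'], df2['B'])]


def first_subset_match(value, default=''):
    """Hypothetical helper: label of the first df2 row whose tokens
    include every comma-separated token of value, else default."""
    tokens = set(value.split(','))
    return next((label for label, toks in candidates
                 if tokens.issubset(toks)), default)


df1['B'] = [first_subset_match(x) for x in df1['A']]
print(df1)
```

With this sample, only rows 3 and 5 find a match (',ned,' and ',ceo,ned,'). Note that a subset test ignores token order, so it is slightly looser than a true substring test.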

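A solution that tests with in:

# Series of df2['B'] strings indexed by the df2['A'] labels
d = df2.set_index('A')['B']
# First label whose df2['B'] string contains the df1['A'] value, else ''
df1['B'] = [next(iter([k for k, v in d.items() if x in v]), '') for x in df1['A']]
print(df1)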
A B
0 9.female.ceo.,ceo,
1 9.female.ned.,ned,
2 9.female.ned.,chair,
3 2.female.ed.,ned, ,ned,
4 2.female.ned.,ed,
5 9.female.chair.,ceo, ,ceo,ned,
6 2.female.chair.,chair,
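
The in test can also be wrapped in a small function, which may be easier to reuse. The sketch below uses a reduced version of the sample data plus one extra row ('male.ed.,ned,') that is not in the question; it is added only to show that a plain substring test can match in the middle of a token. The function name first_substring_match is an assumption for illustration.

```python
import pandas as pd

# Reduced sample; the last df1 row is made up to show a partial-token match.
df1 = pd.DataFrame({'A': ['2.female.ed.,ned,',
                          '9.female.chair.,ceo,',
                          'male.ed.,ned,']})
df2 = pd.DataFrame({'A': [',ned,', ',ceo,ned,'],
                    'B': ['2.female.ed.,ned,', '9.female.chair.,ceo,ned,']})


def first_substring_match(needle, haystacks, labels, default=''):
    """Hypothetical helper: label paired with the first haystack
    string that contains needle, else default."""
    return next((label for hay, label in zip(haystacks, labels)
                 if needle in hay), default)


df1['B'] = [first_substring_match(x, df2['B'], df2['A']) for x in df1['A']]
print(df1)
```

Here 'male.ed.,ned,' is matched to ',ned,' because it occurs inside '2.female.ed.,ned,'. If whole tokens must match, the set-based issubset test is the safer choice.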

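Another solution: a cross join via merge, followed by a substring test with in:

# Cross join: pair every df1 row with every df2 row through a constant key
# (the B_ column exists because df1 already has a column named B)
df3 = df1.assign(tmp=1).merge(df2.assign(tmp=1), on='tmp', suffixes=('','_'))
# Keep only the pairs where df1['A'] is a substring of df2['B']
df3 = df3.loc[[a in b for a, b in zip(df3['A'], df3['B_'])], ['A','A_']]

# Left join back so unmatched df1 rows remain (their B is NaN)
df = df1[['A']].merge(df3.rename(columns={'A_':'B'}), on='A', how='left')
print(df)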
A B
0 9.female.ceo.,ceo, NaN
1 9.female.ned.,ned, NaN
2 9.female.ned.,chair, NaN
3 2.female.ed.,ned, ,ned,
4 2.female.ned.,ed, NaN
5 9.female.chair.,ceo, ,ceo,ned,
6 2.female.chair.,chair, NaN
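
On pandas 1.2 or newer, the constant tmp key is not needed, since merge accepts how='cross'. The following is a sketch under that assumption, using a reduced version of the sample data; the names pairs, matches and result are made up for illustration. It also keeps only the first match per df1 value and fills unmatched rows with '' rather than NaN.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['9.female.ceo.,ceo,',
                          '2.female.ed.,ned,',
                          '9.female.chair.,ceo,']})
df2 = pd.DataFrame({'A': [',ned,', ',ceo,ned,'],
                    'B': ['2.female.ed.,ned,', '9.female.chair.,ceo,ned,']})

# Every df1 value paired with every df2 row (requires pandas >= 1.2).
# Only 'A' overlaps, so the columns are A (df1), A_ (df2) and B (df2).
pairs = df1[['A']].merge(df2, how='cross', suffixes=('', '_'))

# Keep the pairs where the df1 value is a substring of the df2 string.
mask = [a in b for a, b in zip(pairs['A'], pairs['B'])]
matches = (pairs.loc[mask, ['A', 'A_']]
                .drop_duplicates('A')           # first match per df1 value
                .rename(columns={'A_': 'B'}))

# Left join back so unmatched df1 rows survive, then replace NaN with ''.
result = df1[['A']].merge(matches, on='A', how='left').fillna({'B': ''})
print(result)
```

A cross join builds len(df1) * len(df2) rows, so this approach is best kept for small frames.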

Regarding "python - How to search for a string in one pandas DataFrame column as a substring of another DataFrame column", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/54852552/
