
python - Exact word matching and displaying the matches in columns

Reposted. Author: 行者123. Updated: 2023-12-01 01:07:05

I have the following DataFrame (df):

   Comments                       ID
0        10         Looking for help
1        11  Look at him but be nice
2        12                  Be calm
3        13               Being good
4        14              Him and Her
5        15                  Himself

and a list of words that I need to find as exact matches:

word_list = ['look','be','him']

This is the output I want:

   Comments                       ID  Word_01  Word_02  Word_03
0        10         Looking for help
1        11  Look at him but be nice     look       be      him
2        12                  Be calm       be
3        13               Being good
4        14              Him and Her      him
5        15                  Himself

I have tried a few things, such as str.findall:

str.findall(r"\b" + '|'.join(word_list) + r"\b",flags = re.I)

and a few other approaches, but I can't seem to get exact matches for my words.
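A small sketch can show why that attempt fails. It compares three patterns using only Python's re module. `|` has the lowest precedence of any regex operator, so adding \b to both ends of the joined string gives \blook|be|him\b. In that pattern the leading \b belongs only to look and the trailing \b only to him. The variable names broken, grouped and per_word are illustrative only.

```python
import re

word_list = ['look', 'be', 'him']

# The attempted pattern: because | binds loosest, the leading \b applies only
# to 'look' and the trailing \b only to 'him'.
broken = r"\b" + '|'.join(word_list) + r"\b"                   # \blook|be|him\b

# Fix 1: wrap the alternation in a non-capturing group.
grouped = r"\b(?:" + '|'.join(word_list) + r")\b"             # \b(?:look|be|him)\b

# Fix 2 (the approach used in the answer): \b around each word separately.
per_word = '|'.join(r"\b{}\b".format(w) for w in word_list)   # \blook\b|\bbe\b|\bhim\b

for text in ["Looking for help", "Being good", "Look at him but be nice"]:
    print(repr(text),
          "broken:", re.findall(broken, text, flags=re.I),
          "grouped:", re.findall(grouped, text, flags=re.I),
          "per_word:", re.findall(per_word, text, flags=re.I))
```

The broken pattern still picks "Look" out of "Looking" and "Be" out of "Being". Both fixed patterns only report whole words.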

Any help solving this would be greatly appreciated.

Thanks

Best answer

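You need a word boundary around each individual word. In your pattern, \b attaches only to the first and last alternatives. One possible solution uses Series.str.extractall and DataFrame.add_prefix, then DataFrame.join to attach the result to the original DataFrame: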

import re
import pandas as pd

word_list = ['look','be','him']

# wrap every word in its own \b...\b so that only whole words match
pat = '|'.join(r"\b{}\b".format(x) for x in word_list)
df1 = df['ID'].str.extractall('(' + pat + ')', flags=re.I)[0].unstack().add_prefix('Word_')
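Here is a self-contained sketch of the extractall step on the question's sample data. As in the answer, it assumes the free text is in the ID column. It prints the intermediate result so the reshaping is visible. extractall returns one row per match, indexed by the original row plus a match level. unstack then turns the match numbers into columns.

```python
import re
import pandas as pd

# Sample data from the question: the free text is in the 'ID' column.
df = pd.DataFrame({
    'Comments': [10, 11, 12, 13, 14, 15],
    'ID': ['Looking for help', 'Look at him but be nice', 'Be calm',
           'Being good', 'Him and Her', 'Himself'],
})
word_list = ['look', 'be', 'him']
pat = '|'.join(r"\b{}\b".format(x) for x in word_list)

# One row per match, indexed by (original row, match number).
matches = df['ID'].str.extractall('(' + pat + ')', flags=re.I)
print(matches)

# Column 0 holds the single capture group; unstack() moves the match numbers
# 0, 1, 2 into columns, and add_prefix() renames them Word_0, Word_1, Word_2.
df1 = matches[0].unstack().add_prefix('Word_')
print(df1)
```

Rows with no match (0, 3 and 5) do not appear in df1 at all. That is why the answer joins df1 back and fills the gaps.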

For lowercase values in the output, add Series.str.lower:

df1 = (df['ID'].str.lower()
.str.extractall('(' + pat + ')')[0]
.unstack()
.add_prefix('Word_'))
df = df.join(df1).fillna('')
print (df)
   Comments                       ID  Word_0  Word_1  Word_2
0        10         Looking for help
1        11  Look at him but be nice    Look     him      be
2        12                  Be calm      Be
3        13               Being good
4        14              Him and Her     Him
5        15                  Himself
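The complete pipeline with the lowercase variant is sketched below on the same sample data. The name out is illustrative. Rows with no match are absent from df1, so join leaves NaN in those rows and fillna('') replaces it with empty strings.

```python
import pandas as pd

# Sample data from the question: the free text is in the 'ID' column.
df = pd.DataFrame({
    'Comments': [10, 11, 12, 13, 14, 15],
    'ID': ['Looking for help', 'Look at him but be nice', 'Be calm',
           'Being good', 'Him and Her', 'Himself'],
})
word_list = ['look', 'be', 'him']
pat = '|'.join(r"\b{}\b".format(x) for x in word_list)

# Lowercase the text first: no re.I flag is needed and all matches come out lowercase.
df1 = (df['ID'].str.lower()
       .str.extractall('(' + pat + ')')[0]
       .unstack()
       .add_prefix('Word_'))

# df1 only has rows 1, 2 and 4; join aligns on the index and leaves NaN
# elsewhere, which fillna('') replaces with empty strings.
out = df.join(df1).fillna('')
print(out)
```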

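Your own solution should be changed to use the same pattern. Convert the findall results to a list, build a DataFrame from it, and join that to the original: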

pat = '|'.join(r"\b{}\b".format(x) for x in word_list)
df1 = (pd.DataFrame(df['ID']
.str.findall(pat, flags = re.I).values.tolist())
.add_prefix('Word_')
.fillna(''))
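Here is a sketch of the findall route on the question's sample data. Series.str.findall returns one list per row, and some of those lists are empty. When these ragged lists are passed to pd.DataFrame, pandas pads the shorter rows with missing values. Every original row therefore keeps its position, and the default index still lines up with df for a later join.

```python
import re
import pandas as pd

# Sample data from the question: the free text is in the 'ID' column.
df = pd.DataFrame({
    'Comments': [10, 11, 12, 13, 14, 15],
    'ID': ['Looking for help', 'Look at him but be nice', 'Be calm',
           'Being good', 'Him and Her', 'Himself'],
})
word_list = ['look', 'be', 'him']
pat = '|'.join(r"\b{}\b".format(x) for x in word_list)

# One (possibly empty) list of matches per row.
lists = df['ID'].str.findall(pat, flags=re.I)
print(lists.tolist())

# Ragged lists are padded with missing values, which fillna('') blanks out.
df1 = (pd.DataFrame(lists.values.tolist())
       .add_prefix('Word_')
       .fillna(''))
print(df.join(df1))
```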

Or use a list comprehension (which should be the fastest):

df1 = (pd.DataFrame([re.findall(pat, x, flags = re.I) for x in df['ID']])
.add_prefix('Word_')
.fillna(''))
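The claim that the list comprehension is fastest is easy to check locally. This sketch uses hypothetical function names. It first confirms that both versions build the same frame, then times them with timeit on a repeated copy of the sample data. Timings depend on the machine, the data and the pandas version, so the sketch implies no particular numbers.

```python
import re
import timeit
import pandas as pd

df = pd.DataFrame({
    'Comments': [10, 11, 12, 13, 14, 15],
    'ID': ['Looking for help', 'Look at him but be nice', 'Be calm',
           'Being good', 'Him and Her', 'Himself'],
})
word_list = ['look', 'be', 'him']
pat = '|'.join(r"\b{}\b".format(x) for x in word_list)


def via_str_findall(frame):
    # The answer's pandas string-method version.
    return (pd.DataFrame(frame['ID'].str.findall(pat, flags=re.I).values.tolist())
            .add_prefix('Word_')
            .fillna(''))


def via_list_comprehension(frame):
    # The answer's plain-Python version: re.findall on each string.
    return (pd.DataFrame([re.findall(pat, x, flags=re.I) for x in frame['ID']])
            .add_prefix('Word_')
            .fillna(''))


# A larger frame made by repeating the sample rows (size chosen arbitrarily).
big = pd.concat([df] * 2000, ignore_index=True)

# Both versions should produce identical frames.
pd.testing.assert_frame_equal(via_str_findall(big), via_list_comprehension(big))

# Timings vary by machine and pandas version; compare them locally.
for fn in (via_str_findall, via_list_comprehension):
    seconds = timeit.timeit(lambda: fn(big), number=5)
    print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```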

For lowercase output, add .lower():

pat = '|'.join(r"\b{}\b".format(x) for x in word_list)
df1 = (pd.DataFrame([re.findall(pat, x.lower(), flags = re.I) for x in df['ID']])
.add_prefix('Word_')
.fillna(''))
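One caveat goes beyond the answer: the words are inserted into the regex verbatim, so a word containing a metacharacter changes what the pattern matches. The sketch below assumes a hypothetical list entry 'a.m' and hypothetical sample rows. In a regex, '.' matches any character, so the unescaped 'a.m' also matches "arm". Passing each word through re.escape restores a literal match. The sketch still relies on \b, so it assumes every word starts and ends with a letter, digit or underscore. Because the text is lowercased, it also drops the redundant re.I flag, which assumes word_list is already lowercase.

```python
import re
import pandas as pd

# Small illustrative frame; these rows are hypothetical, not from the question.
df = pd.DataFrame({
    'Comments': [10, 11, 12],
    'ID': ['Look at him but be nice', 'Meet at 9 a.m. sharp', 'Wear an arm band'],
})

# Hypothetical word list: in a regex, the '.' in 'a.m' matches any character.
word_list = ['look', 'be', 'him', 'a.m']


def build_frame(words, escape):
    """Lowercase-variant list comprehension, optionally escaping each word."""
    parts = [re.escape(w) if escape else w for w in words]
    pat = '|'.join(r"\b{}\b".format(p) for p in parts)
    # The text is lowercased, so re.I is unnecessary as long as `words` is lowercase.
    return (pd.DataFrame([re.findall(pat, x.lower()) for x in df['ID']])
            .add_prefix('Word_')
            .fillna(''))


unescaped = build_frame(word_list, escape=False)  # 'a.m' also matches 'arm'
escaped = build_frame(word_list, escape=True)     # 'a.m' matches only the literal text
print(unescaped)
print(escaped)
```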

A similar question about python - exact word matching and displaying the matches in columns can be found on Stack Overflow: https://stackoverflow.com/questions/55241270/
