
python - Pandas - Check whether column labels occur in another column's values and update those columns

Reposted. Author: 行者123. Updated: 2023-12-01 02:20:59

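I have a long glossary list and want to check whether each glossary term appears in a paragraph, marking 1 for yes and 0 for no. A simplified version: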

>>> import pandas as pd
>>> glossary = ['phrase 1', 'phrase 2', 'phrase 3']
>>> glossary
['phrase 1', 'phrase 2', 'phrase 3']

>>> df = pd.DataFrame(['This is a phrase 1 and phrase 2', 'phrase 1',
...                    'phrase 3', 'phrase 1 & phrase 2. phrase 3 as well'], columns=['text'])
>>> df
                                    text
0        This is a phrase 1 and phrase 2
1                               phrase 1
2                               phrase 3
3  phrase 1 & phrase 2. phrase 3 as well

I then concatenate the glossary terms as columns, as follows:

                                    text  phrase 1  phrase 2  phrase 3
0        This is a phrase 1 and phrase 2       NaN       NaN       NaN
1                               phrase 1       NaN       NaN       NaN
2                               phrase 3       NaN       NaN       NaN
3  phrase 1 & phrase 2. phrase 3 as well       NaN       NaN       NaN
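As a minimal sketch (assuming the glossary list and text column from the question), the empty layout above can be produced with DataFrame.reindex, which keeps the existing 'text' column and adds each missing label as an all-NaN column:

```python
import pandas as pd

glossary = ['phrase 1', 'phrase 2', 'phrase 3']
df = pd.DataFrame(['This is a phrase 1 and phrase 2', 'phrase 1',
                   'phrase 3', 'phrase 1 & phrase 2. phrase 3 as well'],
                  columns=['text'])

# reindex keeps 'text' and adds one all-NaN column per glossary term
df = df.reindex(columns=['text', *glossary])
print(df)
```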

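I want each glossary column to be compared against the text column and set to 1 if the term occurs in the text, or 0 if it does not. In this example the result would be: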

                                    text  phrase 1  phrase 2  phrase 3
0        This is a phrase 1 and phrase 2         1         1         0
1                               phrase 1         1         0         0
2                               phrase 3         0         0         1
3  phrase 1 & phrase 2. phrase 3 as well         1         1         1

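How can I do this? My DataFrame has roughly 3000 glossary columns, so I would also like to generalize the logic: each column's label should serve as the key that is searched for in the text of every row.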
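One way to generalize along those lines, sketched here under the assumption that the text column is named 'text' and every other column label is a glossary term, is to loop over the column labels and overwrite each column with a 0/1 flag. Passing regex=False makes each label match as a literal substring rather than as a regular expression:

```python
import pandas as pd

df = pd.DataFrame(['This is a phrase 1 and phrase 2', 'phrase 1',
                   'phrase 3', 'phrase 1 & phrase 2. phrase 3 as well'],
                  columns=['text'])
# Start from the question's layout: one NaN column per glossary term
df = df.reindex(columns=['text', 'phrase 1', 'phrase 2', 'phrase 3'])

# Every column except 'text' is treated as a glossary term (assumption)
term_cols = df.columns.drop('text')
for term in term_cols:
    # The column's own label is the search key; True/False becomes 1/0
    df[term] = df['text'].str.contains(term, regex=False).astype(int)

print(df)
```

Because each column's own label is the search key, no separate glossary list is needed; the loop makes one str.contains call per glossary column.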

Best answer

You can use a list comprehension with str.contains, then concat the resulting Series and cast to int to get a 0/1 DataFrame:

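# One boolean Series per glossary term; regex=False matches each term literally
L = [df['text'].str.contains(x, regex=False) for x in glossary]
# Place the Series side by side, named by term, and cast True/False to 1/0
df1 = pd.concat(L, axis=1, keys=glossary).astype(int)
print(df1)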
   phrase 1  phrase 2  phrase 3
0         1         1         0
1         1         0         0
2         0         0         1
3         1         1         1

Then join it back to the original DataFrame:

df = df.join(df1)
print(df)
                                    text  phrase 1  phrase 2  phrase 3
0        This is a phrase 1 and phrase 2         1         1         0
1                               phrase 1         1         0         0
2                               phrase 3         0         0         1
3  phrase 1 & phrase 2. phrase 3 as well         1         1         1
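str.contains, as used in the answer, matches substrings, so 'phrase 1' would also be flagged inside 'phrase 10'. If only whole phrases should count (an assumption about the requirement, not something the question states), one option is to escape each term with re.escape and wrap it in \b word boundaries. In this sketch the second row is changed to 'phrase 10' to show the difference, and whole_phrase_pattern is a hypothetical helper:

```python
import re
import pandas as pd

glossary = ['phrase 1', 'phrase 2', 'phrase 3']
# Row 1 is 'phrase 10' here (not in the question) to show a false positive avoided
df = pd.DataFrame(['This is a phrase 1 and phrase 2', 'phrase 10',
                   'phrase 3', 'phrase 1 & phrase 2. phrase 3 as well'],
                  columns=['text'])

def whole_phrase_pattern(term):
    # Hypothetical helper: re.escape makes metacharacters such as '.' literal,
    # and \b requires a word boundary on both sides of the term
    return r'\b' + re.escape(term) + r'\b'

L = [df['text'].str.contains(whole_phrase_pattern(x), regex=True) for x in glossary]
df1 = pd.concat(L, axis=1, keys=glossary).astype(int)
print(df1)
```

Note that \b only behaves this way when a term starts and ends with a word character; a term such as 'C++' would need a different boundary rule.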

A similar question, "python - Pandas - Check whether column labels occur in another column's values and update those columns", can be found on Stack Overflow: https://stackoverflow.com/questions/47952632/
