
python - Pandas: check which values from column A are contained in column B

Reposted. Author: 行者123. Updated: 2023-12-01 01:54:11

I have 100 keywords in df1 and 10,000 articles in df2. I want to count how many articles contain each keyword. For example, about 20 articles contain the keyword "apple".

I tried df.str.contains(), but then I have to run the count separately for every keyword. Can you show me an efficient way to do this?

import pandas as pd

df1 = pd.DataFrame(['apple', 'mac', 'pc', 'ios', 'lg'], columns=['keywords'])
df2 = pd.DataFrame(['apple is good for health', 'mac is another pc', 'today is sunday', 'Star wars pc game', 'ios is a system,lg is not', 'lg is a japan company '], columns=['article'])

Desired result:

1 article contain "apple"
1 article contain "mac"
2 article contain "pc"
1 article contain "ios"
2 article contain "lg"

Best answer

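I think you need str.contains to get a boolean Series for each keyword, then sum to count the True values, which are treated as 1. To cover all keywords, use a list comprehension and pass the result to the DataFrame constructor: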

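# For each keyword, count the articles whose text contains it.
# Note: str.contains treats x as a regular expression by default.
L = [(x, df2['article'].str.contains(x).sum()) for x in df1['keywords']]
# Alternative: plain substring test with `in` (requires that no article is NaN)
# L = [(x, sum(x in article for article in df2['article'])) for x in df1['keywords']]
df3 = pd.DataFrame(L, columns=['keyword', 'count'])
print(df3)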
  keyword  count
0   apple      1
1     mac      1
2      pc      2
3     ios      1
4      lg      2
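
The same count can also be obtained from a boolean table with one column per keyword. This is a minimal sketch rather than part of the original answer: the names `matches` and `counts` are made up for illustration, and `regex=False` and `na=False` are assumptions that the keywords should be matched as literal substrings and that a missing article should count as no match.

```python
import pandas as pd

df1 = pd.DataFrame(['apple', 'mac', 'pc', 'ios', 'lg'], columns=['keywords'])
df2 = pd.DataFrame(
    ['apple is good for health', 'mac is another pc', 'today is sunday',
     'Star wars pc game', 'ios is a system,lg is not', 'lg is a japan company '],
    columns=['article'],
)

# One boolean column per keyword: True where the article contains it.
# regex=False does literal substring matching (assumption, see above);
# na=False treats a missing article as "does not contain the keyword".
matches = pd.DataFrame({
    kw: df2['article'].str.contains(kw, regex=False, na=False)
    for kw in df1['keywords']
})

# Summing each boolean column counts its True values.
counts = matches.sum()
print(counts)
```

This still scans the articles once per keyword, so it is mainly a more compact form of the loop. The `matches` table is handy if you also want to see which articles matched.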

If you only want to print the output:

for x in df1['keywords']:
    count = df2['article'].str.contains(x).sum()
    # Alternative if there are no NaNs: sum over a generator of `in` membership checks
    # count = sum(x in article for article in df2['article'])
    print('{} article contain "{}"'.format(count, x))

1 article contain "apple"
1 article contain "mac"
2 article contain "pc"
1 article contain "ios"
2 article contain "lg"
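
The commented-out `in` variant only works when every article is a string. Here is a rough sketch of what happens with a missing value. The `articles` Series is hypothetical sample data, not taken from the question, and `na=False` is an assumption that a missing article should count as no match.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second article is missing.
articles = pd.Series(['mac is another pc', np.nan, 'Star wars pc game'])

# na=False treats the missing article as "does not contain the keyword".
count = articles.str.contains('pc', regex=False, na=False).sum()
print('{} article contain "{}"'.format(count, 'pc'))

# The plain `in` check fails on NaN, because NaN is a float, not a string.
try:
    sum('pc' in article for article in articles)
    in_check_failed = False
except TypeError:
    in_check_failed = True
print(in_check_failed)
```

If the article column may contain NaN, either keep the str.contains version or fill the missing values first, for example with `fillna('')`.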

Regarding python - Pandas: check which values from column A are contained in column B, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50404722/
