
python - Pandas Group by and find the count of common strings

Reposted. Author: 行者123. Updated: 2023-11-28 22:17:33

My DataFrame:

import pandas as pd

df = pd.DataFrame({'company': ['Chipotle', 'Branchburg Chipotle', 'Chipotle NJ', 'Chipotle 8853',
                               'The Home Depot', 'Home Depot', '28211 Home Depot',
                               'Wendys', 'BJs', 'Buffalo wings'],
                   'address': ['123 Main Street Branchburg NJ 08853',
                               '123 Main Street Branchburg NJ 08853',
                               '123 Main Street Branchburg NJ 08853',
                               '123 Main Street Branchburg NJ 08853',
                               '1220 N Wendover Rd Charlotte NC 28211',
                               '1220 N Wendover Rd Charlotte NC 28211',
                               '1220 N Wendover Rd Charlotte NC 28211',
                               '2805 Whitson St Selma CA 93662',
                               '2805 Whitson St Selma CA 93662',
                               '2805 Whitson St Selma CA 93662']})

               company                                address
0             Chipotle    123 Main Street Branchburg NJ 08853
1  Branchburg Chipotle    123 Main Street Branchburg NJ 08853
2          Chipotle NJ    123 Main Street Branchburg NJ 08853
3        Chipotle 8853    123 Main Street Branchburg NJ 08853
4       The Home Depot  1220 N Wendover Rd Charlotte NC 28211
5           Home Depot  1220 N Wendover Rd Charlotte NC 28211
6     28211 Home Depot  1220 N Wendover Rd Charlotte NC 28211
7               Wendys         2805 Whitson St Selma CA 93662
8                  BJs         2805 Whitson St Selma CA 93662
9        Buffalo wings         2805 Whitson St Selma CA 93662

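I need to group by address, find the words that all company names at that address have in common, and write how many there are to a new column "count". For the first address the common word is Chipotle, so the count is 1; for the second address the common words are Home Depot, so the count is 2; for the third address there is no common word, so the count is 0.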
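For a single address, the rule described above amounts to intersecting the sets of words in all of its company names and counting what is left. A minimal plain-Python sketch, assuming exact, case-sensitive matching on whitespace-separated words; the helper name shared_words and the groups dictionary are hypothetical, built by hand from the question's data:

```python
from functools import reduce


def shared_words(names):
    """Words that occur in every name, in the order they appear in the first name."""
    # Assumption: exact, case-sensitive matching on whitespace-separated words.
    word_sets = [set(name.split()) for name in names]
    common = reduce(set.intersection, word_sets)
    return [w for w in names[0].split() if w in common]


# Company names per address, copied from the question's DataFrame.
groups = {
    '123 Main Street Branchburg NJ 08853':
        ['Chipotle', 'Branchburg Chipotle', 'Chipotle NJ', 'Chipotle 8853'],
    '1220 N Wendover Rd Charlotte NC 28211':
        ['The Home Depot', 'Home Depot', '28211 Home Depot'],
    '2805 Whitson St Selma CA 93662':
        ['Wendys', 'BJs', 'Buffalo wings'],
}

for address, names in groups.items():
    words = shared_words(names)
    print(address, words, len(words))
```

It prints counts of 1, 2 and 0 for the three addresses, which matches the expected output below.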

Expected output:

          company                                address  count
0        Chipotle    123 Main Street Branchburg NJ 08853      1
1  The Home Depot  1220 N Wendover Rd Charlotte NC 28211      2
2          Wendys         2805 Whitson St Selma CA 93662      0

I thought about iterating over the DataFrame and using set intersection, but that is too slow. Is there a pandas way to do this?

Best answer

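from functools import reduce
import operator

def common_words(names):
    # Words present in every company name of the group (case-sensitive set intersection)
    shared = reduce(operator.and_, [set(r) for r in names.str.split()])
    if shared:
        # Keep the word order of the first name; ' '.join(shared) alone would be unordered
        words = [w for w in names.iloc[0].split() if w in shared]
        return (' '.join(words), len(words))
    # No common word: fall back to the first company name with a count of 0
    return (names.iloc[0], 0)

df.groupby('address')['company'].agg(common_words).apply(pd.Series).rename({0: 'company', 1: 'count'}, axis=1)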

                                          company  count
address
1220 N Wendover Rd Charlotte NC 28211  Home Depot      2
123 Main Street Branchburg NJ 08853      Chipotle      1
2805 Whitson St Selma CA 93662             Wendys      0
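The answer returns the shared words with address as the index, whereas the question's expected output lists the first company name, the address and the count as ordinary columns, in order of first appearance. A sketch of that flat layout, assuming exact, case-sensitive matching; n_shared_words is a hypothetical helper, and the named aggregation syntax used here needs pandas 0.25 or later:

```python
from functools import reduce
import operator

import pandas as pd

# The question's data, written compactly (same ten rows).
df = pd.DataFrame({
    'company': ['Chipotle', 'Branchburg Chipotle', 'Chipotle NJ', 'Chipotle 8853',
                'The Home Depot', 'Home Depot', '28211 Home Depot',
                'Wendys', 'BJs', 'Buffalo wings'],
    'address': ['123 Main Street Branchburg NJ 08853'] * 4
               + ['1220 N Wendover Rd Charlotte NC 28211'] * 3
               + ['2805 Whitson St Selma CA 93662'] * 3,
})


def n_shared_words(names):
    """Number of words that occur in every company name of one address group."""
    return len(reduce(operator.and_, [set(name.split()) for name in names]))


out = (
    df.groupby('address', sort=False)            # sort=False keeps first-appearance order
      .agg(company=('company', 'first'),         # first company name seen at the address
           count=('company', n_shared_words))    # size of the word-set intersection
      .reset_index()[['company', 'address', 'count']]
)
print(out)
```

Using 'first' for the company column reproduces "The Home Depot" from the expected output; swap in the answer's common_words logic if the shared words themselves are wanted instead.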

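On pandas 0.20, where rename() does not yet accept the axis argument (it was added in 0.21), replace the last call with: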

.rename(columns={0: 'company', 1: 'count'})
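The answer still calls a Python function once per address. A different technique, not taken from the original answer, keeps the matching inside pandas operations: split each name into one row per word with explode (pandas 0.25+), count how many distinct rows of each address contain each word, and keep the words found in every row. This is a sketch under the same exact-match assumption; it returns only the count per address, and its speed relative to the per-group version has not been measured.

```python
import pandas as pd

# The question's data, written compactly (same ten rows).
df = pd.DataFrame({
    'company': ['Chipotle', 'Branchburg Chipotle', 'Chipotle NJ', 'Chipotle 8853',
                'The Home Depot', 'Home Depot', '28211 Home Depot',
                'Wendys', 'BJs', 'Buffalo wings'],
    'address': ['123 Main Street Branchburg NJ 08853'] * 4
               + ['1220 N Wendover Rd Charlotte NC 28211'] * 3
               + ['2805 Whitson St Selma CA 93662'] * 3,
})

# One row per (original row, word); drop_duplicates so a word repeated inside
# a single company name is counted only once for that row.
words = (
    df.assign(word=df['company'].str.split())
      .explode('word')
      .reset_index()                       # original row label becomes column 'index'
      .drop_duplicates(['index', 'word'])
)

# Number of distinct rows containing each word, per address.
rows_with_word = words.groupby(['address', 'word']).size().rename('rows').reset_index()
# Total number of rows per address.
rows_per_address = df.groupby('address').size().rename('n_rows')

merged = rows_with_word.merge(rows_per_address, left_on='address', right_index=True)
# A word is shared when it occurs in every row of its address.
shared = merged[merged['rows'] == merged['n_rows']]

count = (
    shared.groupby('address').size()
          .reindex(df['address'].unique(), fill_value=0)   # no shared word -> 0
)
print(count)
```

Distinct row labels, rather than company strings, drive the per-word count, so two identical company names at the same address are still treated as two rows.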

Regarding python - Pandas Group by and find the count of common strings, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51310599/
