
python - Finding patterns in a CSV file

Reposted. Author: 太空宇宙. Updated: 2023-11-04 03:04:22

I have a sample CSV (Excel) file:

Receipt  Name  Address      Date      Time   Items
25007    A     ABC pte ltd  4/7/2016  10:40  Cheese, Cookie, Pie
.
.
25008    B     CCC pte ltd  4/7/2016  12:40  Cheese, Cookie

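What is a simple way to compare the "Items" column, find the most common patterns of items that people buy together, and display the most popular combinations? In this case, the common pattern is Cheese, Cookie.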

Best answer

Suppose that, after processing the CSV file, you end up with this list of items, one string per receipt:

>>> items=['Cheese,Cookie,Pie', 'Cheese,Cookie,Pie', 'Cake,Cookie,Cheese', 
... 'Cheese,Mousetrap,Pie', 'Cheese,Jam','Cheese','Cookie,Cheese,Mousetrap']
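The answer does not show how that list is extracted from the file. Here is a minimal sketch using the standard `csv` module, assuming a genuinely comma-separated file whose `Items` field is quoted (e.g. `"Cheese, Cookie, Pie"`); the `read_items` helper and the inline sample are hypothetical, built from the two rows in the question:

```python
import csv
import io

# Hypothetical sample mirroring the question's columns; assumes a real
# comma-separated file in which the Items field is quoted.
SAMPLE = '''Receipt,Name,Address,Date,Time,Items
25007,A,ABC pte ltd,4/7/2016,10:40,"Cheese, Cookie, Pie"
25008,B,CCC pte ltd,4/7/2016,12:40,"Cheese, Cookie"
'''

def read_items(f, column='Items'):
    """Return one 'A,B,C' string per receipt, with spaces around items removed."""
    rows = []
    for row in csv.DictReader(f):
        # Split the quoted field on commas and drop surrounding whitespace.
        names = [name.strip() for name in row[column].split(',') if name.strip()]
        rows.append(','.join(names))
    return rows

items = read_items(io.StringIO(SAMPLE))
print(items)  # ['Cheese,Cookie,Pie', 'Cheese,Cookie']
```

For a file on disk, you would pass `open('receipts.csv', newline='')` instead of the `StringIO` object (the file name is only an example).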

First, determine all possible pairs:

>>> from itertools import combinations
>>> all_pairs={frozenset(t) for e in items for t in combinations(e.split(','),2)}
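To see what the line above is doing, the snippet below shows `combinations` on a single receipt and why the pairs are wrapped in `frozenset`: two receipts can list the same products in different orders, and a frozenset ignores order. It reuses the sample `items` list from above:

```python
from itertools import combinations

items = ['Cheese,Cookie,Pie', 'Cheese,Cookie,Pie', 'Cake,Cookie,Cheese',
         'Cheese,Mousetrap,Pie', 'Cheese,Jam', 'Cheese', 'Cookie,Cheese,Mousetrap']

# combinations() yields every unordered pair from one receipt, in input order.
pairs = list(combinations('Cheese,Cookie,Pie'.split(','), 2))
print(pairs)  # [('Cheese', 'Cookie'), ('Cheese', 'Pie'), ('Cookie', 'Pie')]

# Another receipt may list the same two products the other way round;
# a frozenset ignores order, so both spellings collapse into one pair.
print(frozenset(('Cookie', 'Cheese')) == frozenset(('Cheese', 'Cookie')))  # True

# Distinct pairs across all receipts (a one-item receipt contributes none).
all_pairs = {frozenset(t) for e in items for t in combinations(e.split(','), 2)}
print(len(all_pairs))  # 9
```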

Then you can count, for each pair, how many receipts contain it:

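from collections import Counter
from itertools import combinations

pair_counts = Counter()
for s in items:
    for pair in {frozenset(t) for t in combinations(s.split(','), 2)}:
        # sort so each pair is always stored under the same tuple key
        pair_counts[tuple(sorted(pair))] += 1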

>>> pair_counts
Counter({('Cheese', 'Cookie'): 4, ('Cheese', 'Pie'): 3, ('Cookie', 'Pie'): 2, ('Cheese', 'Mousetrap'): 2, ('Cookie', 'Mousetrap'): 1, ('Cheese', 'Jam'): 1, ('Mousetrap', 'Pie'): 1, ('Cake', 'Cheese'): 1, ('Cake', 'Cookie'): 1})
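To answer the question directly (which pair is bought together most often), `Counter.most_common(1)` returns the top entry. The variant below is a sketch, not the answer's exact code: it sorts each receipt's unique products first, so `combinations` already emits every pair in one canonical order and the `frozenset` round-trip is unnecessary:

```python
from collections import Counter
from itertools import combinations

items = ['Cheese,Cookie,Pie', 'Cheese,Cookie,Pie', 'Cake,Cookie,Cheese',
         'Cheese,Mousetrap,Pie', 'Cheese,Jam', 'Cheese', 'Cookie,Cheese,Mousetrap']

pair_counts = Counter()
for receipt in items:
    # Deduplicate and sort, so each pair appears once per receipt, in sorted order.
    products = sorted(set(receipt.split(',')))
    pair_counts.update(combinations(products, 2))

top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)  # ('Cheese', 'Cookie') 4
```

Deduplicating also keeps a receipt that lists the same product twice from producing a pair like `('Cheese', 'Cheese')`.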

This can be extended to the more general case of groups of any size:

max_n = max(len(e.split(',')) for e in items)
for n in range(max_n, 1, -1):
    group_counts = Counter()
    for s in items:
        # count each distinct group of n items at most once per receipt
        for group in {frozenset(t) for t in combinations(s.split(','), n)}:
            group_counts[tuple(sorted(group))] += 1
    print('group length: {}, most_common: {}'.format(n, group_counts.most_common()))

This prints:

group length: 3, most_common: [(('Cheese', 'Cookie', 'Pie'), 2), (('Cheese', 'Mousetrap', 'Pie'), 1), (('Cheese', 'Cookie', 'Mousetrap'), 1), (('Cake', 'Cheese', 'Cookie'), 1)]
group length: 2, most_common: [(('Cheese', 'Cookie'), 4), (('Cheese', 'Pie'), 3), (('Cookie', 'Pie'), 2), (('Cheese', 'Mousetrap'), 2), (('Cookie', 'Mousetrap'), 1), (('Cheese', 'Jam'), 1), (('Mousetrap', 'Pie'), 1), (('Cake', 'Cheese'), 1), (('Cake', 'Cookie'), 1)]
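The output above lists every group, including those bought only once. As a sketch (the `frequent_groups` helper and its `min_count` threshold are hypothetical, not part of the original answer), one could keep only groups that appear in at least `min_count` receipts, for every group size:

```python
from collections import Counter
from itertools import combinations

def frequent_groups(items, min_count=2):
    """Return {size: [(group, count), ...]} for groups seen in >= min_count receipts."""
    # One sorted, deduplicated product list per receipt.
    baskets = [sorted({p.strip() for p in s.split(',') if p.strip()}) for s in items]
    max_n = max((len(b) for b in baskets), default=0)
    result = {}
    for n in range(max_n, 1, -1):
        counts = Counter()
        for basket in baskets:
            counts.update(combinations(basket, n))
        frequent = [(g, c) for g, c in counts.most_common() if c >= min_count]
        if frequent:
            result[n] = frequent
    return result

items = ['Cheese,Cookie,Pie', 'Cheese,Cookie,Pie', 'Cake,Cookie,Cheese',
         'Cheese,Mousetrap,Pie', 'Cheese,Jam', 'Cheese', 'Cookie,Cheese,Mousetrap']
for n, groups in frequent_groups(items).items():
    print(n, groups)
```

With the sample data, only `('Cheese', 'Cookie', 'Pie')` survives among the triples, and four pairs survive, led by `('Cheese', 'Cookie')` with 4 receipts.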

Regarding "python - Finding patterns in a CSV file", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39946798/
