
python - Counting lines that match different patterns in a single pass

Reposted. Author: 行者123. Updated: 2023-12-01 05:46:31

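I have a Python script that, given a pattern, goes through a file and, for each line that matches the pattern, counts how many times that line appears in the file.

The script is as follows: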

#!/usr/bin/env python

import time
fnamein = 'Log.txt'

def filter_and_count_matches(fnamein, fnameout, match):
    fin = open(fnamein, 'r')
    curr_matches = {}
    order_in_file = [] # need this because dict has no particular order
    for line in (l for l in fin if l.find(match) >= 0):
        line = line.strip()
        if line in curr_matches:
            curr_matches[line] += 1
        else:
            curr_matches[line] = 1
            order_in_file.append(line)
    #
    fout = open(fnameout, 'w')
    #for line in order_in_file:
    for line, _dummy in sorted(curr_matches.iteritems(),
                               key=lambda (k, v): (v, k), reverse=True):
        fout.write(line + '\n')
        fout.write(' = {}\n'.format(curr_matches[line]))
    fout.close()

def main():
    for idx, match in enumerate(open('staffs.txt', 'r').readlines()):
        curr_time = time.time()
        match = match.strip()
        fnameout = 'm{}.txt'.format(idx+1)
        filter_and_count_matches(fnamein, fnameout, match)
        print 'Processed {}. Time = {}'.format(match, time.time() - curr_time)

main()

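So right now I go through the whole file every time I want to check a different pattern. Is it possible to go through the file only once? (The file is quite large, so it takes a while to process.) It would be nice to be able to do this in an elegant, "easy" way. Thanks!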

Best answer

It looks like a Counter would do what you need:

from collections import Counter
# strip() drops the trailing newline so identical lines compare equal
lines = Counter(line.strip() for line in myfile if match_string in line)

For example, if myfile contains

123abc
abc456
789
123abc
abc456

and match_string is "abc", then the code above gives you

>>> lines
Counter({'123abc': 2, 'abc456': 2})
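
The script in the question writes the matched lines ordered by count, and a Counter can produce that ordering directly through most_common(). Below is a minimal Python 3 sketch (the question's script is Python 2); sample_lines is a made-up in-memory stand-in for the file, not something from the original post.

```python
from collections import Counter

# Hypothetical stand-in for the lines of the file (illustration only)
sample_lines = ["123abc", "abc456", "789", "123abc", "abc456", "abc456"]
match_string = "abc"

# Count only the lines that contain the pattern
counts = Counter(line for line in sample_lines if match_string in line)

# most_common() returns (line, count) pairs, highest count first
ranked = counts.most_common()
for line, count in ranked:
    print(f"{line} = {count}")
```

Lines with equal counts come back in the order they were first seen, which differs slightly from the question's sort, where ties are ordered by the line text.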

For multiple patterns, how about this:

patterns = ["abc", "123"]
# initialize one Counter for each pattern
results = {pattern: Counter() for pattern in patterns}
for line in myfile:
    for pattern in patterns:
        if pattern in line:
            results[pattern][line] += 1
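
To connect this back to the script in the question, here is one possible way to wrap the single-pass loop in a function and format each pattern's counts the way the original script writes them. It is a sketch in Python 3, not the answerer's code: the function names count_matches_single_pass and format_report and the demo_lines data are assumptions introduced for illustration.

```python
from collections import Counter


def count_matches_single_pass(lines, patterns):
    """Count matching lines for every pattern while iterating `lines` once."""
    results = {pattern: Counter() for pattern in patterns}
    for raw_line in lines:
        line = raw_line.strip()  # drop the trailing newline before counting
        for pattern in patterns:
            if pattern in line:
                results[pattern][line] += 1
    return results


def format_report(counter):
    """Render counts highest first; ties are ordered by line text, descending."""
    ranked = sorted(counter.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    return "".join(f"{line}\n = {count}\n" for line, count in ranked)


# Small in-memory demo; with a real file, pass the open file object instead.
demo_lines = ["123abc\n", "abc456\n", "789\n", "123abc\n", "abc456\n"]
demo_results = count_matches_single_pass(demo_lines, ["abc", "123"])
print(format_report(demo_results["abc"]))
```

With the files from the question, the patterns would be read from staffs.txt, Log.txt would be opened once and passed as lines, and each pattern's report would then be written to its own m{}.txt file. The large file is read only once, at the cost of keeping one Counter per pattern in memory.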

Regarding "python - Counting lines that match different patterns in a single pass", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/15952913/
