
python - Finding combinations of items on the same line of a text file and their frequency

Reposted. Author: 行者123. Updated: 2023-12-01 04:03:59

I have the text of a file and two lists of terms.

file = "the workers have human rights, the women have rights, the people have to work."

list1 = ['workers', 'rights']
list2 = ['have', 'the']

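I need to find whether an item from list1 and an item from list2 occur on the same line of the file, and count how often each combination occurs across the whole file text. I tried the following code, but it does not give the correct frequencies.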

freq = 0
result = []
for line in file.splitlines():
    for i in list1:
        for x in list2:
            if i in line and x in line:
                freq +=1
                result.append((i,x, freq))

Best answer

Do it like this:

import itertools

frequencies = {}
# open_file is an already-open file object; iterating it yields lines, so
# .splitlines() is unnecessary. Avoid using "file" as a variable name.
for line in open_file:
    # Note: split() keeps punctuation attached, so "rights," will not match "rights".
    line = line.strip().split()
    list1_used = (x for x in list1 if x in line)
    list2_used = (x for x in list2 if x in line)
    for combination in itertools.product(list1_used, list2_used):
        frequencies[combination] = frequencies.get(combination, 0) + 1

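This builds a dictionary of frequencies, one entry per pair. For example, you might get something like {('rights', 'have'): 1, ('workers', 'have'): 1, ('rights', 'the'): 1, ('workers', 'the'): 1} if the line you gave is the only line in the file object (provided punctuation such as the comma in "rights," is stripped first, since split() leaves it attached to the word). If you want to take into account how many times a given word occurs, list1_used and list2_used get slightly more complicated: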

list1_used = sum((((x,) * line.count(x)) for x in list1), ())
list2_used = sum((((y,) * line.count(y)) for y in list2), ())
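To make the two variants concrete, here is a self-contained sketch run on the sample sentence from the question. The function name pair_frequencies, the count_repeats flag, and the regex tokenizer are assumptions introduced for illustration, not part of the original answer; the tokenizer is swapped in for split() so that "rights," matches 'rights'.

```python
import itertools
import re


def pair_frequencies(lines, list1, list2, count_repeats=False):
    """Count (list1 item, list2 item) pairs whose members share a line."""
    frequencies = {}
    for line in lines:
        # Assumption: tokenize on word characters so "rights," matches "rights".
        tokens = re.findall(r"\w+", line)
        if count_repeats:
            # Repeat each term once per occurrence on this line.
            used1 = [x for x in list1 for _ in range(tokens.count(x))]
            used2 = [y for y in list2 for _ in range(tokens.count(y))]
        else:
            # Each term counts at most once per line.
            used1 = [x for x in list1 if x in tokens]
            used2 = [y for y in list2 if y in tokens]
        for combination in itertools.product(used1, used2):
            frequencies[combination] = frequencies.get(combination, 0) + 1
    return frequencies


text = "the workers have human rights, the women have rights, the people have to work."
list1 = ['workers', 'rights']
list2 = ['have', 'the']

once = pair_frequencies(text.splitlines(), list1, list2)
repeated = pair_frequencies(text.splitlines(), list1, list2, count_repeats=True)
print(once)
print(repeated)
```

With count_repeats=True, a pair's count on a line is the product of the two terms' occurrence counts on that line, which is what the sum(...) expressions above produce as well.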

Using a defaultdict here may be easier:

from collections import defaultdict
import itertools

frequencies = defaultdict(int)
for line in open_file:
    line = line.strip().split()
    list1_used = ...
    list2_used = ...
    for combination in itertools.product(list1_used, list2_used):
        frequencies[combination] += 1
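As a variation on the same idea, collections.Counter can absorb the pairs directly through its update method, which removes the inner loop. This is only a sketch and not the answer's code: the lines list is made-up sample data, and plain whitespace splitting is assumed to be enough because the sample contains no punctuation.

```python
from collections import Counter
import itertools

# Made-up sample lines, standing in for the lines of an open file.
lines = ["the workers have rights", "the women have rights", "the people work"]
list1 = ['workers', 'rights']
list2 = ['have', 'the']

frequencies = Counter()
for line in lines:
    tokens = line.split()
    used1 = [x for x in list1 if x in tokens]
    used2 = [y for y in list2 if y in tokens]
    # Counter.update adds 1 for every pair the iterable yields.
    frequencies.update(itertools.product(used1, used2))

print(frequencies.most_common())
```

A Counter also returns 0 for pairs that were never seen, so lookups behave much like the defaultdict(int) version.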

Regarding python - finding combinations of items on the same line of a text file and their frequency, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35987472/
