
python - Most common words between 2 files using Python

Reposted. Author: 太空宇宙. Updated: 2023-11-03 11:02:00

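I'm new to Python and am trying to write a script that finds the most frequent words common to 2 files. I can find the most frequent words in each file separately, but I'm not sure how to compute, say, the top 5 words that appear in both files. The words must be common to both files, and their frequency across the two files should also be the highest.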

import re
from collections import Counter


finalLineLower=''
with open("test3.txt", "r") as hfFile:
    for line in hfFile:
        finalLine = re.sub('[,.<;:)-=!>_(?"]', '', line)
        finalLineLower += finalLine.lower()
    words1 = finalLineLower.split()

f = open('test2.txt', 'r')
sWords = [line.strip() for line in f]


finalLineLower1=''
with open("test4.txt", "r") as tsFile:
    for line in tsFile:
        finalLine = re.sub('[,.<;:)-=!>_(?"]', '', line)
        finalLineLower1 += finalLine.lower()
    words = finalLineLower1.split()
#print (words)
mc = Counter(words).most_common()
mc2 = Counter(words1).most_common()

print(len(mc))
print(len(mc2))

Sample test3 and test4 files are shown below. test3:

Essays are generally scholarly pieces of writing giving the author's own argument, but the definition is vague, overlapping with those of an article, a pamphlet and a short story.

test4:

Essays are generally scholarly pieces of writing giving the author's own argument, but the definition is vague, overlapping with those of an article, a pamphlet and a short story.

Essays can consist of a number of elements, including: literary criticism, political manifestos, learned arguments, observations of daily life, recollections, and reflections of the author. Almost all modern essays are written in prose, but works in verse have been dubbed essays (e.g. Alexander Pope's An Essay on Criticism and An Essay on Man). While brevity usually defines an essay, voluminous works like John Locke's An Essay Concerning Human Understanding and Thomas Malthus's An Essay on the Principle of Population are counterexamples. In some countries (e.g., the United States and Canada), essays have become a major part of formal education. Secondary students are taught structured essay formats to improve their writing skills, and admission essays are often used by universities in selecting applicants and, in the humanities and social sciences, as a way of assessing the performance of students during final exams.

Best answer

You can simply take the intersection of your Counter objects with the & operator:

mc = Counter(words)
mc2 = Counter(words1)
total=mc&mc2
mos=total.most_common(N)  # N is how many words you want, e.g. 5

Example:

>>> d1={'a':5,'f':2,'c':1,'h':2,'t':4}
>>> d2={'a':3,'b':2,'e':1,'h':5,'t':6}
>>> c1=Counter(d1)
>>> c2=Counter(d2)
>>> t=c1&c2
>>> t
Counter({'t': 4, 'a': 3, 'h': 2})
>>> t.most_common(2)
[('t', 4), ('a', 3)]
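
Applied to two pieces of text, the same idea might look like the sketch below. The helper names `tokenize` and `top_common_words` are hypothetical, and the pattern `[a-z']+` is an assumed tokenization rule, not the substitution used in the question.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def top_common_words(text_a, text_b, n=5):
    """Return up to n words found in both texts, ranked by their minimum count."""
    # & keeps only words present in both counters, each with the smaller count.
    shared = Counter(tokenize(text_a)) & Counter(tokenize(text_b))
    return shared.most_common(n)
```

With the files from the question, the two arguments would be the contents of test3.txt and test4.txt, each read with `open(...).read()`.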

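Note, however, that & returns the minimum of the two counts for each word. You can also use the union operator | to get the maximum counts, then use a simple dict comprehension to keep only the maximum counts of the words common to both counters: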

>>> m=c1|c2
>>> m
Counter({'t': 6, 'a': 5, 'h': 5, 'b': 2, 'f': 2, 'c': 1, 'e': 1})
>>> max_counts={i:j for i,j in m.items() if i in t}
>>> max_counts
{'a': 5, 'h': 5, 't': 6}

Finally, if you want the total count of each common word, you can add the counters together:

>>> s=Counter(max_counts)+t
>>> s
Counter({'t': 10, 'a': 8, 'h': 7})
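
Since the maximum plus the minimum of two counts is just their sum, the same totals can also be reached by adding the two original counters and keeping only the shared keys. A minimal sketch using the example data above (the function name `combined_common` is hypothetical):

```python
from collections import Counter


def combined_common(c1, c2):
    """Sum the counts from both counters, keeping only keys present in both."""
    shared = c1 & c2   # keys with a positive count in both counters
    total = c1 + c2    # per-key sum of the two counts
    return Counter({word: total[word] for word in shared})


c1 = Counter({'a': 5, 'f': 2, 'c': 1, 'h': 2, 't': 4})
c2 = Counter({'a': 3, 'b': 2, 'e': 1, 'h': 5, 't': 6})
print(combined_common(c1, c2).most_common(2))  # [('t', 10), ('a', 8)]
```

This ranks common words by how often they occur across both files combined, whereas the plain & intersection ranks them by the smaller of the two counts.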

Regarding "python - Most common words between 2 files using Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/30661138/
