python - Counting word frequencies across multiple documents in Python

Reposted. Author: 行者123. Updated: 2023-11-30 09:13:27

I have a list of paths to multiple text files stored in a dictionary "d":

'd:/individual-articles/9.txt', 'd:/individual-articles/11.txt', 'd:/individual-articles/12.txt',...

and so on...

Now I need to read every file listed in the dictionary and count how many times each word occurs across all of those files combined.

My output should take the following form:

the-500
a-78
in-56
and so on...

where 500 is the number of times the word "the" occurs across all the files in the dictionary, and so on for the other words.

I need to do this for every word.
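The required total for a word is just the sum of its per-file counts. A small sketch with two made-up strings standing in for the files (the sample texts are hypothetical) shows that adding collections.Counter objects sums the counts of matching words:

```python
import re
from collections import Counter

# Two hypothetical documents standing in for the .txt files
docs = ["The cat sat on the mat.", "In the end, the cat left."]

# Count lowercase \w+ tokens per document, then add the Counters:
# counts for the same word are summed across documents
per_doc = [Counter(re.findall(r"\w+", text.lower())) for text in docs]
total = sum(per_doc, Counter())

print(total["the"])  # 4: 2 in the first document + 2 in the second
```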

I'm a Python newbie. Please help!

My code below doesn't work: it doesn't display any output! There must be an error in my logic. Please correct it!

import collections
import itertools
import os
from glob import glob
from collections import Counter

folderpaths='d:/individual-articles'
counter=Counter()

filepaths = glob(os.path.join(folderpaths,'*.txt'))

folderpath='d:/individual-articles/'
# i am creating my dictionary here, can be ignored
d = collections.defaultdict(list)
with open('topics.txt') as f:
    for line in f:
        value, *keys = line.strip().split('~')
        for key in filter(None, keys):
            if key=='earn':
                d[key].append(folderpath+value+".txt")

for key, value in d.items() :
    print(value)

word_count_dict={}

for file in d.values():
    with open(file,"r") as f:
        words = re.findall(r'\w+', f.read().lower())
        counter = counter + Counter(words)
        for word in words:
            word_count_dict[word].append(counter)

for word, counts in word_count_dict.values():
    print(word, counts)
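Several things in the loop above keep it from working: re is used but never imported; each value of d is a list of paths (d is a defaultdict(list)), so open(file) receives a list rather than a path; word_count_dict is a plain dict, so word_count_dict[word] raises KeyError for any new word; and word_count_dict.values() yields single values, not (word, counts) pairs, so .items() would be needed. A minimal corrected sketch of just the counting step, assuming d has the {key: [path, ...]} shape built above and that the files are UTF-8 (both assumptions; the helper names are hypothetical):

```python
import re
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def iter_paths(path_groups: Dict[str, List[str]]) -> Iterator[str]:
    # Each value of d is a list of paths, so flatten one level
    for paths in path_groups.values():
        yield from paths


def count_words(texts: Iterable[str]) -> Counter:
    # Running total of lowercase \w+ tokens over all texts
    counter = Counter()
    for text in texts:
        counter.update(re.findall(r"\w+", text.lower()))
    return counter


def count_words_in_files(path_groups: Dict[str, List[str]]) -> Counter:
    def read_all() -> Iterator[str]:
        for path in iter_paths(path_groups):
            # encoding="utf-8" is an assumption about the article files
            with open(path, encoding="utf-8") as f:
                text = f.read()
            yield text

    return count_words(read_all())
```

Usage: for word, count in count_words_in_files(d).most_common(): print(f"{word}-{count}"). A single Counter replaces word_count_dict entirely, since it already maps each word to its total.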

Best answer

Building on the Counter collection you were already using:

from glob import glob
from collections import Counter
import os
import re

folderpaths = 'd:/individual-articles'
counter = Counter()

filepaths = glob(os.path.join(folderpaths, '*.txt'))
for file in filepaths:
    with open(file) as f:
        words = re.findall(r'\w+', f.read().lower())
        # Add this file's word counts to the running total
        counter.update(words)
print(counter)
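The final print in the answer shows the Counter object as a whole. To get one word-count line per word, most frequent first, as the question asks, a small sketch using Counter.most_common() (format_counts is a hypothetical helper name; the sample numbers are the ones from the question):

```python
from collections import Counter
from typing import List


def format_counts(counter: Counter) -> List[str]:
    # most_common() with no argument returns every (word, count) pair,
    # ordered from most to least frequent
    return [f"{word}-{count}" for word, count in counter.most_common()]


if __name__ == "__main__":
    counter = Counter({"the": 500, "a": 78, "in": 56})
    for line in format_counts(counter):
        print(line)
```

Replacing the sample Counter with the one built from the files prints the requested the-500, a-78, in-56 style output.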

A similar question about counting word frequencies across multiple documents in Python can be found on Stack Overflow: https://stackoverflow.com/questions/17399535/
