
python - Is there a reason this runs slowly because of the way it is coded? Python

Reposted. Author: 太空宇宙. Updated: 2023-11-04 02:04:38

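I have some code that loops over the 10 files in a directory. Each file may have thousands of lines. The code then filters certain words out of those files, line by line. I know this can take a while, but can my code be improved in some way to speed the process up? Have I made a coding mistake somewhere that creates a bottleneck? Any help or suggestions would be much appreciated! Here is my code: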

import os

def remove_stop_words(string, stopwords_list):
    string_to_list = string.split()
    x = (' '.join(i for i in string_to_list if i.lower() not in (x.lower() for x in stopwords_list)))
    x = x+'\n'
    return x

def get_stop_words_list(stopwords_path):
    with open(stopwords_path, 'r') as f:
        stopwords = f.read().split()
    return stopwords

def main():
    input_location = 'C:/Users/User/Desktop/mini_mouse'
    output_location = 'C:/Users/User/Desktop/test/'
    stop_words_path = 'C:/Users/User/Desktop/NLTK-stop-word-list.txt'
    stopwords = get_stop_words_list(stop_words_path)
    #print(stopwords)

    for root, dirs, files in os.walk(input_location):
        for name in files:
            file_path = os.path.join(root, name) # joins the new path of the file to the current file in order to access the file
            with open(file_path, 'r') as f: # open the file
                for line in f: # read file line by line
                    x = remove_stop_words(line,stopwords)
                    new_file_path = os.path.join(output_location, name) + '_filtered' # creates a new file of the file that is currently being filtered of stopwords
                    with open(new_file_path, 'a') as output_file: # opens output file
                        output_file.write(x) # writes the newly filtered text to the new output file


if __name__ == "__main__":
    main()

Best answer

Here is a solution that writes the output once per file instead of once per line:

# Replaces the os.walk loop inside main(); input_location, output_location and stopwords are defined there.
for root, dirs, files in os.walk(input_location):
    for name in files:
        file_path = os.path.join(root, name) # joins the new path of the file to the current file in order to access the file

        filestring = ''
        with open(file_path, 'r') as f: # open the file
            for line in f: # read file line by line
                x = remove_stop_words(line,stopwords)
                filestring+=x # x already ends with '\n', so no extra newline is added here

        new_file_path = os.path.join(output_location, name) + '_filtered' # creates a new file of the file that is currently being filtered of stopwords
        with open(new_file_path, 'a') as output_file: # opens output file once per input file
            output_file.write(filestring) # writes the whole filtered text to the new output file
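
The question's remove_stop_words may be a second bottleneck. For every word it rebuilds the lowercased stop word sequence and scans it linearly. The sketch below is not part of the original answer. It is one possible variant, assuming the stop word file is whitespace-separated as in the question. The names load_stop_words and filter_directory are hypothetical helpers introduced here for illustration. It lowercases the stop words once into a set and opens each output file a single time.

```python
import os


def load_stop_words(stopwords_path):
    """Read whitespace-separated stop words into a lowercase set (built once)."""
    with open(stopwords_path, 'r') as f:
        return {word.lower() for word in f.read().split()}


def remove_stop_words(line, stopwords_set):
    """Drop every word whose lowercase form is in stopwords_set."""
    # Set membership is a hash lookup, not a scan over all stop words.
    kept = (word for word in line.split() if word.lower() not in stopwords_set)
    return ' '.join(kept) + '\n'


def filter_directory(input_location, output_location, stopwords_set):
    """Write a '<name>_filtered' copy of every file under input_location."""
    for root, dirs, files in os.walk(input_location):
        for name in files:
            file_path = os.path.join(root, name)
            new_file_path = os.path.join(output_location, name) + '_filtered'
            # Assumption: 'w' (overwrite) is used here, whereas the original
            # code uses 'a' (append). Both files are opened once per input file.
            with open(file_path, 'r') as src, open(new_file_path, 'w') as dst:
                for line in src:
                    dst.write(remove_stop_words(line, stopwords_set))
```

With the paths from the question, main() would call load_stop_words(stop_words_path) once and pass the result to filter_directory(input_location, output_location, ...). Writing line by line through an already open file keeps memory use flat for large files, because the file object buffers the writes itself.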

For more on "python - Is there a reason this runs slowly because of the way it is coded? Python", see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/54991163/
