
python - MemoryError using openpyxl with large Excel data

Reposted. Author: 太空狗. Updated: 2023-10-29 21:31:46

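I wrote a script that has to read a large number of Excel files (around 10,000) from a folder. The script loads each Excel file (some of them have more than 2,000 rows) and reads one column to count the rows (to check the contents). If the row count does not equal a given number, it writes a warning to a log.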

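The problem appears once the script has read more than 1,000 Excel files: it then throws a MemoryError, and I don't know where the problem is. Before the loop, the script reads two CSV files of 14,000 rows and stores them in lists. These lists contain the identifiers of the Excel files and their corresponding row counts. If this row count does not equal the row count of the Excel file, a warning is written. Could reading these lists be the problem?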

I'm using openpyxl to load the workbooks. Do I need to close them before opening the next one?

Here is my code:

# -*- coding: utf-8 -*-
# Python 2 script written against the openpyxl 1.x API used in the question
# (get_sheet_by_name, get_highest_row, 0-based column indexes).

import csv
import datetime
import glob
from openpyxl import load_workbook

conditions = 0
flight_error = 0
condition_error = 0
typical_flight_error = 0
SP_error = 0


def write_log(message):
    # Append one timestamped warning line to log.txt
    timestamp = datetime.datetime.today().strftime(" %Y-%m-%d %H.%M.%S")
    with open('log.txt', 'a') as log:
        log.write(timestamp + ': ' + message + '\n')


cond_numbers = []
# Open the CSV file holding the equivalences (flight -> expected number of conditions)
with open('Conditions.csv', 'rb') as csv_name:
    csv_read = csv.reader(csv_name, delimiter='\t')
    for reads in csv_read:
        cond_numbers.append(reads)

flight_TF = []
with open('vuelo-TF.csv', 'rb') as vuelo_TF:
    csv_read = csv.reader(vuelo_TF, delimiter=';')
    for reads in csv_read:
        flight_TF.append(reads)

excel_files = glob.glob('*.xlsx')

for excel in excel_files:
    print "Reading Excel file: " + excel

    wb = load_workbook(excel)
    ws = wb.get_sheet_by_name('Control System')
    flight = ws.cell('A7').value
    typical_flight = ws.cell('B7').value

    for row in range(6, ws.get_highest_row()):
        raw_flight = ws.cell(row=row, column=0).value
        if raw_flight is None or raw_flight == '':
            break  # Stop at the first empty row (checked before calling int())
        conditions = conditions + 1

        value_flight = int(raw_flight)
        value_TF = ws.cell(row=row, column=1).value
        value_SP = int(ws.cell(row=row, column=4).value)

        if value_flight != flight:
            flight_error = 1  # Not all flight numbers within the flight are equal

        if value_TF != typical_flight:
            typical_flight_error = 2  # Not all typical flights within the flight are equal

        if value_SP != 100:
            SP_error = 1

    for cond in cond_numbers:
        if int(flight) == int(cond[0]):
            conds = int(cond[1])
            if conds != int(conditions):
                condition_error = 1  # The number of conditions differs from the expected one

    for vuelo_TF in flight_TF:
        if int(vuelo_TF[0]) == int(flight):
            TF = vuelo_TF[1]
            if typical_flight != TF:
                typical_flight_error = 1  # The flight does not match its typical flight

    if flight_error == 1:
        write_log('The flight numbers of flight %s do not match.' % flight)
        flight_error = 0

    if condition_error == 1:
        write_log('The number of conditions of flight %s does not match. '
                  'Expected conditions: %d. Obtained conditions: %d.'
                  % (flight, conds, conditions))
        condition_error = 0

    if typical_flight_error == 1:
        write_log('Flight %s does not match its typical flight. '
                  'Expected typical flight: %s. Obtained typical flight: %s.'
                  % (flight, TF, typical_flight))
        typical_flight_error = 0

    if typical_flight_error == 2:
        write_log('The typical flights of flight %s are not all the same.' % flight)
        typical_flight_error = 0

    if SP_error == 1:
        write_log('Some Step Percentage of flight %s is below 100.' % flight)
        SP_error = 0

    conditions = 0

The if statements at the end check for problems and write the warnings to the log.
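As written, each warning reopens log.txt to write a single line. The sketch below is an alternative, not the asker's code. It uses the standard-library `logging` module, where a `FileHandler` opens the file once and the formatter adds the timestamp in the same `%Y-%m-%d %H.%M.%S` format. The helper `make_warning_logger`, the logger name `excel_checks` and the temporary log path are hypothetical.

```python
import logging
import os
import tempfile


def make_warning_logger(path, name="excel_checks"):
    """Return a logger that appends timestamped warnings to `path`.

    The FileHandler opens the file once and keeps it open instead of
    reopening the log for every message. Call this once per run: each
    call adds another handler to the same named logger.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.WARNING)
    logger.propagate = False  # don't also send records to the root logger
    handler = logging.FileHandler(path, mode="a", encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s: %(message)s", datefmt="%Y-%m-%d %H.%M.%S")
    )
    logger.addHandler(handler)
    return logger


# Demo with a hypothetical temporary log file
log_path = os.path.join(tempfile.mkdtemp(), "log.txt")
log = make_warning_logger(log_path)
log.warning("The flight numbers of flight %s do not match.", 1234)
logging.shutdown()  # flush and close all handlers

with open(log_path, encoding="utf-8") as f:
    contents = f.read()
print(contents, end="")
```

Passing the arguments separately (`"... %s ...", 1234`) lets `logging` do the formatting only when a record is actually emitted.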

I'm using Windows XP with 8 GB of RAM and an Intel Xeon W3505 (dual core, 2.53 GHz).
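On whether the two CSV lists are to blame: two 14,000-row lists of short strings probably take only a few megabytes, so they are unlikely to cause the MemoryError on their own. The script does scan both lists in full for every Excel file, though. Here is a minimal sketch that loads each table once into a dict keyed by flight number. It assumes the layout the code implies: the flight identifier in column 0 and the expected value in column 1. The helper `load_lookup` and the sample data are hypothetical.

```python
import csv
import io


def load_lookup(lines, delimiter):
    """Map the first column (as int) to the second column of each row.

    Assumes the layout implied by the question's code: flight identifier
    in column 0, expected value in column 1. Short or blank rows are skipped.
    """
    lookup = {}
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) >= 2:
            lookup[int(row[0])] = row[1]
    return lookup


# Hypothetical stand-in for Conditions.csv (tab-separated). With a real file
# in Python 3: with open('Conditions.csv', newline='') as f: load_lookup(f, '\t')
conditions_csv = io.StringIO("101\t2000\n102\t1500\n")
cond_numbers = load_lookup(conditions_csv, "\t")

# One dict lookup per Excel file instead of a loop over every CSV row
expected = cond_numbers.get(101)
print(expected)
```

`dict.get` returns `None` for a flight that is missing from the CSV, so that case can be logged explicitly instead of being silently skipped.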

Best answer

By default, openpyxl keeps every cell it has accessed in memory. I suggest using the optimized reader instead (link: https://openpyxl.readthedocs.org/en/latest/optimized.html).
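For comparison, here is a minimal sketch of the same idea against a recent openpyxl (3.x is assumed), using `load_workbook(..., read_only=True)`, `iter_rows(values_only=True)` and `Workbook.close()`. It counts the consecutive non-empty cells in column A starting at row 7, as in the question. It also closes each workbook explicitly, because read-only mode keeps the file open until `close()` is called. The function `count_rows` and the demo workbook are hypothetical.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def count_rows(path, sheet_name="Control System", first_row=7):
    """Count consecutive non-empty cells in column A from first_row down.

    read_only=True streams rows from the file instead of building every
    cell in memory; close() releases the file handle that read-only mode
    keeps open, so call it before loading the next workbook.
    """
    wb = load_workbook(path, read_only=True)
    try:
        ws = wb[sheet_name]
        count = 0
        for row in ws.iter_rows(min_row=first_row, max_col=1, values_only=True):
            value = row[0] if row else None
            if value is None or value == "":
                break  # stop at the first empty cell, like the question's loop
            count += 1
        return count
    finally:
        wb.close()


# Demo with a small, hypothetical workbook shaped like the question's files
demo = Workbook()
sheet = demo.active
sheet.title = "Control System"
for r in range(7, 10):  # three data rows: A7, A8, A9
    sheet.cell(row=r, column=1, value=1234)
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
demo.save(path)

n_rows = count_rows(path)
print(n_rows)
```

In the question's loop, `count_rows(excel)` would replace the whole per-row block that only counts rows. The other checks can read their columns in the same `iter_rows` pass by raising `max_col`.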

In code:

wb = load_workbook(file_path, use_iterators=True)

Pass use_iterators=True when loading the workbook, then access the worksheet and its cells like this:

for row in sheet.iter_rows():
    for cell in row:
        cell_text = cell.value

This reduces the memory footprint to roughly 5-10% of what the default mode uses.
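To check a figure like this on your own files, here is a rough sketch using the standard-library `tracemalloc` module. It records the peak Python-level allocation while loading a workbook and touching every cell, once in the default mode and once with `read_only=True`. `tracemalloc` only sees allocations made through Python's allocator, and the results depend on the data and the openpyxl version. The function `peak_bytes_while_reading` and the 2,000 × 5 demo file are hypothetical, and no particular ratio is claimed.

```python
import os
import tempfile
import tracemalloc

from openpyxl import Workbook, load_workbook


def peak_bytes_while_reading(path, read_only):
    """Return the peak traced allocation while loading and reading every cell."""
    tracemalloc.start()
    try:
        wb = load_workbook(path, read_only=read_only)
        ws = wb.active
        for row in ws.iter_rows(values_only=True):
            for value in row:
                pass  # just touch each value
        wb.close()
        _, peak = tracemalloc.get_traced_memory()
        return peak
    finally:
        tracemalloc.stop()


# Hypothetical demo file: 2,000 rows x 5 columns
demo = Workbook()
demo_ws = demo.active
for r in range(2000):
    demo_ws.append([r, "TF-%d" % (r % 7), r * 2, 0, 100])
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
demo.save(path)
del demo, demo_ws

full_peak = peak_bytes_while_reading(path, read_only=False)
stream_peak = peak_bytes_while_reading(path, read_only=True)
print("default mode peak:   %d bytes" % full_peak)
print("read-only mode peak: %d bytes" % stream_peak)
```

Running it on one of the real 2,000-row files gives a more meaningful comparison than the synthetic demo.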

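Update: the use_iterators=True option was removed in version 2.4.0. For reading, its replacement is load_workbook(file_path, read_only=True), which still streams rows through iter_rows(). For dumping large amounts of data, newer versions also introduced openpyxl.writer.write_only.WriteOnlyWorksheet, which you get by creating the workbook with write_only=True: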

from openpyxl import Workbook
wb = Workbook(write_only=True)
ws = wb.create_sheet()

# now we'll fill it with 100 rows x 200 columns
for irow in range(100):
    ws.append(['%d' % i for i in range(200)])

# save the file
wb.save('new_big_file.xlsx')

I haven't tested the code above; I only copied it from the link above.

Thanks to @SdaliM for the information.

Regarding "python - MemoryError using openpyxl with large Excel data", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/21875249/

Copyright 2021 - 2024 cfsdn All Rights Reserved. ICP registration: 蜀ICP备2022000587号