gpt4 book ai didi

python - 使用 Python 将 2Gb+ SELECT 导出为 CSV(内存不足)

转载 作者:太空宇宙 更新时间:2023-11-04 10:08:04 26 4
gpt4 key购买 nike

我正在尝试从 Netezza 导出一个大文件(使用 Netezza ODBC + pyodbc),此解决方案会引发 memoryError,如果我在没有“列表”的情况下循环,它会非常慢。您是否有什么中间解决方案不会终止我的服务器/python 进程但可以运行得更快?

cursorNZ.execute(sql)
archi = open("c:\test.csv", "w")
lista = list(cursorNZ.fetchall())
for fila in lista:
registro = ''
for campo in fila:
campo = str(campo)
registro = registro+str(campo)+";"
registro = registro[:-1]
registro = registro.replace('None','NULL')
registro = registro.replace("'NULL'","NULL")
archi.write(registro+"\n")

---- 编辑----

谢谢,我正在尝试这个:其中“sql”是查询,cursorNZ 是

connMy = pyodbc.connect(DRIVER=.....)
cursorNZ = connNZ.cursor()

chunk = 10 ** 5 # tweak this
chunks = pandas.read_sql(sql, cursorNZ, chunksize=chunk)
with open('C:/test.csv', 'a') as output:
for n, df in enumerate(chunks):
write_header = n == 0
df.to_csv(output, sep=';', header=write_header, na_rep='NULL')

有这个:AttributeError: 'pyodbc.Cursor' 对象没有属性 'cursor'有什么想法吗?

最佳答案

不要使用 cursorNZ.fetchall()

相反,直接遍历游标:

with open("c:/test.csv", "w") as archi:  # note the fixed '/'
cursorNZ.execute(sql)
for fila in cursorNZ:
registro = ''
for campo in fila:
campo = str(campo)
registro = registro+str(campo)+";"
registro = registro[:-1]
registro = registro.replace('None','NULL')
registro = registro.replace("'NULL'","NULL")
archi.write(registro+"\n")

就我个人而言,我只会使用 pandas:

import pyodbc
import pandas

cnn = pyodbc.connect(DRIVER=.....)
chunksize = 10 ** 5 # tweak this
chunks = pandas.read_sql(sql, cnn, chunksize=chunksize)

with open('C:/test.csv', 'a') as output:
for n, df in enumerate(chunks):
write_header = n == 0
df.to_csv(output, sep=';', header=write_header, na_rep='NULL')

关于python - 使用 Python 将 2Gb+ SELECT 导出为 CSV(内存不足),我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/39878329/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com