gpt4 book ai didi

python - 使用 CSV 文件加速 SQLite3 UPDATE 功能

转载 作者:行者123 更新时间:2023-11-29 13:25:19 25 4
gpt4 key购买 nike

由于 SQLite 3.7.13 更新 JOIN 上的列的限制,我创建了以下 Python 脚本来帮助解决问题。尽管由于我正在处理的数据量很大,但我遇到了系统资源问题,并且更新方法花费的时间太长。

我有一个 SQLite3 表,其中包含具有以下架构的 7,159,587 条记录:

dist_trpwrenc (id integer primary key autoincrement, IP TEXT, VNE_INTERNAL TEXT, VNE_ENTERPRISE TEXT, VNE_EXTERNAL TEXT)

我有一个 CSV 文件,其中包含 9,224,812 条重复记录。以下是 CSV 文件中的数据示例:

"IP","VNE"
"192.168.1.1","internal"
"192.168.1.1","enterprise"
"192.168.1.1","external"
"192.168.2.1","internal"
"192.168.2.1","external"

Python 脚本正在获取 CSV 文件并根据以下示例更新“dist_trpwrenc”表:

--------------------------------------------------------------
| IP | VNE_INTERNAL | VNE_ENTERPRISE | VNE_EXTERNAL |
| 192.168.1.1 | x | x | x |
| 192.168.2.1 | x | | x |
--------------------------------------------------------------

我正在寻找一种更快的方法来处理更新,这可以通过 SQLite3/Python 实现吗?

#!/usr/bin/python

from openpyxl.reader.excel import load_workbook
import sys, csv, sqlite3, logging, time, os, errno

s = time.strftime('%Y%m%d%H%M%S')

# Create exception file from standard output
class Logger(object):
def __init__(self):
self.terminal = sys.stdout
self.log = open((s)+"_log", "a")

def write(self, message):
self.terminal.write(message)
self.log.write(message)

def dist_trpwrenc_update():
sys.stdout = Logger()

con = sqlite3.connect(sys.argv[1]) # input database name (e.g. database.db) and creates in current working directory.
cur = con.cursor()

try:
with open(sys.argv[2], "rb") as f: # input CSV file
reader = csv.reader(f, delimiter=',')
for row in reader:
try:
ipupdate = (row[4],)

if row[3] == 'internal':
cur.execute('UPDATE dist_trpwrenc SET VNE_INTERNAL="x" WHERE IP=?;', ipupdate)
con.commit()
print row[0], row[4], 'updated:', row[3], ' successfully!'
elif row[3] == 'enterprise':
cur.execute('UPDATE dist_trpwrenc SET VNE_ENTERPRISE="x" WHERE IP=?;', ipupdate)
con.commit()
print row[0], row[4], 'updated:', row[3], ' successfully!'
elif row[3] == 'external':
cur.execute('UPDATE dist_trpwrenc SET VNE_EXTERNAL="x" WHERE IP=?;', ipupdate)
con.commit()
print row[0], row[4], 'updated:', row[3], ' successfully!'
else:
print row[0], row[4], 'did not update:', row[3], ' successfully.'
except (KeyboardInterrupt, SystemExit):
raise
except IOError:
raise

# Close SQLite database connection
con.close()

# Stop logging
sys.stdout = sys.__stdout__

def main():
dist_trpwrenc_update()

if __name__=='__main__':
main()

感谢所有提示,我采用了另一种方法,仅使用 SQL CASE 语句:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys, csv, sqlite3, logging, time, os, errno

# Functions
s = time.strftime('%Y%m%d%H%M%S')

# Create file from standard output for database import
class Logger(object):
def __init__(self):
self.terminal = sys.stdout
self.log = open((s) + "_" + sys.argv[1], "a")

def write(self, message):
self.terminal.write(message)
self.log.write(message)

# Function to create CSV from a SQL query.
def sqlExport():
sys.stdout = Logger() # Start screen capture to log file

con = sqlite3.connect(sys.argv[1]) # input database name (e.g. database.db) and creates in current working directory.
cur = con.cursor()

try:
cur.execute('SELECT network, SUM(case when VNE = "V1" then 1 end) as VNECH1, SUM(case when VNE = "V2" then 1 end) as VNECH2, SUM(case when VNE = "V3" then 1 end) as VNECH3 from data_table GROUP by network ORDER BY network;')
data = cur.fetchall()

for row in data:
print '"'+row[0]+'","'+str(row[1])+'","'+str(row[2])+'","'+str(row[3])+'"'

except (KeyboardInterrupt, SystemExit):
raise

con.close()
sys.stdout = sys.__stdout__ # stops capturing data from database export.

# Primary function to execute
def main():
sqlExport()

if __name__=='__main__':
main()

最佳答案

  1. 确保 IP 字段有索引。
  2. 首先将 CSV 中的所有行添加到一个集合中,以消除重复并有望节省数百万次操作。
  3. 删除日志记录功能。您可以假设更新已完成,否则会引发异常,因此日志不会告诉您任何信息。
  4. 尝试降低提交发生的频率 - 尽管在单个事务中执行所有更新可能会遇到问题。

如果这一切还不够,那么如果它适合您的应用程序(例如 CSV 包含 dist_trpwrenc 表中的所有行),则删除现有记录或表并使用 INSERT 查询从 CSV 重新填充它可能会更快比很多更新。

关于python - 使用 CSV 文件加速 SQLite3 UPDATE 功能,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/20222472/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com