gpt4 book ai didi

python - 如何针对 300 万条记录优化 python postgresql 查询

转载 作者:行者123 更新时间:2023-11-29 12:08:54 25 4
gpt4 key购买 nike

我正在使用 python 脚本在 postgresql 中插入或更新大约 3 到 4 百万个数据。请看下面的代码。要求是如果其新键插入,或者如果键已经存在则用新值更新键。但是下面的代码与数据库的往返连接过多,并且需要大约 35-45 分钟才能将 300 万条记录插入数据库,这非常慢。如何避免往返连接并以更快的方式插入或更新?

任何帮助将不胜感激。

提前感谢您的帮助。

InputFile.txt - 这个文件有大约 3 到 4 百万行项目

productKey1 printer1,printerModel1,printerPrice1,printerDesc1|
productKey2 sacnner2,scannerModel2,scannerPrice2,scannerDesc2|
productKey3 mobile3,mobileModel3,mobilePrice3,mobileDesc3|
productKey4 tv4,tvModel4,tvPrice4,tvDescription4|
productKey2 sacnner22,scannerModel22,scannerPrice22,scannerDesc22|

插入.py

def insertProduct(filename, conn):
seen = set()
cursor = conn.cursor()
qi = "INSERT INTO productTable (key, value) VALUES (%s, %s);"
qu = "UPDATE productTable SET value = CONCAT(value, %s) WHERE key = %s;"

with open(filename) as f:
for line in f:
if line.strip():
key, value = line.split(' ', 1)
if key not in seen:
seen.add(key)
cursor.execute(qi, (key, value))
else:
cursor.execute(qu, (value, key))

conn.commit()

conn = psycopg2.connect("dbname='productDB' user='myuser' host='localhost'")
insertProduct('InputFile.txt', conn)

最佳答案

执行成批的准备好的语句。 http://initd.org/psycopg/docs/extras.html#fast-execution-helpers

import psycopg2, psycopg2.extras
def insertProduct(filename, conn):

data = []
with open(filename) as f:
for line in f:
line = line.strip()
if line:
key, value = line.split(' ', 1)
data.append((key, value))

cursor = conn.cursor()
cursor.execute("""
prepare upsert (text, text) as
with i as (
insert into productTable (key, value)
select $1, $2
where not exists (select 1 from productTable where key = $1)
returning *
)
update productTable p
set value = concat (p.value, $2)
where p.key = $1 and not exists (select 1 from i)
""")
psycopg2.extras.execute_batch(cursor, "execute upsert (%s, %s)", data, page_size=500)
cursor.execute("deallocate upsert")
conn.commit()

conn = psycopg2.connect(database='cpn')
insertProduct('InputFile.txt', conn)

关于python - 如何针对 300 万条记录优化 python postgresql 查询,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/47165342/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com