
python - How can I multithread SQL queries in Python so that I get the results of all the queries

Reposted. Author: 太空宇宙. Updated: 2023-11-04 10:09:11

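Is there a way to use threads to execute the SQL queries simultaneously, so that I can cut down the processing time of the code below? Is there a better way to get the same result without using the pandas module? Given the size of the data sets I am working with, I cannot hold the whole data set in memory, and I have found that looping over the rows of a SELECT * FROM statement and comparing them against the list I am querying with adds to the processing time.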

# DATABASE layout
# _______________________________________________________________
# | id          | name               | description              |
# |_____________|____________________|__________________________|
# | 1           | John               | Credit Analyst           |
# | 2           | Jane               | Doctor                   |
# | ...         | ...                | ...                      |
# | 5000000     | Mohammed           | Dentist                  |
# |_____________|____________________|__________________________|

import sqlite3


SEARCH_IDS = [x for x in range(15000)]
DATABASE_NAME = 'db.db'

def chunks(wholeList, chunkSize=999):
    """Yield successive chunkSize-sized chunks from wholeList."""
    for i in range(0, len(wholeList), chunkSize):
        yield wholeList[i:i + chunkSize]

def search_database_for_matches(listOfIdsToMatch):
    '''Takes a list of ids and returns the rows'''
    conn = sqlite3.connect(DATABASE_NAME)
    try:
        cursor = conn.cursor()
        # One "?" placeholder per id, so the ids are passed as bound parameters.
        placeholders = ', '.join('?' for _ in listOfIdsToMatch)
        sql = "SELECT id, name, description FROM datatable WHERE id IN ({})".format(placeholders)
        cursor.execute(sql, tuple(listOfIdsToMatch))
        return cursor.fetchall()
    finally:
        conn.close()  # release the connection even if the query fails

def arrange(orderOnList, listToBeOrdered, defaultReturnValue='N/A'):
    '''Takes a list of ids in the desired order and a list of tuples which have ids as the first items.
    The list of tuples is arranged into a new list corresponding to the order of the source list'''
    resultList = [defaultReturnValue for _ in orderOnList]
    # Map each id to its position in the desired order.
    indexLookUp = {value: index for index, value in enumerate(orderOnList)}
    for item in listToBeOrdered:
        resultList[indexLookUp[item[0]]] = item
    return resultList


def main():
    results = []
    for chunk in chunks(SEARCH_IDS, 999):
        results += search_database_for_matches(chunk)
    results = arrange(SEARCH_IDS, results)
    print(results)


if __name__ == '__main__':
    main()

Best answer

Some suggestions:

Instead of using an iterator to read the records in chunks, use pagination.

See this question:
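One way to read the pagination suggestion is keyset pagination: fetch a fixed-size page ordered by `id`, remember the last `id` seen, and ask for the next page after it, so that only one page is in memory at a time. The following is a minimal sketch under the assumption that the table matches the `datatable` layout from the question and that ids are non-negative integers; `iter_pages` and `page_size` are hypothetical names, not part of the original answer.

```python
import sqlite3


def iter_pages(conn, page_size=1000):
    """Yield lists of rows, one page at a time, ordered by id."""
    last_id = -1  # assumption: ids are non-negative integers
    while True:
        rows = conn.execute(
            "SELECT id, name, description FROM datatable "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not rows:
            break
        yield rows
        last_id = rows[-1][0]  # resume after the last id on this page


if __name__ == "__main__":
    # Small in-memory table standing in for the real database.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE datatable (id INTEGER PRIMARY KEY, name TEXT, description TEXT)"
    )
    conn.executemany(
        "INSERT INTO datatable VALUES (?, ?, ?)",
        [(i, "name{}".format(i), "n/a") for i in range(1, 26)],
    )
    wanted = {3, 12, 25}  # a set gives constant-time membership checks
    matches = [row for page in iter_pages(conn, page_size=10)
               for row in page if row[0] in wanted]
    print(matches)
    conn.close()
```

Filtering against a set rather than a list keeps the per-row comparison cheap, which addresses the slowdown described in the question when scanning rows sequentially.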

If you use multithreading or multiprocessing, make sure your database can support it. See: SQLite And Multiple Threads
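With Python's `sqlite3` module, a connection is by default usable only from the thread that created it (`check_same_thread=True`), so a common approach is to give each task its own connection. The sketch below assumes the `datatable` schema from the question; `fetch_chunk`, `threaded_search`, `db_path` and `workers` are hypothetical names introduced for illustration. Whether threads actually shorten the run time depends on the workload and on how SQLite was built, so treat this as something to measure rather than a guaranteed speed-up.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def fetch_chunk(db_path, ids):
    """Open a private connection in the calling thread and fetch matching rows."""
    conn = sqlite3.connect(db_path)  # one connection per call, never shared
    try:
        placeholders = ", ".join("?" * len(ids))
        sql = ("SELECT id, name, description FROM datatable "
               "WHERE id IN ({})".format(placeholders))
        return conn.execute(sql, ids).fetchall()
    finally:
        conn.close()


def threaded_search(db_path, ids, chunk_size=999, workers=4):
    """Query the chunks concurrently; results are collected in chunk order."""
    id_chunks = [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as executor:
        per_chunk = executor.map(lambda c: fetch_chunk(db_path, c), id_chunks)
        return [row for rows in per_chunk for row in rows]


if __name__ == "__main__":
    # Throwaway database file standing in for db.db.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        setup = sqlite3.connect(path)
        setup.execute(
            "CREATE TABLE datatable (id INTEGER PRIMARY KEY, name TEXT, description TEXT)"
        )
        setup.executemany(
            "INSERT INTO datatable VALUES (?, ?, ?)",
            [(i, "name{}".format(i), "n/a") for i in range(1, 101)],
        )
        setup.commit()
        setup.close()
        rows = threaded_search(path, list(range(1, 51)), chunk_size=7, workers=3)
        print(len(rows))
```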

To achieve what you want, you can use a pool of workers to process each chunk. See "Using a pool of workers" in the Python documentation.

Example:

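import multiprocessing

# Run this under the `if __name__ == '__main__':` guard so worker processes can import the module safely.
with multiprocessing.Pool(processes=4) as pool:
    # pool.map returns one list of rows per chunk, in the same order as the chunks.
    result = pool.map(search_database_for_matches, list(chunks(SEARCH_IDS, 999)))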
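Because `pool.map` yields one list of rows per chunk, the combined result is a list of lists and still has to be flattened and put back into the order of the searched ids. The helper below is a small sketch of that step, doing a job similar to `arrange` in the question; `flatten_and_order` is a hypothetical name, and the sample rows are made up for illustration.

```python
from itertools import chain


def flatten_and_order(search_ids, per_chunk_rows, default="N/A"):
    """Flatten per-chunk row lists and return one entry per searched id, in order."""
    # Index every returned row by its id (first column).
    by_id = {row[0]: row for row in chain.from_iterable(per_chunk_rows)}
    # Ids with no matching row get the default placeholder.
    return [by_id.get(search_id, default) for search_id in search_ids]


if __name__ == "__main__":
    per_chunk = [
        [(2, "Jane", "Doctor"), (1, "John", "Credit Analyst")],
        [(5, "Mohammed", "Dentist")],
    ]
    print(flatten_and_order([1, 2, 3, 5], per_chunk))
```

Building the lookup once keeps the reordering linear in the number of rows, instead of searching the result list for every id.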

Regarding "python - How can I multithread SQL queries in Python so that I get the results of all the queries", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/39309714/
