
python - Parallelizing pandas pyodbc SQL database calls

Reposted. Author: 太空狗. Updated: 2023-10-29 18:09:23

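I am currently querying data into a DataFrame with the pandas.io.sql.read_sql() command. I would like to parallelize the calls, similar to what is advocated in this talk: (Embarrassingly parallel database calls with Python (PyData Paris 2015))

Something like this (very roughly):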

pools = [ThreadedConnectionPool(1,20,dsn=d) for d in dsns]
connections = [pool.getconn() for pool in pools]
parallel_connection = ParallelConnection(connections)
pandas_cursor = parallel_connection.cursor()
pandas_cursor.execute(my_query)

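Is something like this possible?

Best answer

Yes, this should work, with the caveat that you will need to change parallel_connection.py from the talk you cited. That code has a fetchall function that executes each cursor in parallel and then combines the results. This is the core of what you need to change:

Old code: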

def fetchall(self):
    results = [None] * len(self.cursors)
    def do_work(index, cursor):
        results[index] = cursor.fetchall()
    self._do_parallel(do_work)
    return list(chain(*[rs for rs in results]))

New code:

def fetchall(self):
    # Requires `import pandas as pd` at module level.
    results = [None] * len(self.sql_connections)
    def do_work(index, sql_connection):
        sql, conn = sql_connection  # Store tuple of sql/conn instead of cursor
        results[index] = pd.read_sql(sql, conn)
    self._do_parallel(do_work)
    # DataFrame.append was removed in pandas 2.0; pd.concat combines the frames instead.
    return pd.concat(results)
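
For readers who would rather not patch parallel_connection.py, the same idea (run pd.read_sql once per connection in parallel, then combine the frames) can be sketched with the standard library's thread pool. This is only an illustration under stated assumptions, not the ParallelConnection implementation: the names read_sql_parallel, connect_fns, and demo are hypothetical, and sqlite3 stands in for pyodbc so the example runs without a database server. With pyodbc, each entry in connect_fns would presumably be something like functools.partial(pyodbc.connect, dsn).

```python
import functools
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def read_sql_parallel(sql, connect_fns, max_workers=None):
    """Run one query against several databases in parallel and combine the rows.

    connect_fns is a list of zero-argument callables (a hypothetical
    convention for this sketch), each returning a new DB-API connection.
    """
    if not connect_fns:
        raise ValueError("at least one connection factory is required")

    def do_work(connect):
        # Open the connection inside the worker thread; sqlite3 connections,
        # for example, refuse to be used from a thread that did not create them.
        conn = connect()
        try:
            return pd.read_sql(sql, conn)
        finally:
            conn.close()

    with ThreadPoolExecutor(max_workers=max_workers or len(connect_fns)) as pool:
        # pool.map returns results in the same order as connect_fns.
        frames = list(pool.map(do_work, connect_fns))
    return pd.concat(frames, ignore_index=True)


def demo():
    """Build two throwaway SQLite files and query both in parallel."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i, rows in enumerate([[(1, "a"), (2, "b")], [(3, "c")]]):
            path = os.path.join(tmp, f"shard{i}.db")
            conn = sqlite3.connect(path)
            conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
            conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
            conn.commit()
            conn.close()
            paths.append(path)
        connect_fns = [functools.partial(sqlite3.connect, p) for p in paths]
        return read_sql_parallel("SELECT id, name FROM t ORDER BY id", connect_fns)


if __name__ == "__main__":
    print(demo())
```

Threads are usually sufficient here because most of the time is spent waiting on the database rather than running Python code. Passing ignore_index=True gives the combined frame a fresh 0..n-1 index; drop it if the per-database indices should be preserved.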

Repo: https://github.com/godatadriven/ParallelConnection

Regarding python - Parallelizing pandas pyodbc SQL database calls, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32136276/
