
python - How to speed up code that contains a SQL query?

Reposted. Author: 行者123. Updated: 2023-12-04 09:46:09

My dataframe contains roughly 650,000 unique rows. For each row I need to fetch a value from the database. I used a for loop, but the execution time is disastrous, about 25 hours. How can I speed up the code? I assume I need joblib or numba to parallelise the execution, but the SQL query inside the loop body is what confuses me.

import pandas as pd
from tqdm.notebook import tqdm_notebook

# One round trip to the database per row of `table`: with ~650k rows this is the bottleneck.
for x in tqdm_notebook(range(len(table))):
    good = table.iloc[x, 0]
    store = table.iloc[x, 1]
    start = table.iloc[x, 6]

    query = f"""
        SELECT
            good_id,
            store_id,
            AVG(sale) AS avg_sale,
            SUM(sale) AS sum_sale,
            MAX(sale) AS max_sale,
            MIN(sale) AS min_sale
        FROM my_table
        WHERE good_id = {good}
          AND store_id = {store}
          AND date_id BETWEEN DATEADD(MONTH, -2, '{start}') AND DATEADD(MONTH, -1, '{start}')
        GROUP BY good_id, store_id
    """
    temp = pd.read_sql(query, connection)
    if not temp.empty:
        table.iloc[x, 13] = temp['avg_sale'].values
        table.iloc[x, 14] = temp['sum_sale'].values
        table.iloc[x, 15] = temp['max_sale'].values
        table.iloc[x, 16] = temp['min_sale'].values

Best answer

For this you can simply widen the query: fetch all (store, good, day) tuples at once, get partial aggregates for them, and do the final filtering and aggregation in Pandas. Note that you change AVG to COUNT(*) and compute the AVG in the final aggregation.
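As a rough sketch of that final step (the dataframe name daily and a Timestamp-typed date_id column are assumptions for illustration, not part of the original answer): once the widened query's per-day partial aggregates are loaded into daily, the window filter and roll-up for one start value could look like this:

import pandas as pd

# daily: per-(good_id, store_id, date_id) partial aggregates from the widened query
lo = start - pd.DateOffset(months=2)
hi = start - pd.DateOffset(months=1)
win = daily[(daily['date_id'] >= lo) & (daily['date_id'] <= hi)]

agg = (win.groupby(['good_id', 'store_id'])
          .agg(sum_sale=('sum_sale', 'sum'),
               cnt=('count_sale', 'sum'),
               max_sale=('max_sale', 'max'),
               min_sale=('min_sale', 'min'))
          .reset_index())

# rebuild AVG from the partial SUM and COUNT, as the answer suggests
agg['avg_sale'] = agg['sum_sale'] / agg['cnt']

All rows of the original table that share the same start can then be filled in with a single merge on (good_id, store_id) instead of one query each.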

An easy way to pass a list of parameters to SQL Server is to use OPENJSON. Simply send a single string parameter containing a JSON array of scalars, e.g.

'[123,324,445,23,1322]'

or
'["abd","def","d"]'

So something like:
SELECT
    good_id,
    store_id,
    date_id,
    COUNT(*) AS count_sale,
    SUM(sale) AS sum_sale,
    MAX(sale) AS max_sale,
    MIN(sale) AS min_sale
FROM my_table
WHERE good_id IN (SELECT CAST(value AS int) FROM OPENJSON(?))
  AND store_id IN (SELECT CAST(value AS int) FROM OPENJSON(?))
  AND date_id BETWEEN ? AND ?
GROUP BY good_id, store_id, date_id
ORDER BY good_id, store_id, date_id
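On the Python side, the id lists can be serialised with json.dumps and bound as ordinary query parameters. A minimal sketch, assuming the SQL above is stored in a string big_query, the same connection object as in the question, and a driver with '?' placeholders such as pyodbc (the variable names here are illustrative):

import json
import pandas as pd

goods = table.iloc[:, 0].unique().tolist()    # every good_id we need
stores = table.iloc[:, 1].unique().tolist()   # every store_id we need

# widest date window that covers all of the per-row windows
starts = pd.to_datetime(table.iloc[:, 6])
date_lo = (starts.min() - pd.DateOffset(months=2)).date()
date_hi = (starts.max() - pd.DateOffset(months=1)).date()

daily = pd.read_sql(
    big_query,
    connection,
    params=[json.dumps(goods), json.dumps(stores), date_lo, date_hi],
)

The resulting daily frame is then filtered and aggregated per row's two-month window in Pandas, as sketched above, instead of issuing ~650,000 separate queries.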

Regarding python - How to speed up code that contains a SQL query?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/62102584/
