gpt4 book ai didi

python - 优化python中的mysql查询

转载 作者:行者123 更新时间:2023-11-29 16:08:22 26 4
gpt4 key购买 nike

我正在使用 python 中的 mysql-connector 连接到 mysql 数据库。在正常的事件运行中,我不需要处理超过 1000 行,但有时我可能需要处理 20k 多行数据,这就是我的查询变得非常慢并超时的地方。我尝试过在 mysql-connector 中使用迭代器来限制在给定时间处理的数量,但似乎这只是为 py 中的 MySQLdb 实现的。

SELECT
ROUND(311.30004 - (20.110938 * temp.value)
- (2.012626 * hum.value) - (2.006346 * airspeed.value)
+ (0.4059 * POWER(temp.value, 2) + (0.006604 * POWER(hum.value, 2))
+ (3.151145 * POWER(airspeed.value, 2) + (0.05555 * temp.value * hum.value)
- (0.37037 * temp.value * airspeed.value) + (0.03968 * hum.value * airspeed.value))), 2)
AS value, temp.time, temp.x, temp.y, temp.z, temp.round_id, 'predicted_frequency', temp.round_number, temp.day_of_production
FROM round_data_temperature AS temp
INNER JOIN round_data_humidity AS hum
ON temp.round_number = hum.round_number
AND temp.x = hum.x
AND temp.y = hum.y AND DATE_FORMAT(temp.time, '%Y-%m-%d %H:%i:00') = DATE_FORMAT(hum.time, '%Y-%m-%d %H:%i:00')
INNER JOIN round_data_airspeed AS airspeed
ON temp.round_number = airspeed.round_number
AND temp.x = airspeed.x AND temp.y = airspeed.y
AND DATE_FORMAT(temp.time, '%Y-%m-%d %H:%i:00') = DATE_FORMAT(airspeed.time, '%Y-%m-%d %H:%i:00')
WHERE temp.round_id ='xxxxx'

这段代码仅在 mysql 上运行得非常快,但在 python 中的 mysql-connector 中运行速度非常慢并且超时,有 20k 多行,所以我决定立即插入以避免在 python 中选择非常大的数据包。我做了以下操作:

INSERT INTO round_data_pf (value, time, x, y, z, round_id, observable_name, round_number, day_of_production)
SELECT
ROUND(311.30004 - (20.110938 * temp.value)
- (2.012626 * hum.value) - (2.006346 * airspeed.value)
+ (0.4059 * POWER(temp.value, 2) + (0.006604 * POWER(hum.value, 2))
+ (3.151145 * POWER(airspeed.value, 2) + (0.05555 * temp.value * hum.value)
- (0.37037 * temp.value * airspeed.value) + (0.03968 * hum.value * airspeed.value))), 2)
AS value, temp.time, temp.x, temp.y, temp.z, temp.round_id, 'pf', temp.round_number, temp.day_of_production
FROM round_data_temperature AS temp
INNER JOIN round_data_humidity AS hum
ON temp.round_number = hum.round_number
AND temp.x = hum.x
AND temp.y = hum.y
AND DATE_FORMAT(temp.time, '%Y-%m-%d %H:%i:00') = DATE_FORMAT(hum.time, '%Y-%m-%d %H:%i:00')
INNER JOIN round_data_airspeed AS airspeed
ON temp.round_number = airspeed.round_number
AND temp.x = airspeed.x
AND temp.y = airspeed.y
AND DATE_FORMAT(temp.time, '%Y-%m-%d %H:%i:00') = DATE_FORMAT(airspeed.time, '%Y-%m-%d %H:%i:00')
WHERE temp.round_id ='xxxxx'

当我有大约 20k 行数据时,这段代码在 python 中的 mysql 和 mysql-connector 中都会超时。

我正在寻求优化代码,以便当我有超过 20k 行时,它可以更快地选择和插入。

最佳答案

确保您有正确的索引

table round_data_temperature composite index  on columns (round_id, round_number, x, y,  time)

table round_data_humidity composite index on columns (round_number, x, y time)

table round_data_airspeed comoosite index on clumns (round_number, x, y,time )

您必须考虑这样一个事实:关系中涉及的每个表只能使用一个索引。如果您没有在表上建立索引,则表将被完全扫描。

单列索引和复合索引之间的区别在于,索引中存在的所有列(从左到右)都用于过滤查询,避免访问表数据,因此在某些情况下使用复合索引(正确定义)可以显着改进查询..

关于python - 优化python中的mysql查询,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/55516134/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com