
postgresql, get a list of items for the nearest point where a condition is true

Reposted. Author: 行者123. Updated: 2023-11-29 13:51:36

EDIT: skip to the last edit for the current state.

Hi!

I have a table with weather stations:

stations:

id,
point (geometry(Point,4326)),
ctry (country code)

And a table with the weather data:

noaa:

id | integer | not null default nextval('noaa_id_seq'::regclass)
usaf_wban | text |
station_id | integer |
usaf | integer |
wban | integer |
dt | timestamp without time zone | not null
point | geometry(Point,4326) |
air_temp | double precision |
dew_point | double precision |
relative_humidity | double precision |
sea_level_pressure | double precision |
pressure | double precision |
wind | double precision |
cloudiness | double precision |
ghi | double precision |

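There is also a locations_location table, from which I get the point of the postal code.

I have experimented a lot with indexes; the current indexes on the noaa table are: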

Indexes:
"noaa_pkey" PRIMARY KEY, btree (id)
"noaa_dt_trunc" btree (date_trunc('hour'::text, dt))
"noaa_point" gist (point)
"noaa_station_ids" btree (station_id)

Now, for each parameter (air_temp, wind, ...), I want to select the value from the nearest point where that parameter is not null and not 9999.
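Put precisely, the goal is this: for each hour bucket, and for each parameter on its own, take the value from the closest station whose reading is neither NULL nor 9999. The following minimal Python sketch models only that rule, not how PostgreSQL executes it. It assumes the rows have already been fetched as dicts with `dt`, `station_id` and one key per parameter. It also assumes a hypothetical `distance` mapping from station id to distance from the postal point.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Optional

MISSING = 9999  # sentinel used in the noaa data for "no measurement"


def is_valid(value: Optional[float]) -> bool:
    """A value counts only if it is neither NULL (None) nor the 9999 sentinel."""
    return value is not None and value != MISSING


def nearest_valid_per_hour(
    readings: Iterable[dict],
    distance: Dict[int, float],
    params: List[str],
) -> Dict[datetime, Dict[str, float]]:
    """For every hour and every parameter, keep the value from the closest
    station whose reading for that parameter is valid. Each parameter is
    decided independently, so values may come from different stations."""
    best: Dict[datetime, Dict[str, tuple]] = {}
    for row in readings:
        # Truncate to the hour, like date_trunc('hour', dt)
        hour = row["dt"].replace(minute=0, second=0, microsecond=0)
        d = distance[row["station_id"]]
        slot = best.setdefault(hour, {})
        for p in params:
            v = row.get(p)
            # Keep the valid reading with the smallest distance so far
            if is_valid(v) and (p not in slot or d < slot[p][0]):
                slot[p] = (d, v)
    # Drop the distances and return only the chosen values
    return {h: {p: dv[1] for p, dv in slot.items()} for h, slot in best.items()}


# Hypothetical sample data: station 1 is closest but has no valid air_temp.
rows = [
    {"dt": datetime(2010, 1, 1, 0, 10), "station_id": 1, "air_temp": 9999, "wind": 3.0},
    {"dt": datetime(2010, 1, 1, 0, 20), "station_id": 2, "air_temp": 4.5, "wind": None},
    {"dt": datetime(2010, 1, 1, 0, 30), "station_id": 3, "air_temp": 5.0, "wind": 7.0},
]
dist = {1: 0.1, 2: 0.2, 3: 0.3}
result = nearest_valid_per_hour(rows, dist, ["air_temp", "wind"])
# {datetime(2010, 1, 1, 0, 0): {'wind': 3.0, 'air_temp': 4.5}}
```

Because each parameter is decided separately, air_temp and wind for the same hour can come from different stations. That is one reason the question ends up with one DISTINCT ON query per parameter.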

At the moment I use 5 separate queries that look like this:

with postal_station AS (
    SELECT id as station_id, s.point FROM stations s WHERE s.ctry = 'AU'
    ORDER BY s.point <-> (
        SELECT point FROM locations_location l
        WHERE l.postal_code = '9201' AND l.country_code = 'AT'
        LIMIT 1
    )
    LIMIT 5
)
SELECT
    DISTINCT ON (date_trunc('hour', dt))
    date_trunc('hour', dt) as dt,
    cloudiness
FROM
    noaa n
WHERE
    dt BETWEEN '2010-01-01'::timestamp AND '2015-01-01'::timestamp
    AND NOT cloudiness = 9999
    AND NOT cloudiness is null
    AND n.station_id IN (SELECT station_id FROM postal_station)
ORDER BY dt, point <-> ( SELECT point FROM postal_station LIMIT 1 )

This is very fast (~150 ms), and the only index used is noaa_station_ids.

But as soon as I raise the station limit above 5:

with postal_station AS (
    SELECT id as station_id, s.point FROM stations s WHERE s.ctry = 'AU'
    ORDER BY s.point <-> (
        SELECT point FROM locations_location l
        WHERE l.postal_code = '9201' AND l.country_code = 'AT'
        LIMIT 1
    )
    LIMIT 6
)
SELECT
    DISTINCT ON (date_trunc('hour', dt))
    date_trunc('hour', dt) as dt,
    air_temp
FROM
    noaa n
WHERE
    dt BETWEEN '2010-01-01'::timestamp AND '2015-01-01'::timestamp
    AND NOT air_temp = 9999
    AND NOT air_temp is null
    AND n.station_id IN (SELECT station_id FROM postal_station)
ORDER BY dt, point <-> ( SELECT point FROM postal_station LIMIT 1 )

https://explain.depesz.com/s/9n2M

the index noaa_station_ids is no longer used, and the query takes ~2429 ms.

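So here are my questions:

  • Why is the index noaa_station_ids not used when the "n.station_id IN" clause contains more than 5 values?

  • Is there a way to select all the needed values in a single query in reasonable time?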

Thanks for reading :)

PS: Postgres 9.5 with PostGIS enabled

EDIT: Actually, the CTE should look like this to order the points correctly, but that's a side issue:

with postal_point AS (
    SELECT point FROM locations_location l
    WHERE l.postal_code = '9201' AND l.country_code = 'AT'
    LIMIT 1
),
postal_station AS (
    SELECT id as station_id, s.point FROM stations s WHERE s.ctry = 'AU'
    ORDER BY s.point <-> ( SELECT point FROM postal_point )
    LIMIT 5
)

EDIT: After I joined #postgresql on freenode, RhodiumToad helped me build this query:

with postal_station AS (
    select
        s1.*
    from (
        select point from locations_location l where l.postal_code = '9201' AND l.country_code = 'AT' limit 1
    ) l0,
    lateral (
        select s.id, rank() over (order by s.point <-> l0.point)
        from
            stations s
        where
            s.ctry = 'AU'
        order by s.point <-> l0.point limit 20) s1
)
SELECT
    DISTINCT ON (date_trunc('hour', dt))
    date_trunc('hour', dt) as dt,
    air_temp
FROM
    noaa n
JOIN
    postal_station p
ON
    p.id = n.station_id
WHERE
    dt BETWEEN '2010-01-01'::timestamp AND '2015-01-01'::timestamp
    AND NOT air_temp = 9999
    AND NOT air_temp is null
ORDER BY dt, p.rank

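It is fast (~200 ms) even with more stations => https://explain.depesz.com/s/kA8

I will mark this post as answered in a few days.

Further optimizations are still welcome.

Best answer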

1) Why is the index noaa_station_ids not used if the "n.station_id IN" clause contains more than 5 values?

2) Is there a way to select all needed values in one query in reasonable time ?

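1) After raising cpu_tuple_cost to 0.1, the index is also used for more stations, but the query still gets slower as the number of stations grows.

2) At the moment I use 5 queries and send them all at once to get all the needed data. Together with the query from the last edit, the query time is acceptable.

Query: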

The key is to rank the stations inside the CTE and then join against the CTE. Sorting is much faster this way.
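One plausible reason the ranked version is faster: `rank() over (order by s.point <-> l0.point)` is computed once per station (at most 20 here). The final sort then orders by a small integer instead of evaluating a geometry distance for every noaa row. The minimal Python sketch below mirrors that plan shape on hypothetical in-memory data. `rank()` gives ties the same rank. The join is a dict lookup on station_id. `DISTINCT ON (hour) ... ORDER BY dt, p.rank` becomes "sort by (hour, rank), keep the first row of each hour". It illustrates the logic only and says nothing about which indexes PostgreSQL will choose.

```python
from datetime import datetime
from itertools import groupby
from typing import Dict, List, Tuple

MISSING = 9999  # 9999 means "no measurement", as in the question


def sql_rank(distances: Dict[int, float]) -> Dict[int, int]:
    """Emulate rank() OVER (ORDER BY distance): ties share a rank and the
    next rank skips accordingly (1, 2, 2, 4, ...)."""
    ordered = sorted(distances.items(), key=lambda kv: kv[1])
    ranks: Dict[int, int] = {}
    for position, (station_id, d) in enumerate(ordered, start=1):
        prev_id, prev_d = ordered[position - 2] if position > 1 else (None, None)
        ranks[station_id] = ranks[prev_id] if d == prev_d else position
    return ranks


def nearest_by_rank(
    readings: List[Tuple[datetime, int, float]], ranks: Dict[int, int]
) -> List[Tuple[datetime, float]]:
    """readings are (dt, station_id, air_temp) rows. Join on station_id,
    drop invalid values, then keep the best-ranked row per hour."""
    joined = [
        (dt.replace(minute=0, second=0, microsecond=0), ranks[sid], value)
        for dt, sid, value in readings
        if sid in ranks and value is not None and value != MISSING  # JOIN + WHERE
    ]
    joined.sort(key=lambda r: (r[0], r[1]))  # ORDER BY dt, p.rank
    # DISTINCT ON (hour): take the first row of each hour group after sorting
    return [(hour, next(group)[2]) for hour, group in groupby(joined, key=lambda r: r[0])]


# Hypothetical distances from the postal point to three stations
ranks = sql_rank({7: 0.5, 8: 0.1, 9: 0.5})  # {8: 1, 7: 2, 9: 2}
readings = [
    (datetime(2010, 1, 1, 0, 5), 7, 2.0),
    (datetime(2010, 1, 1, 0, 40), 8, 9999),  # invalid, filtered out
    (datetime(2010, 1, 1, 1, 10), 8, 3.5),
    (datetime(2010, 1, 1, 1, 15), 9, 4.0),
    (datetime(2010, 1, 1, 1, 20), 99, 1.0),  # station not in the CTE, dropped by the join
]
result = nearest_by_rank(readings, ranks)
# [(datetime(2010, 1, 1, 0, 0), 2.0), (datetime(2010, 1, 1, 1, 0), 3.5)]
```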

Regarding "postgresql, get a list of items for the nearest point where a condition is true", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/40074097/
