gpt4 book ai didi

SQL 查找具有下一个最佳时间戳匹配的行对

转载 作者:行者123 更新时间:2023-12-03 17:08:16 27 4
gpt4 key购买 nike

我的挑战是找到按时间戳相邻的行对,并仅保留值字段距离最小的那些对(差异的正值)
一张 table measurement从具有时间戳和值的不同传感器收集数据。

id | sensor_id | timestamp | value
---+-----------+-----------+------
1 | 1 | 12:00:00 | 5
2 | 2 | 12:01:00 | 6
3 | 1 | 12:02:00 | 4
4 | 2 | 12:02:00 | 7
5 | 2 | 12:03:00 | 3
6 | 1 | 12:05:00 | 3
7 | 2 | 12:06:00 | 4
8 | 2 | 12:07:00 | 5
9 | 1 | 12:08:00 | 6
传感器的值从其时间戳一直有效,直到其下一条记录的时间戳(相同的 sensor_id)。
图示
enter image description here
下面的绿线显示传感器 1(蓝线)和传感器 2(红线)值随时间的距离。
我的目标是
  • 仅组合与时间戳逻辑匹配的 2 个传感器的记录(以获得绿线)
  • 找到距离局部最小值在
  • 12:01:00(在 12:00:00 没有传感器 2 的记录)
  • 12:05:00
  • 12:08:00


  • 真实表位于 PostgreSQL 数据库中,包含 15 个传感器的约 500 万条记录。
    测试数据
    create table measurement (
    id serial,
    sensor_id integer,
    timestamp timestamp,
    value integer)
    ;

    insert into measurement (sensor_id, timestamp, value)
    values
    (1, '2020-08-16 12:00:00', 5),
    (2, '2020-08-16 12:01:00', 6),
    (1, '2020-08-16 12:02:00', 4),
    (2, '2020-08-16 12:02:00', 7),
    (2, '2020-08-16 12:03:00', 3),
    (1, '2020-08-16 12:05:00', 3),
    (2, '2020-08-16 12:06:00', 4),
    (2, '2020-08-16 12:07:00', 5),
    (1, '2020-08-16 12:08:00', 6)
    ;
    我的方法
    是选择 2 个任意传感器(通过某些传感器 ID),进行自连接并为任何传感器 1 的记录仅保留具有前一个时间戳的传感器 2 的记录(具有传感器 1 时间戳的传感器 2 的最大时间戳 <= 传感器 2 的时间戳) .
    select
    *
    from (
    select
    *,
    row_number() over (partition by m1.timestamp order by m2.timestamp desc) rownum
    from measurement m1
    join measurement m2
    on m1.sensor_id <> m2.sensor_id
    and m1.timestamp >= m2.timestamp
    --arbitrarily sensor_ids 1 and 2
    where m1.sensor_id = 1
    and m2.sensor_id = 2
    ) foo
    where rownum = 1

    union --vice versa

    select
    *
    from (
    select
    *,
    row_number() over (partition by m2.timestamp order by m1.timestamp desc) rownum
    from measurement m1
    join measurement m2
    on m1.sensor_id <> m2.sensor_id
    and m1.timestamp <= m2.timestamp
    --arbitrarily sensor_ids 1 and 2
    where m1.sensor_id = 1
    and m2.sensor_id = 2
    ) foo
    where rownum = 1
    ;
    但这会返回一对 12:00:00传感器 2 没有数据的地方(不是大问题)
    并且在真实表上,语句执行不会在数小时后结束(大问题)。
    我发现了某些类似的问题,但它们与我的问题不符
  • SQL Join on Nearest less than date
  • SQL Join same table based on time stamp and inventory level

  • 提前致谢!

    最佳答案

    第一步是计算每个时间戳的差异。一种方法使用横向连接和条件聚合:

    select t.timestamp,
    max(m.value) filter (where s.sensor_id = 1) as value_1,
    max(m.value) filter (where s.sensor_id = 2) as value_2,
    abs(max(m.value) filter (where s.sensor_id = 2) -
    max(m.value) filter (where s.sensor_id = 1)
    ) as diff
    from (values (1), (2)) s(sensor_id) cross join
    (select distinct timestamp
    from measurement
    where sensor_id in (1, 2)
    ) t left join lateral
    (select m.value
    from measurement m
    where m.sensor_id = s.sensor_id and
    m.timestamp <= t.timestamp
    order by m.timestamp desc
    limit 1
    ) m
    on 1=1
    group by timestamp;
    现在的问题是差异何时进入局部最小值。对于您的样本数据,局部最小值都是一个时间单位。这意味着您可以使用 lag()lead()找到他们:
    with t as (
    select t.timestamp,
    max(m.value) filter (where s.sensor_id = 1) as value_1,
    max(m.value) filter (where s.sensor_id = 2) as value_2,
    abs(max(m.value) filter (where s.sensor_id = 2) -
    max(m.value) filter (where s.sensor_id = 1)
    ) as diff
    from (values (1), (2)) s(sensor_id) cross join
    (select distinct timestamp
    from measurement
    where sensor_id in (1, 2)
    ) t left join lateral
    (select m.value
    from measurement m
    where m.sensor_id = s.sensor_id and
    m.timestamp <= t.timestamp
    order by m.timestamp desc
    limit 1
    ) m
    on 1=1
    group by timestamp
    )
    select *
    from (select t.*,
    lag(diff) over (order by timestamp) as prev_diff,
    lead(diff) over (order by timestamp) as next_diff
    from t
    ) t
    where (diff < prev_diff or prev_diff is null) and
    (diff < next_diff or next_diff is null);
    这可能不是一个合理的假设。因此,在应用此逻辑之前过滤掉相邻的重复值:
    select *
    from (select t.*,
    lag(diff) over (order by timestamp) as prev_diff,
    lead(diff) over (order by timestamp) as next_diff
    from (select t.*, lag(diff) over (order by timestamp) as test_for_dup
    from t
    ) t
    where test_for_dup is distinct from diff
    ) t
    where (diff < prev_diff or prev_diff is null) and
    (diff < next_diff or next_diff is null)
    Here是一个db<> fiddle 。

    关于SQL 查找具有下一个最佳时间戳匹配的行对,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/63438715/

    27 4 0
    Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
    广告合作:1813099741@qq.com 6ren.com