gpt4 book ai didi

sql - 查询(或算法?)以查找从两个不同位置到一个共享位置的最便宜的重叠航类

转载 作者:行者123 更新时间:2023-12-04 08:56:56 24 4
gpt4 key购买 nike

假设我们有一个数据集,其中包含从洛杉矶到欧洲 80 个城市并返回以及从圣彼得堡到欧洲 80 个城市并返回的 500 000 个航类。我们想找到这样的 4 个航类:

  • 从 LA 到 X 市,从 X 市回到 LA,从 St P 到 X 市,从 X 市回到 St P
  • 所有 4 个航类都必须在 4 天的时间窗口内
  • 我们正在寻找最便宜的 4 个航类的组合价格
  • 城市X可以是80个城市中的任何一个,我们想为所有城市找到这样最便宜的组合并得到这80个组合的列表

  • 数据存储在 BigQuery 中,我已经创建了一个 SQL 查询,但它有 3 个连接,我假设在幕后它可能具有 O(n^4) 的复杂性。 ,因为查询没有在 30 分钟内完成,我不得不中止它。
    这是表的架构:
    enter image description here
    请参阅下面的查询:
    select * from (
    select in_led.`from` as city,
    in_led.price + out_led.price + in_lax.price + out_lax.price as total_price,
    out_led.carrier as out_led_carrier,
    out_led.departure as out_led_departure,
    in_led.departure as in_led_date,
    in_led.carrier as in_led_carrier,
    out_lax.carrier as out_lax_carrier,
    out_lax.departure as out_lax_departure,
    in_lax.departure as in_lax_date,
    in_lax.carrier as in_lax_carrier,
    row_number() over(partition by in_led.`from` order by in_led.price + out_led.price + in_lax.price + out_lax.price) as rn
    from skyscanner.quotes as in_led
    join skyscanner.quotes as out_led on out_led.`to` = in_led.`from`
    join skyscanner.quotes as out_lax on out_lax.`to` = in_led.`from`
    join skyscanner.quotes as in_lax on in_lax.`from` = in_led.`from`
    where in_led.`to` = "LED"
    and out_led.`from` = "LED"
    and in_lax.`to` in ("LAX", "LAXA")
    and out_lax.`from` in ("LAX", "LAXA")
    and DATE_DIFF(DATE(in_led.departure), DATE(out_led.departure), DAY) < 4
    and DATE_DIFF(DATE(in_led.departure), DATE(out_led.departure), DAY) > 0
    and DATE_DIFF(DATE(in_lax.departure), DATE(out_lax.departure), DAY) < 4
    and DATE_DIFF(DATE(in_lax.departure), DATE(out_lax.departure), DAY) > 0
    order by total_price
    )
    where rn=1
    更多详情:
  • 所有航类的出发日期都在 120 天的窗口内

  • 问题:
  • 有没有办法优化这个查询以获得更好的性能?
  • 如何正确分类这个问题?蛮力解决方案太慢了,但我看不出这是什么类型的问题。当然看起来不像图表,有点像使用稳定排序按不同字段对表格进行几次排序可能会有所帮助,但似乎仍然不是最佳选择。
  • 最佳答案

    下面是 BigQuery 标准 SQL

    The brute force solution is way too slow, but I'm failing to see what type of problem this is.so I would like to see solutions other than brute force if anyone here has ideas

    #standardSQL
    WITH temp AS (
    SELECT DISTINCT *, UNIX_DATE(DATE(departure)) AS dep FROM `skyscanner.quotes`
    ), round_trips AS (
    SELECT t1.from, t1.to, t2.to AS back, t1.price, t1.departure, t1.dep first_day, t1.carrier, t2.departure AS departure2, t2.dep AS last_day, t2.price AS price2, t2.carrier AS carrier2,
    FROM temp t1
    JOIN temp t2
    ON t1.to = t2.from
    AND t1.from = t2.to
    AND t2.dep BETWEEN t1.dep + 1 AND t1.dep + 3
    WHERE t1.from IN ('LAX', 'LED')
    )
    SELECT cityX, total_price,
    ( SELECT COUNT(1)
    FROM UNNEST(GENERATE_ARRAY(t1.first_day, t1.last_day)) day
    JOIN UNNEST(GENERATE_ARRAY(t2.first_day, t2.last_day)) day
    USING(day)
    ) overlap_days_in_cityX,
    (SELECT AS STRUCT departure, price, carrier, departure2, price2, carrier2
    FROM UNNEST([t1])) AS LAX_CityX_LAX,
    (SELECT AS STRUCT departure, price, carrier, departure2, price2, carrier2
    FROM UNNEST([t2])) AS LED_CityX_LED
    FROM (
    SELECT AS VALUE ARRAY_AGG(t ORDER BY total_price LIMIT 1)[OFFSET(0)]
    FROM (
    SELECT t1.to cityX, t1.price + t1.price2 + t2.price + t2.price2 AS total_price, t1, t2
    FROM round_trips t1
    JOIN round_trips t2
    ON t1.to = t2.to
    AND t1.from < t2.from
    AND t1.departure2 > t2.departure
    AND t1.departure < t2.departure2
    ) t
    GROUP BY cityX
    )
    ORDER BY overlap_days_in_cityX DESC, total_price
    带输出(仅 60 行中的前 10 行)
    enter image description here
    简要说明:
  • temp CTE:重复数据删除并引入 dep字段 - 自纪元以来消除昂贵的 TIMESTAMP 函数的天数
  • round_trips CTE:识别最多相隔 4 天的所有往返候选者
  • 识别那些有重叠的 LAX 和 LED 往返行程
  • 每个城市X取最便宜的组合
  • 最终输出对 cityX 中的重叠天数进行额外计算,并略微输出以获取所有涉及航类的信息

  • 注意:在您的数据中 - 持续时间字段全为零 - 所以不涉及 - 但如果你有它 - 很容易将其添加到逻辑中

    关于sql - 查询(或算法?)以查找从两个不同位置到一个共享位置的最便宜的重叠航类,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/63771076/

    24 4 0
    Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
    广告合作:1813099741@qq.com 6ren.com