
google-bigquery - Splitting resource cost among users, taking concurrent usage into account

Reposted. Author: 行者123. Updated: 2023-12-01 15:36:50

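Question

I have the following scenario: a given resource, available for a fixed period of time, costs a fixed amount for that period. My users can access the resource during that period. I need to split the cost of the resource among the users who access it, given that a user cannot be charged for time in which they are not accessing it. Something like this: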

example 1

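The red bar represents the availability of the resource over a period of time. The blue and green bars represent the times at which the respective users accessed the resource. Note that at time 9 nobody is accessing the resource, so nobody pays for it. Given a resource cost of $100 for the whole period, user 1 will be charged $40 and user 2 will be charged $50. $10 will be lost.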
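The arithmetic for this first example can be sketched in a few lines of Python. This is only an illustration of the proportional split, not part of the original question: the slot ranges (user 1 on slots 0 to 3, user 2 on slots 4 to 8) are assumptions chosen to match the $40 and $50 charges, and the names `usage` and `proportional_cost` are hypothetical.

```python
# Example 1: the resource is available for time slots 0..9 and costs $100 in total.
TOTAL_COST = 100.0
AVAILABLE_SLOTS = 10  # slots 0 through 9, inclusive

# Inclusive (start, end) slots per user. Assumed values, consistent with the
# $40 and $50 charges described in the text.
usage = {"user1": (0, 3), "user2": (4, 8)}


def proportional_cost(start, end):
    """Cost of an inclusive slot range when nobody else uses the resource."""
    used_slots = end - start + 1
    return TOTAL_COST * used_slots / AVAILABLE_SLOTS


costs = {user: proportional_cost(*span) for user, span in usage.items()}
unbilled = TOTAL_COST - sum(costs.values())  # slot 9 is used by nobody

print(costs)     # {'user1': 40.0, 'user2': 50.0}
print(unbilled)  # 10.0
```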

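The idea in this case is simple: take the full cost of the resource and split it in proportion to the time each user used it. The problem arises when the resource is used concurrently: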

example 2

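In this example, at times 4 and 5 both users are using the same resource. In that case I need to divide the cost of the overlapping time by 2 (the number of concurrent users) to get the correct values.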

In other words: the more users are using the resource at the same time, the cheaper it gets for each of them.
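A minimal Python sketch of this rule, walking over the time slots one by one and dividing each slot's cost by the number of users active in it. The figure is not reproduced here, so the ranges below (user 1 on slots 0 to 5, user 2 on slots 4 to 8) are an assumption consistent with the overlap at times 4 and 5. Iterating over every slot is only practical for coarse time units; it would not scale to millisecond timestamps.

```python
from collections import defaultdict

TOTAL_COST = 100.0
SLOTS = range(10)  # the resource is available for slots 0..9
COST_PER_SLOT = TOTAL_COST / len(SLOTS)

# Inclusive (start, end) slots; assumed values for example 2.
usage = [("user1", 0, 5), ("user2", 4, 8)]

costs = defaultdict(float)
for slot in SLOTS:
    # Users active in this slot share its cost equally.
    active = {user for user, start, end in usage if start <= slot <= end}
    for user in active:
        costs[user] += COST_PER_SLOT / len(active)

print(dict(costs))  # {'user1': 50.0, 'user2': 40.0}
```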

Of course, the problem can get more complicated, like this:

example 3

Data

Currently I have a table with the following structure (using example 3):

+------------+--------+------------+----------+-------------------+--------------+
| ResourceId | UserId | UsageStart | UsageEnd | ResourceTotalCost | WeightedCost |
+------------+--------+------------+----------+-------------------+--------------+
| res1       | u1     | time 0     | time 1   | 100               | 20           |
| res1       | u1     | time 4     | time 7   | 100               | 40           |
| res1       | u2     | time 4     | time 8   | 100               | 50           |
| res1       | u3     | time 1     | time 4   | 100               | 40           |
| res1       | u3     | time 8     | time 8   | 100               | 10           |
+------------+--------+------------+----------+-------------------+--------------+

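I have the exact usage periods of each user, plus the total cost of the resource (for the whole analyzed period), and the weighted cost of the resource for the user (this is the column I want to improve).

The UsageStart and UsageEnd columns are timestamps with millisecond precision (meaning that times can be as little as 1 millisecond apart). ResourceId and UserId are strings with no particular pattern (but they are guaranteed to be unique for each resource and user, respectively). ResourceTotalCost and WeightedCost are both floats.

Output

The output I need is the same as what I already have, but with a weighted cost that takes concurrent usage of the resource among users into account. For example 3, this is the expected output: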

+------------+--------+------------+----------+-------------------+--------------+
| ResourceId | UserId | UsageStart | UsageEnd | ResourceTotalCost | WeightedCost |
+------------+--------+------------+----------+-------------------+--------------+
| res1       | u1     | time 0     | time 1   | 100               | 15           |
| res1       | u1     | time 4     | time 7   | 100               | 18.33        |
| res1       | u2     | time 4     | time 8   | 100               | 23.33        |
| res1       | u3     | time 1     | time 4   | 100               | 28.33        |
| res1       | u3     | time 8     | time 8   | 100               | 5            |
+------------+--------+------------+----------+-------------------+--------------+

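So, any ideas on how to solve this? I considered using a UDF, but UDFs currently come with some limitations (such as 6 concurrent UDF queries) that my project cannot afford, so pure BigQuery SQL is preferable.

Thanks.

Best answer

Try the following. It is for BigQuery Standard SQL (see Enabling Standard SQL and Migrating from legacy SQL).

As you can see, I slightly adjusted your data (UsageStart and UsageEnd are integers).
I covered all three of your examples, with ResourceId set to res1, res2 and res3 respectively.
I also added an extra entry for each resource to represent the availability of the resource. These entries have a NULL UserId.

So the query is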

WITH Usage AS (
  -- Sample data for all three examples. UsageEnd is treated as inclusive in the
  -- input, so 1 is added to turn it into an exclusive end.
  -- Rows with a NULL UserId describe the availability of each resource.
  SELECT ResourceId, UserId, UsageStart, UsageEnd + 1 AS UsageEnd, ResourceTotalCost
  FROM (
    SELECT 'res3' AS ResourceId, 'u1' AS UserId, 0 AS UsageStart, 1 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res3' AS ResourceId, 'u1' AS UserId, 4 AS UsageStart, 7 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res3' AS ResourceId, 'u2' AS UserId, 4 AS UsageStart, 8 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res3' AS ResourceId, 'u3' AS UserId, 1 AS UsageStart, 4 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res3' AS ResourceId, 'u3' AS UserId, 8 AS UsageStart, 8 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res3' AS ResourceId, NULL AS UserId, 0 AS UsageStart, 9 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res1' AS ResourceId, 'u1' AS UserId, 0 AS UsageStart, 3 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res1' AS ResourceId, 'u2' AS UserId, 4 AS UsageStart, 8 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res1' AS ResourceId, NULL AS UserId, 0 AS UsageStart, 9 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res2' AS ResourceId, 'u1' AS UserId, 0 AS UsageStart, 5 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res2' AS ResourceId, 'u2' AS UserId, 4 AS UsageStart, 8 AS UsageEnd, 100 AS ResourceTotalCost UNION ALL
    SELECT 'res2' AS ResourceId, NULL AS UserId, 0 AS UsageStart, 9 AS UsageEnd, 100 AS ResourceTotalCost
  )
), iIntervals AS (
  -- Elementary intervals between consecutive distinct boundaries of each resource.
  SELECT ResourceId, iStart,
    LEAD(iStart) OVER(PARTITION BY ResourceId ORDER BY iStart) AS iEnd
  FROM (
    SELECT DISTINCT ResourceId, iStart FROM (
      SELECT ResourceId, UsageStart AS iStart FROM Usage UNION ALL
      SELECT ResourceId, UsageEnd AS iStart FROM Usage
    )
  )
), iWeights AS (
  -- Fraction of the total cost per elementary interval and user:
  -- interval length / number of concurrent users / total width.
  SELECT iStart, iEnd, x.ResourceId, UserId, ResourceTotalCost,
    SUM(iWeight / CASE WHEN Users = 0 THEN 1 ELSE Users END / width) AS iWeight
  FROM (
    SELECT iStart, iEnd, iEnd - iStart AS iWeight, iIntervals.ResourceId, UserId, ResourceTotalCost,
      COUNT(DISTINCT UserId) OVER(PARTITION BY iIntervals.ResourceId, iStart, iEnd) AS Users
    FROM iIntervals JOIN Usage
      ON iIntervals.ResourceId = Usage.ResourceId
      AND iStart >= UsageStart AND iEnd <= UsageEnd
    WHERE iEnd IS NOT NULL
  ) AS x
  JOIN (
    SELECT ResourceId, MAX(iEnd) - MIN(iStart) AS width FROM iIntervals GROUP BY 1
  ) AS y
    ON x.ResourceId = y.ResourceId
  WHERE NOT (UserId IS NULL AND Users > 0)
  GROUP BY 1, 2, 3, 4, 5
)
-- Sum the interval fractions back onto each original usage row.
SELECT Usage.ResourceId, Usage.UserId, Usage.UsageStart, Usage.UsageEnd - 1 AS UsageEnd,
  iWeights.ResourceTotalCost, ROUND(SUM(iWeights.ResourceTotalCost * iWeight), 2) AS WeightedCost
FROM Usage JOIN iWeights
  ON Usage.ResourceId = iWeights.ResourceId AND Usage.UserId = iWeights.UserId
  AND iWeights.iStart BETWEEN Usage.UsageStart AND Usage.UsageEnd
  AND iWeights.iEnd BETWEEN Usage.UsageStart AND Usage.UsageEnd
GROUP BY 1, 2, 3, 4, 5
ORDER BY 1, 2, 3

The output is as follows, which I believe is what was expected:

ResourceId  UserId  UsageStart  UsageEnd  ResourceTotalCost  WeightedCost
res1        u1      0           3         100                40.0
res1        u2      4           8         100                50.0
res2        u1      0           5         100                50.0
res2        u2      4           8         100                40.0
res3        u1      0           1         100                15.0
res3        u1      4           7         100                18.33
res3        u2      4           8         100                23.33
res3        u3      1           4         100                28.33
res3        u3      8           8         100                5.0
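The res3 rows can be cross-checked outside BigQuery with a small Python sketch of the same idea: cut the timeline at every usage boundary, then split the cost of each elementary interval among the distinct users covering it. This is an illustration rather than a port of the query. The function name `split_cost` is hypothetical, ends are exclusive (the inclusive end plus 1, as in the query), and the sketch assumes every usage lies inside the availability window.

```python
def split_cost(usages, avail_start, avail_end, total_cost):
    """Weighted cost per usage row, in input order.

    usages: list of (user, start, end) tuples with an exclusive end.
    """
    # Distinct boundaries define elementary intervals in which the set of
    # active users does not change.
    points = sorted({avail_start, avail_end}
                    | {p for _, s, e in usages for p in (s, e)})
    width = avail_end - avail_start
    result = [0.0] * len(usages)
    for lo, hi in zip(points, points[1:]):
        # Usage rows that fully cover the elementary interval [lo, hi).
        covering = [i for i, (_, s, e) in enumerate(usages) if s <= lo and hi <= e]
        users = {usages[i][0] for i in covering}
        for i in covering:
            result[i] += total_cost * (hi - lo) / width / len(users)
    return result


# Example 3 (res3 in the query), with exclusive ends.
rows = [("u1", 0, 2), ("u1", 4, 8), ("u2", 4, 9), ("u3", 1, 5), ("u3", 8, 9)]
costs = [round(c, 2) for c in split_cost(rows, 0, 10, 100.0)]
print(costs)  # [15.0, 18.33, 23.33, 28.33, 5.0]
```

Because the loop runs over boundaries rather than individual time units, the amount of work depends on the number of usage rows, not on the resolution of the timestamps.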

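Most likely your UsageStart and UsageEnd are not integers, so you will need to adjust the solution above accordingly. My focus was on giving you a working example of the logic to build on.

Hope this helps! It can probably be optimized as well.

Regarding google-bigquery - Splitting resource cost among users, taking concurrent usage into account, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40178351/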
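One possible adjustment, sketched here in Python under stated assumptions: convert each millisecond-precision timestamp to an integer count of milliseconds and add 1 ms to the inclusive end, which mirrors the `UsageEnd + 1` step in the query. The helper names `to_millis` and `half_open` are hypothetical, the sample timestamps are invented, and treating naive timestamps as UTC is an assumption. In BigQuery Standard SQL, `UNIX_MILLIS` could play the role of `to_millis`.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MS = timedelta(milliseconds=1)


def to_millis(ts):
    """Whole milliseconds since the Unix epoch for an ISO 8601 string.

    Timestamps without an offset are treated as UTC (an assumption).
    """
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (dt - EPOCH) // ONE_MS


def half_open(usage_start, usage_end):
    """Inclusive [start, end] timestamps -> half-open [start, end + 1 ms)."""
    return to_millis(usage_start), to_millis(usage_end) + 1


# Invented sample values: one full second of usage, end inclusive.
start, end = half_open("2016-10-21 10:00:00.000", "2016-10-21 10:00:00.999")
print(end - start)  # 1000
```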
