gpt4 book ai didi

sql - UBER CRM 案例研究 SQL 中的客户保留指标

转载 作者:行者123 更新时间:2023-12-04 20:24:53 24 4
gpt4 key购买 nike

背景:我想在每周滚动,以便他们可以在骑手 28 天未乘车时采取必要的干预措施。

链接:Problem in detail

下面是我试图在单个查询中实现的指标列表

输出查询中列的定义(单个查询):

  1. 日期:将计算以下指标的日期。

  2. city_id:城市id

  3. dau:在该日期至少完成一次行程的不同骑手的数量。

  4. wau:相对于“日期”列中的日期,在过去 7 天内完成至少一次行程的不同乘客数

  5. new_rider:相对于“日期”列中的日期,在过去 7 天内进行过第一次旅行的不同骑手的数量

  6. previous_mau:相对于“日期”列中的日期,在过去 56 到过去 29 天内至少完成一次行程的不同骑手数量

  7. mau_28:根据“日期”列中的日期,在过去 28 天内完成至少一次行程的不同骑手数量

  8. 保留:previous_mau 和 mau_28 阶段不同车手的交集

  9. 复活:在 previous_mau 阶段不活跃但在 mau_28 阶段活跃的不同骑手的数量。

  10. 流失:在 previous_mau 阶段活跃但在mau_28 阶段。

活跃:如果骑手在相应时间段内至少完成了一次行程Inactive :如果骑手在相应时间段内没有进行过单次旅行

下面是我尝试过的:

create table Tripdata
(
[date] date,
rider_id int,
trip_id int,
city_id int,
status varchar(100)
)
go

插入值的查询

INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 348, 1, 8, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1729, 2, 5, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5265, 3, 4, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2098, 4, 4, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4942, 5, 8, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5424, 6, 11, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4269, 7, 7, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5649, 8, 1, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2385, 9, 6, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5161, 10, 8, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 571, 11, 8, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5072, 12, 9, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1233, 13, 5, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2490, 14, 5, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5665, 15, 9, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1400, 16, 2, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 3324, 17, 4, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2533, 18, 13, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5314, 19, 11, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4773, 20, 12, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5544, 21, 2, N'completed')
GO
INSERT [dbo].[Tripdata] ([datee], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1232, 22, 5, N'completed')
GO

以下是我到目前为止所获得/取得的成果:(致谢LukStorms)

   SELECT *
FROM
(
SELECT [datee], city_id,
COUNT(DISTINCT rider_id) AS [dau]
FROM [dbo].[Tripdata]
GROUP BY [datee], city_id
) t
OUTER APPLY
(
SELECT
COUNT(rider_id) AS [wau],
COUNT(CASE WHEN [rides]=1 THEN rider_id END) AS [new_rider]
FROM
(
SELECT t2.city_id, t2.rider_id,
COUNT(*) AS [rides]
FROM [dbo].[Tripdata] t2
WHERE t2.city_id = t.city_id
AND t2.[datee] <= t.[datee]
AND t2.[datee]>=dateadd(day,-7,t.[datee])
GROUP BY t2.city_id, t2.rider_id
) q
GROUP BY city_id
) last7
OUTER APPLY
(
SELECT
COUNT(DISTINCT t2.rider_id) AS [previous_mau]
FROM [dbo].[Tripdata] t2
WHERE t2.city_id = t.city_id
AND t2.[datee] <= dateadd(day,-29,t.[datee])
AND t2.[datee] >= dateadd(day,-56,t.[datee])
) prev29

ORDER BY t.[datee], t.city_id;

如何在一次查询中实现上述查询的所有结果?以及如何编写查询以在单个查询输出中回答上述 7、8、9、10 个问题?

此外,它们很少是将骑手映射到特定城市的特殊考虑因素

一个骑手可能会从多个城市出发,这可能会导致计算活跃的骑手或在多个城市不活跃。因此,为了解决这个问题,需要将骑手映射到只有一个城市。应将骑手映射到他们乘坐的城市仅考虑他们最近的 20 次旅行的最大旅行次数。2. 对于与城市相关的所有计算,重要的是要考虑映射到骑手的城市而不是旅行发生的城市。3.我们的数据库系统没有标准模式功能,所以骑手城市映射需要推导。

最佳答案

下面是我的做法->

SQL Fiddle

MS SQL Server 2017 架构设置:

create table TripData
(
[date] date,
rider_id int,
trip_id int,
city_id int,
status varchar(100)
)
go
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 348, 1, 8, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1729, 2, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5265, 3, 4, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2098, 4, 4, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4942, 5, 8, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5424, 6, 11, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4269, 7, 7, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5649, 8, 1, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2385, 9, 6, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5161, 10, 8, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 571, 11, 8, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5072, 12, 9, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1233, 13, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2490, 14, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5665, 15, 9, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1400, 16, 2, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 3324, 17, 4, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 2533, 18, 13, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5314, 19, 11, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 4773, 20, 12, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 5544, 21, 2, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1232, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 111, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-01T00:00:00.000' AS DateTime), 111, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-28T00:00:00.000' AS DateTime), 111, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-03-28T00:00:00.000' AS DateTime), 111, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-01-28T00:00:00.000' AS DateTime), 111, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-15T00:00:00.000' AS DateTime), 222, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 222, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 333, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-01T00:00:00.000' AS DateTime), 333, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-15T00:00:00.000' AS DateTime), 222, 22, 10, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 222, 22, 10, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 333, 22, 10, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-01T00:00:00.000' AS DateTime), 333, 22, 10, N'completed')
GO

INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 3333, 22, 10, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-01T00:00:00.000' AS DateTime), 3333, 22, 10, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-06-01T00:00:00.000' AS DateTime), 1112, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-01T00:00:00.000' AS DateTime), 1112, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-05-28T00:00:00.000' AS DateTime), 1112, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-03-28T00:00:00.000' AS DateTime), 1112, 22, 5, N'completed')
GO
INSERT [dbo].[TripData] ([date], [rider_id], [trip_id], [city_id], [status]) VALUES (CAST(N'2019-01-28T00:00:00.000' AS DateTime), 1112, 22, 5, N'completed')

查询 1:

;WITH AdddedIndicators AS
(
/*For every record, calculate the key metrics needed to aggerate up.
days_back_last_ride can make use of LAG() by rider and city ordered by date or null if no value,
ISNULL the result to bring it back to 0 meaning no days back(first ride).
Days_back_last_ride=0 could be used to determine first ride date, however, that would ot fit the between 1..7 rule
so we need a first_ride_date. Again, using a window function by rider and city, grab the min date*/
SELECT
td.date, rider_id, city_id,
days_back_last_ride = ISNULL(DATEDIFF(DAY,LAG(date) OVER(PARTITION BY rider_id,city_id ORDER BY date),td.date),0),
first_ride_date = MIN(date) OVER (PARTITION BY rider_id, city_id)
FROM
TripData td
)
,Normalized AS
(
/*The need metrics have been calculated above for the bulk of your calcs with the data, query it and
build up, aggregates up, flags for each rider/city/date so we can ultimatley formulate this for each rider/city
since this is by rider, city and date the user will allocate points to every city
vistited in a given day*/
SELECT
date, city_id, rider_id,
dau= COUNT(DISTINCT rider_id),
wau_flag = SUM(CASE WHEN days_back_last_ride BETWEEN 1 AND 7 THEN 1 ELSE 0 END),
new_rider_flag = SUM(CASE WHEN DATEDIFF(DAY,first_ride_date,date) <= 7 THEN 1 ELSE 0 END),
previous_mau_flag = SUM(CASE WHEN days_back_last_ride BETWEEN 29 AND 56 THEN 1 ELSE 0 END),
mau_28_flag = SUM(CASE WHEN days_back_last_ride BETWEEN 1 AND 28 THEN 1 ELSE 0 END),
retained = CASE WHEN SUM(CASE WHEN days_back_last_ride BETWEEN 1 AND 28 THEN 1 ELSE 0 END) > 1
AND
SUM(CASE WHEN days_back_last_ride BETWEEN 29 AND 56 THEN 1 ELSE 0 END) > 1 THEN 1 ELSE 0 END
FROM
AdddedIndicators
GROUP BY
city_id, date, rider_id
)
SELECT
/* Finalize the results by date and city
The flags have been made by user, city and date above.
So gather each data piont and sum them up based on the rule set */
date, city_id,
dau = SUM(dau),
wau_flag = SUM(CASE WHEN wau_flag >= 1 THEN 1 ELSE 0 END),
retained = SUM(CASE WHEN previous_mau_flag >= 1 AND mau_28_flag >= 1 THEN 1 ELSE 0 END),
resurrect = SUM(CASE WHEN previous_mau_flag = 0 AND mau_28_flag >= 1 THEN 1 ELSE 0 END),
churn = SUM(CASE WHEN previous_mau_flag >= 1 AND mau_28_flag = 0 THEN 1 ELSE 0 END)
FROM
Normalized
GROUP BY
date, city_id

Results :

|       date | city_id | dau | wau_flag | retained | resurrect | churn |
|------------|---------|-----|----------|----------|-----------|-------|
| 2019-06-01 | 1 | 1 | 0 | 0 | 0 | 0 |
| 2019-06-01 | 2 | 2 | 0 | 0 | 0 | 0 |
| 2019-06-01 | 4 | 3 | 0 | 0 | 0 | 0 |
| 2019-01-28 | 5 | 2 | 0 | 0 | 0 | 0 |
| 2019-03-28 | 5 | 2 | 0 | 0 | 0 | 0 |
| 2019-05-01 | 5 | 3 | 0 | 0 | 0 | 2 |
| 2019-05-15 | 5 | 1 | 0 | 0 | 0 | 0 |
| 2019-05-28 | 5 | 2 | 0 | 0 | 2 | 0 |
| 2019-06-01 | 5 | 8 | 2 | 0 | 3 | 1 |
| 2019-06-01 | 6 | 1 | 0 | 0 | 0 | 0 |
| 2019-06-01 | 7 | 1 | 0 | 0 | 0 | 0 |
| 2019-06-01 | 8 | 4 | 0 | 0 | 0 | 0 |
| 2019-06-01 | 9 | 2 | 0 | 0 | 0 | 0 |
| 2019-05-01 | 10 | 2 | 0 | 0 | 0 | 0 |
| 2019-05-15 | 10 | 1 | 0 | 0 | 0 | 0 |
| 2019-06-01 | 10 | 3 | 0 | 0 | 1 | 2 |
| 2019-06-01 | 11 | 2 | 0 | 0 | 0 | 0 |
| 2019-06-01 | 12 | 1 | 0 | 0 | 0 | 0 |
| 2019-06-01 | 13 | 1 | 0 | 0 | 0 | 0 |

关于sql - UBER CRM 案例研究 SQL 中的客户保留指标,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/59094628/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com