
sql - How to write an SQL query that counts rows where a distinct ID occurs again 7 days after that ID's first occurrence?

Reposted. Author: 行者123. Updated: 2023-11-29 13:42:43

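I want to return a date, the count of unique IDs (distinct_id) that had their first occurrence on that date, the number of those IDs that occurred again 7 days after their first occurrence, and the percentage: occurrences 7 days after / number of first occurrences.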

Sample data_import table:

+---------------------+------------------+
| time | distinct_id |
+---------------------+------------------+
| 2018/10/01 | 1 | first instance of `1`
+---------------------+------------------+
| 2018/10/01 | 2 | also first instance, but does not occur 7 days later
+---------------------+------------------+
| 2018/10/02 | 1 | should be disregarded (not first instance of 1)
+---------------------+------------------+
| 2018/10/02 | 3 | first instance of `3`
+---------------------+------------------+
| 2018/10/08 | 1 | First instance 7 days after first instance of `1`
+---------------------+------------------+
| 2018/10/08 | 1 | Don't count as this is the 2nd instance of `1` on this day
+---------------------+------------------+
| 2018/10/09 | 3 | 7 days after first instance of `3`
+---------------------+------------------+
| 2018/10/09 | 1 | 7 days after non-first instance of `1`
+---------------------+------------------+

And the expected result:

+---------------------+----------------------+------------------------+---------------------------+
| time | num_of_1st_instance | num_occur_7_days_after | percent_used_7_days_after |
+---------------------+----------------------+------------------------+---------------------------+
| 2018/10/01 | 2 | 1 | .50 |
+---------------------+----------------------+------------------------+---------------------------+
| 2018/10/02 | 1 | 1 | 1.0 |
+---------------------+----------------------+------------------------+---------------------------+
| 2018/10/03 | 0 | 0 | 0 |
+---------------------+----------------------+------------------------+---------------------------+
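The rules implied by the two tables can be cross-checked outside SQL. The following is a small Python sketch, not part of the original question. The function name retention_by_first_day and the in-memory rows list are made up for illustration, and "7 days after" is assumed to mean exactly 7 calendar days after the first date.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample rows copied from the data_import table: (time, distinct_id).
rows = [
    (date(2018, 10, 1), 1), (date(2018, 10, 1), 2),
    (date(2018, 10, 2), 1), (date(2018, 10, 2), 3),
    (date(2018, 10, 8), 1), (date(2018, 10, 8), 1),
    (date(2018, 10, 9), 3), (date(2018, 10, 9), 1),
]


def retention_by_first_day(rows, days=7):
    """Map each first-seen date to (num_first, num_back, ratio)."""
    # Earliest date per id: rows are visited in date order, so the
    # first date stored for an id is never overwritten.
    first_seen = {}
    for day, uid in sorted(rows):
        first_seen.setdefault(uid, day)

    # A set collapses repeated (date, id) pairs on the same day.
    seen = set(rows)

    # Group ids by the date on which they first appeared.
    cohorts = defaultdict(list)
    for uid, day in first_seen.items():
        cohorts[day].append(uid)

    result = {}
    for day, uids in sorted(cohorts.items()):
        target = day + timedelta(days=days)
        back = sum((target, uid) in seen for uid in uids)
        result[day] = (len(uids), back, back / len(uids))
    return result


result = retention_by_first_day(rows)
for day, stats in result.items():
    print(day, *stats)
```

Dates on which no ID appears for the first time, such as 2018/10/03, are simply absent from the result here; producing the zero row requires a separate list of dates to report on.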

The query I wrote is close, but it also counts occurrences that are not the first one for a distinct_id.

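In my example, this query would include the occurrence of distinct_id 1 on 2018/10/02, because that ID occurs again 7 days after 2018/10/02, on 2018/10/09. That is not wanted, because the 2018/10/02 occurrence of distinct_id 1 is not its first.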

SELECT
    data_import.time AS date,
    count(distinct data_import.distinct_id) AS num_installs_on_install_date,
    count(distinct future_activity.distinct_id) AS num_occur_7_days_after,
    count(distinct future_activity.distinct_id) / count(distinct data_import.distinct_id)::float AS percent_used_7_days_after
FROM data_import
LEFT JOIN data_import AS future_activity ON
    data_import.distinct_id = future_activity.distinct_id
    AND DATE(data_import.time) = DATE(future_activity.time) - INTERVAL '7 days'
    AND data_import.time = (
        SELECT time
        FROM data_import
        WHERE distinct_id = future_activity.distinct_id
        ORDER BY time
        LIMIT 1
    )
GROUP BY DATE(data_import.time)

I hope I explained that clearly. Please let me know how to change my current query, or suggest a different approach to the solution.

Best answer

Hmmm. Does this do what you want?

-- Keep only each distinct_id's first row, then flag whether the same id
-- also has a row exactly 7 days after that first date.
select di.time::date as time,
       count(*) as first_instance,
       sum(flag_7day) as num_after_7_day,
       sum(flag_7day) * 1.0 / count(*) as ratio
from (select di.*,
             row_number() over (partition by distinct_id order by time) as seqnum,
             (case when exists (select 1
                                from data_import di2
                                where di2.distinct_id = di.distinct_id
                                  and di2.time::date = di.time::date + 7)
                   then 1 else 0
              end) as flag_7day
      from data_import di
     ) di
where seqnum = 1
group by di.time::date;

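This does not return days that have no first instances. Those days seem a bit awkward with respect to the ratio, so I'm not 100% sure you really need them. If you do, it is easy to include a generate_series() to generate all the dates in the range you want.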
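Filling in the missing dates could look roughly like the sketch below. It is not the answer's query. To keep it runnable without a PostgreSQL server it uses Python's sqlite3 module, so a recursive CTE (days) stands in for generate_series(), and SQLite's date(..., '+7 days') stands in for interval arithmetic. The dates are stored as ISO text, the report range 2018-10-01 to 2018-10-03 is hard-coded, and the CTE names days, firsts, and flagged are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_import (time TEXT, distinct_id INTEGER)")
conn.executemany(
    "INSERT INTO data_import VALUES (?, ?)",
    [("2018-10-01", 1), ("2018-10-01", 2), ("2018-10-02", 1), ("2018-10-02", 3),
     ("2018-10-08", 1), ("2018-10-08", 1), ("2018-10-09", 3), ("2018-10-09", 1)],
)

QUERY = """
WITH RECURSIVE days(day) AS (      -- calendar of dates to report on
    SELECT '2018-10-01'
    UNION ALL
    SELECT date(day, '+1 day') FROM days WHERE day < '2018-10-03'
),
firsts AS (                        -- first date each distinct_id appears
    SELECT distinct_id, MIN(time) AS first_day
    FROM data_import
    GROUP BY distinct_id
),
flagged AS (                       -- 1 if the id has a row exactly 7 days later
    SELECT f.distinct_id,
           f.first_day,
           EXISTS (SELECT 1
                   FROM data_import x
                   WHERE x.distinct_id = f.distinct_id
                     AND x.time = date(f.first_day, '+7 days')) AS came_back
    FROM firsts f
)
SELECT d.day,
       COUNT(f.distinct_id)          AS num_of_1st_instance,
       COALESCE(SUM(f.came_back), 0) AS num_occur_7_days_after
FROM days d
LEFT JOIN flagged f ON f.first_day = d.day   -- keeps days with no first instance
GROUP BY d.day
ORDER BY d.day
"""

report = []
for day, n_first, n_back in conn.execute(QUERY):
    ratio = n_back / n_first if n_first else 0.0   # avoid dividing by zero
    report.append((day, n_first, n_back, ratio))
    print(day, n_first, n_back, ratio)
```

The LEFT JOIN from the calendar to the per-ID first dates is what produces a zero row for 2018-10-03. Treating the ratio as 0 on such days is a choice made here to match the expected table; returning NULL would be an equally defensible convention.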

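Regarding "sql - How to write an SQL query that counts rows where a distinct ID occurs again 7 days after that ID's first occurrence?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52920965/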
