
sql - NTILE() in BigQuery for non-uniform buckets

Reposted. Author: 行者123. Updated: 2023-12-01 00:17:35

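I am trying to perform RFM segmentation on the Google Merchandise Store sample dataset in BigQuery. In my SQL query, NTILE(5) divides the rows into 5 buckets based on row order and returns the bucket number assigned to each row. In this case every bucket is the same size. I would like to know how to create buckets of different sizes, for example bucket 1 containing the bottom 10% of records, bucket 2 the next 20%, and so on. Thanks!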

#standardSQL
SELECT
  fullVisitorId,
  NTILE(5) OVER (ORDER BY last_order_date) AS rfm_recency,
  NTILE(5) OVER (ORDER BY count_order) AS rfm_frequency,
  NTILE(5) OVER (ORDER BY avg_amount) AS rfm_monetary
FROM (
  SELECT
    fullVisitorId,
    MAX(date) AS last_order_date,
    COUNT(*) AS count_order,
    AVG(totals.totalTransactionRevenue)/1000000 AS avg_amount
  FROM
    `bigquery-public-data.google_analytics_sample.ga_sessions_20170*`
  WHERE
    _table_suffix BETWEEN "101" AND "801"
    AND totals.totalTransactionRevenue IS NOT NULL
  GROUP BY
    fullVisitorId )
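
To make the equal-sized behaviour described above concrete, here is a minimal sketch on made-up inline data (the ten values from GENERATE_ARRAY are purely illustrative and not part of the original question). With ten rows and NTILE(5), each bucket receives exactly two rows, so the bucket column reads 1, 1, 2, 2, 3, 3, 4, 4, 5, 5.

```sql
-- Illustrative data only: ten rows holding the values 1..10.
SELECT
  x,
  NTILE(5) OVER (ORDER BY x) AS bucket  -- two rows per bucket for ten rows
FROM UNNEST(GENERATE_ARRAY(1, 10)) AS x
ORDER BY x
```

NTILE takes only the number of buckets, so it has no way to express a 10% / 20% / ... split; that is why a different approach is needed.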

Best answer

You can use row_number() and count(*) to define your own buckets:

SELECT fullVisitorId,
       (CASE WHEN seqnum_r <= 0.1 * cnt THEN 1
             WHEN seqnum_r <= 0.3 * cnt THEN 2
             ELSE 3
        END) AS bin_r,
       . . .
FROM (SELECT fullVisitorId,
             MAX(date) AS last_order_date,
             COUNT(*) AS count_order,
             (AVG(totals.totalTransactionRevenue) / 1000000) AS avg_amount,
             COUNT(*) OVER () AS cnt,
             ROW_NUMBER() OVER (ORDER BY MAX(date)) AS seqnum_r,
             ROW_NUMBER() OVER (ORDER BY COUNT(*)) AS seqnum_f,
             ROW_NUMBER() OVER (ORDER BY AVG(totals.totalTransactionRevenue)) AS seqnum_m
      FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170*`
      WHERE _table_suffix BETWEEN "101" AND "801" AND
            totals.totalTransactionRevenue IS NOT NULL
      GROUP BY fullVisitorId
     ) rfm
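
As a sketch of how the same idea might look when written out for all three RFM dimensions, the query below splits the work into two CTEs: one aggregates per visitor, the other adds the row numbers and the total count. The CTE names (per_visitor, ranked), the output column names, and above all the cumulative cutoffs 0.1 / 0.3 / 0.6 / 0.8 (a 10% / 20% / 30% / 20% / 20% split) are assumptions chosen for illustration. Only the first two cutoffs come from the answer above, so substitute whatever split your segmentation calls for.

```sql
#standardSQL
WITH per_visitor AS (
  -- Same aggregation as in the question: one row per visitor.
  SELECT
    fullVisitorId,
    MAX(date) AS last_order_date,
    COUNT(*) AS count_order,
    AVG(totals.totalTransactionRevenue) / 1000000 AS avg_amount
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170*`
  WHERE _table_suffix BETWEEN "101" AND "801"
    AND totals.totalTransactionRevenue IS NOT NULL
  GROUP BY fullVisitorId
),
ranked AS (
  -- Position of each visitor along every dimension, plus the total row count.
  SELECT
    *,
    COUNT(*) OVER () AS cnt,
    ROW_NUMBER() OVER (ORDER BY last_order_date) AS seqnum_r,
    ROW_NUMBER() OVER (ORDER BY count_order) AS seqnum_f,
    ROW_NUMBER() OVER (ORDER BY avg_amount) AS seqnum_m
  FROM per_visitor
)
SELECT
  fullVisitorId,
  -- Assumed cumulative cutoffs: 10%, 30%, 60%, 80%, then the rest.
  CASE WHEN seqnum_r <= 0.1 * cnt THEN 1
       WHEN seqnum_r <= 0.3 * cnt THEN 2
       WHEN seqnum_r <= 0.6 * cnt THEN 3
       WHEN seqnum_r <= 0.8 * cnt THEN 4
       ELSE 5
  END AS rfm_recency,
  CASE WHEN seqnum_f <= 0.1 * cnt THEN 1
       WHEN seqnum_f <= 0.3 * cnt THEN 2
       WHEN seqnum_f <= 0.6 * cnt THEN 3
       WHEN seqnum_f <= 0.8 * cnt THEN 4
       ELSE 5
  END AS rfm_frequency,
  CASE WHEN seqnum_m <= 0.1 * cnt THEN 1
       WHEN seqnum_m <= 0.3 * cnt THEN 2
       WHEN seqnum_m <= 0.6 * cnt THEN 3
       WHEN seqnum_m <= 0.8 * cnt THEN 4
       ELSE 5
  END AS rfm_monetary
FROM ranked
```

One thing to keep in mind: ROW_NUMBER() gives tied rows distinct numbers in an arbitrary order, so two visitors with the same count_order can land in different buckets, and the assignment may vary between runs. Adding a tiebreaker such as fullVisitorId to each ORDER BY should make the result deterministic. If tied visitors ought to share a bucket, comparing CUME_DIST() or PERCENT_RANK() against the cutoffs directly is a possible alternative.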

Regarding sql - NTILE() in BigQuery for non-uniform buckets, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51133871/
