gpt4 book ai didi

Firebase 导出到 BigQuery : retention cohorts query

转载 作者:行者123 更新时间:2023-12-04 03:31:46 31 4
gpt4 key购买 nike

Firebase 通过 Firebase 远程配置提供拆分测试功能,但无法过滤具有用户属性(实际上具有任何属性)的同类群组部分中的保留。

为了寻求这个问题的解决方案,我正在寻找 BigQuery,因为 Firebase Analytics 提供了将数据导出到该服务的可用方法。

但是我遇到了很多问题,谷歌没有答案或例子可以指出我正确的方向。

一般问题:

作为第一步,我需要聚合代表相同数据 firebase 队列的数据,所以我可以确定我的计算是正确的:

firebase cohorts

下一步应该只是对查询应用约束,以便它们匹配自定义用户属性。

到目前为止,我得到了什么:

enter image description here

主要问题 - 用户计算的巨大差异。有时大约有 100 个用户,但有时接近 1000 个。

这是我使用的方法:

# 1

# Count users with `user_dim.first_open_timestamp_micros`
# in specified period (w0 – week 1)
# this is the way firebase group users to cohorts
# (who started app on the same day or during the same week)
# https://support.google.com/firebase/answer/6317510

SELECT
COUNT(DISTINCT user_dim.app_info.app_instance_id) as count
FROM
(
TABLE_DATE_RANGE
(
[admob-app-id-xx:xx_IOS.app_events_],
TIMESTAMP('2016-11-20'),
TIMESTAMP('2016-11-26')
)
)
WHERE
STRFTIME_UTC_USEC(user_dim.first_open_timestamp_micros, '%Y-%m-%d')
BETWEEN '2016-11-20' AND '2016-11-26'

# 2

# For each next period count events with
# same first_open_timestamp
# Here is example for one of the weeks.
# week 0 is Nov20-Nov26, week 1 is Nov27-Dec03

SELECT
COUNT(DISTINCT user_dim.app_info.app_instance_id) as count
FROM
(
TABLE_DATE_RANGE
(
[admob-app-id-xx:xx_IOS.app_events_],
TIMESTAMP('2016-11-27'),
TIMESTAMP('2016-12-03')
)
)
WHERE
STRFTIME_UTC_USEC(user_dim.first_open_timestamp_micros, '%Y-%m-%d')
BETWEEN '2016-11-20' AND '2016-11-26'

# 3

# Now we have users for each week w1, w2, ... w5
# Calculate retention for each of them
# retention week 1 = w1 / w0 * 100 = 25.72181359
# rw2 = w2 / w1 * 100
# ...
# rw5 = w5 / w1 * 100

# 4

# Shift week 0 by one and repeat from step 1

BigQuery 查询提示请求

非常感谢有关构建复杂查询的任何提示和方向,这些查询可以一步聚合和计算此任务所需的所有数据。

Here is BigQuery Export schema if needed

附带问题:
  • 为什么所有的user_dim.device_info.device_iduser_dim.device_info.resettable_device_idnull ?
  • user_dim.app_info.app_id文档中缺少(如果 Firebase 支持队友会阅读此问题)
  • 怎么样event_dim.timestamp_microsevent_dim.previous_timestamp_micros应该使用,我无法达到他们的目的。

  • PS

    来自 Firebase 队友的人会很好地回答这个问题。 Five month ago there are was one mention关于通过过滤扩展群组功能或显示 bigqueries 示例,但事情并没有发生变化。他们说 Firebase Analytics 是可行的,他们说 Google Analytics 已被弃用。
    现在我花了第二天的时间来学习 bigquery,并在现有的分析工具上构建我自己的解决方案。我不,堆栈溢出不是这个评论的地方,但是你们在想吗?拆分测试可能会在语法上影响我的应用程序的保留。我的应用程序不卖任何东西,在很多情况下,漏斗和事件都不是有值(value)的指标。

    最佳答案

    Any tips and directions to go about building complex query which may aggregate and calculate all data required for this task in one step is very appreciated.

    yes, generic bigquery will work fine



    下面不是最通用的版本,但可以给你一个想法
    在这个例子中,我使用 Stack Overflow Data可在 Google BigQuery Public Datasets

    第一个子选择 - 事件 - 在大多数情况下,这是您唯一需要重写以反射(reflect)数据细节的内容。
    它的作用是:
    一种。定义要为分析设置的时间段。
    在下面的例子中 - 它是一个月 - FORMAT_DATE ('%Y-%m', ...
    但是你可以使用 year, week, day or anything else – 分别
    • 按年份 - FORMAT_DATE('%Y', DATE(answers.creation_date)) AS 期间
    • 按周 - FORMAT_DATE('%Y-%W', DATE(answers.creation_date)) AS 期间
    • 按天 - FORMAT_DATE('%Y-%m-%d', DATE(answers.creation_date)) AS 期间
    • …
    湾此外,它仅“过滤”您需要分析的事件/事件类型
    例如,`WHERE CONCAT('|', questions.tags, '|') LIKE '%|google-bigquery|%' 寻找 google-bigquery 标记问题的答案

    其余的子查询更不通用,大部分可以按原样使用
    #standardSQL
    WITH activities AS (
    SELECT answers.owner_user_id AS id,
    FORMAT_DATE('%Y-%m', DATE(answers.creation_date)) AS period
    FROM `bigquery-public-data.stackoverflow.posts_answers` AS answers
    JOIN `bigquery-public-data.stackoverflow.posts_questions` AS questions
    ON questions.id = answers.parent_id
    WHERE CONCAT('|', questions.tags, '|') LIKE '%|google-bigquery|%'
    GROUP BY id, period
    ), cohorts AS (
    SELECT id, MIN(period) AS cohort FROM activities GROUP BY id
    ), periods AS (
    SELECT period, ROW_NUMBER() OVER(ORDER BY period) AS num
    FROM (SELECT DISTINCT cohort AS period FROM cohorts)
    ), cohorts_size AS (
    SELECT cohort, periods.num AS num, COUNT(DISTINCT activities.id) AS ids
    FROM cohorts JOIN activities ON activities.period = cohorts.cohort AND cohorts.id = activities.id
    JOIN periods ON periods.period = cohorts.cohort
    GROUP BY cohort, num
    ), retention AS (
    SELECT cohort, activities.period AS period, periods.num AS num, COUNT(DISTINCT cohorts.id) AS ids
    FROM periods JOIN activities ON activities.period = periods.period
    JOIN cohorts ON cohorts.id = activities.id
    GROUP BY cohort, period, num
    )
    SELECT
    CONCAT(cohorts_size.cohort, ' - ', FORMAT("%'d", cohorts_size.ids), ' users') AS cohort,
    retention.num - cohorts_size.num AS period_lag,
    retention.period as period_label,
    ROUND(retention.ids / cohorts_size.ids * 100, 2) AS retention , retention.ids AS rids
    FROM retention
    JOIN cohorts_size ON cohorts_size.cohort = retention.cohort
    WHERE cohorts_size.cohort >= FORMAT_DATE('%Y-%m', DATE('2015-01-01'))
    ORDER BY cohort, period_lag, period_label

    您可以使用您选择的工具可视化上述查询的结果
    注意:您可以使用 period_lag 或 period_label
    在下面的示例中查看它们的使用差异

    与 period_lag

    enter image description here

    带 period_label

    enter image description here

    关于Firebase 导出到 BigQuery : retention cohorts query,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/41509431/

    31 4 0
    Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
    广告合作:1813099741@qq.com 6ren.com