mysql - Percentiles grouped by date in MySQL

Reposted. Author: 行者123. Updated: 2023-11-30 23:11:50

I have a database table that basically contains the columns Date (date), UserId (int) and Value (double).

I would like to run a query that gives me, for each date, the 10th and 90th percentiles of Value across all users, something like SELECT Date, Pct10(Value), Pct90(Value) FROM table GROUP BY Date.

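I know there are various ways to compute a percentile in MySQL with COUNT(*) and LIMIT by counting rows, but I don't know how to apply them to every date value within a single statement.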

Sample data:

Date       | UserId  | Value
2013-01-01 | 0 | 1
2013-01-01 | 1 | 1
2013-01-01 | 2 | 1
2013-01-01 | 3 | 1
2013-01-01 | 4 | 2
2013-01-01 | 5 | 2
2013-01-01 | 6 | 2
2013-01-01 | 7 | 2
2013-01-01 | 8 | 2
2013-01-01 | 9 | 2
2013-01-01 | 10 | 9
2013-01-02 | 1 | 1
2013-01-02 | 9 | 1

The expected result is:

Date       | Pct10  | Pct90
2013-01-01 | 1 | 2
2013-01-02 | 1 | 1
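
To make the expected result concrete, here is a minimal Python sketch that computes it outside the database. It assumes the nearest-rank definition of a percentile (the value at rank ceil(p * n) in the sorted list), which the question does not state explicitly but which reproduces the table above. The names ROWS, nearest_rank and percentiles_by_date are made up for this illustration.

```python
from collections import defaultdict

# Sample rows copied from the question: (Date, UserId, Value).
ROWS = [
    ("2013-01-01", 0, 1), ("2013-01-01", 1, 1), ("2013-01-01", 2, 1),
    ("2013-01-01", 3, 1), ("2013-01-01", 4, 2), ("2013-01-01", 5, 2),
    ("2013-01-01", 6, 2), ("2013-01-01", 7, 2), ("2013-01-01", 8, 2),
    ("2013-01-01", 9, 2), ("2013-01-01", 10, 9),
    ("2013-01-02", 1, 1), ("2013-01-02", 9, 1),
]


def nearest_rank(sorted_values, pct):
    """Return the pct-th percentile (pct is an integer from 1 to 100).

    Assumption: nearest-rank rule, i.e. the value at rank ceil(pct * n / 100).
    """
    n = len(sorted_values)
    rank = max(1, (pct * n + 99) // 100)  # integer ceiling, avoids float rounding
    return sorted_values[rank - 1]


def percentiles_by_date(rows):
    """Group values by date and return {date: (pct10, pct90)}."""
    by_date = defaultdict(list)
    for date, _user_id, value in rows:
        by_date[date].append(value)
    result = {}
    for date in sorted(by_date):
        values = sorted(by_date[date])
        result[date] = (nearest_rank(values, 10), nearest_rank(values, 90))
    return result


if __name__ == "__main__":
    for date, (p10, p90) in percentiles_by_date(ROWS).items():
        print(date, p10, p90)
```

For 2013-01-01 there are 11 values, so the 10th percentile sits at rank 2 and the 90th at rank 10 of the sorted list, which gives 1 and 2.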

Best answer

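I'm not sure how to get the percentile itself. Below I use a subquery based on "select nth percentile from mysql", but I'm not certain I adapted it correctly. The point of my answer is how the subqueries are combined.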

The query below will be slow, and will get noticeably slower as the table grows, but it should do what you need:

SELECT p10.Date, Pct10, Pct90
FROM (
    SELECT Date, COUNT(Value) AS Pct10
    FROM mydata
    GROUP BY Date, Value
    ORDER BY ABS(0.1 - (COUNT(Value) / (SELECT COUNT(*) FROM mydata)))
    LIMIT 1) AS p10
INNER JOIN (
    SELECT Date, COUNT(Value) AS Pct90
    FROM mydata
    GROUP BY Date, Value
    ORDER BY ABS(0.9 - (COUNT(Value) / (SELECT COUNT(*) FROM mydata)))
    LIMIT 1) AS p90 ON p10.Date = p90.Date
GROUP BY p10.Date

Here is my second idea. If it works, it should be faster and more efficient than the first one, but it will still be slow on larger tables.

SELECT p10.Date, COUNT(p10.Value) AS Pct10, Pct90
FROM mydata p10
INNER JOIN (
    SELECT Date, COUNT(Value) AS Pct90
    FROM mydata
    GROUP BY Date, Value
    ORDER BY ABS(0.9 - (COUNT(Value) / (SELECT COUNT(*) FROM mydata)))
    LIMIT 1) AS p90 ON p10.Date = p90.Date
GROUP BY p10.Date, p10.Value
ORDER BY ABS(0.1 - (COUNT(p10.Value) / (SELECT COUNT(*) FROM mydata)))
LIMIT 1

Edit

OK, brainstorming time. Given that this is the subquery for the percentile on a single date (I'm not even sure how it works):

SELECT Date, COUNT(Value) AS Pct90
FROM mydata
WHERE Date = ?
GROUP BY Value
ORDER BY ABS(0.9 - (COUNT(Value) / (SELECT COUNT(*) FROM mydata WHERE Date = ?)))
LIMIT 1

Then let's try to fix the ORDER BY so that it divides by the row count of each date:

SELECT mydata.Date, COUNT(Value) AS Pct90
FROM mydata
INNER JOIN (SELECT Date, COUNT(*) AS DateTotal FROM mydata GROUP BY Date) AS d
    ON d.Date = mydata.Date
GROUP BY mydata.Date, Value
ORDER BY ABS(0.9 - (COUNT(Value) / d.DateTotal))
LIMIT 1

If you use this pattern in my earlier examples, maybe it will work.
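
Since the answer itself is unsure what this ORDER BY selects, a small Python sketch can mirror it on the question's sample data. It assumes the expression is read as "for each date, take the distinct value whose share of that date's rows is closest to the target fraction"; the names closest_share_value and selected_by_date are hypothetical helpers for this check, not part of the answer.

```python
from collections import Counter, defaultdict

# Sample rows copied from the question: (Date, UserId, Value).
ROWS = [
    ("2013-01-01", 0, 1), ("2013-01-01", 1, 1), ("2013-01-01", 2, 1),
    ("2013-01-01", 3, 1), ("2013-01-01", 4, 2), ("2013-01-01", 5, 2),
    ("2013-01-01", 6, 2), ("2013-01-01", 7, 2), ("2013-01-01", 8, 2),
    ("2013-01-01", 9, 2), ("2013-01-01", 10, 9),
    ("2013-01-02", 1, 1), ("2013-01-02", 9, 1),
]


def closest_share_value(values, target):
    """Mimic ORDER BY ABS(target - COUNT(Value) / DateTotal) LIMIT 1 for one date."""
    total = len(values)
    counts = Counter(values)
    # Iterate distinct values in sorted order so that ties resolve deterministically.
    return min(sorted(counts), key=lambda v: abs(target - counts[v] / total))


def selected_by_date(rows):
    """Return {date: (value picked for 0.1, value picked for 0.9)}."""
    by_date = defaultdict(list)
    for date, _user_id, value in rows:
        by_date[date].append(value)
    return {
        date: (closest_share_value(vals, 0.1), closest_share_value(vals, 0.9))
        for date, vals in sorted(by_date.items())
    }


if __name__ == "__main__":
    for date, picked in selected_by_date(ROWS).items():
        print(date, picked)
```

On 2013-01-01 the value 9 occurs once in 11 rows (a share of about 0.09), so it is the closest match for 0.1, whereas the question expects 1 as the 10th percentile. Under this reading, the expression ranks values by how frequent they are rather than by their position in the sorted data, so it only coincides with a percentile in special cases.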

Edit 2

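So here we go again, because we can't use LIMIT 1: it returns a single row overall rather than one row per date (I should have realized that earlier). I actually tested the following on my own database (hopefully I changed all the field and table names back to what they should be!) and it seems to work. You would have to do the same again for p10 and join the two.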

--- removed due to typos ---

Edit 3

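I found some errors in Edit 2, so I removed it. Here is the whole percentile query. As far as I can tell, this query works on my database (with different field and table names).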

SELECT n.Date, n.Pct AS Pct10, n.Value AS Pct10Value, q.Pct AS Pct90, q.Value AS Pct90Value
FROM (
    -- Per date: the value whose share of that date's rows is closest to 0.1
    SELECT p.Date, p.Pct, p.Value, m.Selector
    FROM (
        SELECT mydata.Date, Value, COUNT(Value) AS Pct,
               ABS(0.1 - (COUNT(Value) / d.DateTotal)) AS Abs10
        FROM mydata
        INNER JOIN (SELECT Date, COUNT(*) AS DateTotal FROM mydata GROUP BY Date) AS d
            ON d.Date = mydata.Date
        GROUP BY mydata.Date, Value
    ) p
    INNER JOIN (
        SELECT Date, MIN(Abs10) AS Selector
        FROM (
            SELECT mydata.Date, Value, COUNT(Value) AS Pct,
                   ABS(0.1 - (COUNT(Value) / d.DateTotal)) AS Abs10
            FROM mydata
            INNER JOIN (SELECT Date, COUNT(*) AS DateTotal FROM mydata GROUP BY Date) AS d
                ON d.Date = mydata.Date
            GROUP BY mydata.Date, Value
        ) x
        GROUP BY Date
    ) AS m ON m.Date = p.Date AND m.Selector = p.Abs10  -- match the selector within the same date
    GROUP BY p.Date
) n
INNER JOIN (
    -- Same construction with 0.9 as the target
    SELECT p.Date, p.Pct, p.Value, m.Selector
    FROM (
        SELECT mydata.Date, Value, COUNT(Value) AS Pct,
               ABS(0.9 - (COUNT(Value) / d.DateTotal)) AS Abs90
        FROM mydata
        INNER JOIN (SELECT Date, COUNT(*) AS DateTotal FROM mydata GROUP BY Date) AS d
            ON d.Date = mydata.Date
        GROUP BY mydata.Date, Value
    ) p
    INNER JOIN (
        SELECT Date, MIN(Abs90) AS Selector
        FROM (
            SELECT mydata.Date, Value, COUNT(Value) AS Pct,
                   ABS(0.9 - (COUNT(Value) / d.DateTotal)) AS Abs90
            FROM mydata
            INNER JOIN (SELECT Date, COUNT(*) AS DateTotal FROM mydata GROUP BY Date) AS d
                ON d.Date = mydata.Date
            GROUP BY mydata.Date, Value
        ) x
        GROUP BY Date
    ) AS m ON m.Date = p.Date AND m.Selector = p.Abs90
    GROUP BY p.Date
) q ON q.Date = n.Date
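
As a point of comparison, here is a sketch of a different technique, not the answer's method: ranking the rows of each date with window functions and picking the nearest-rank value. It assumes MySQL 8.0 or later, where ROW_NUMBER() and COUNT(*) OVER (...) are available. To keep the example runnable without a MySQL server, the same SQL is executed here through Python's sqlite3 module, which needs SQLite 3.25 or newer for window functions. The table definition and the names QUERY, ranked, rn and cnt are assumptions made for this illustration.

```python
import sqlite3

# Sample rows copied from the question: (Date, UserId, Value).
ROWS = [
    ("2013-01-01", 0, 1), ("2013-01-01", 1, 1), ("2013-01-01", 2, 1),
    ("2013-01-01", 3, 1), ("2013-01-01", 4, 2), ("2013-01-01", 5, 2),
    ("2013-01-01", 6, 2), ("2013-01-01", 7, 2), ("2013-01-01", 8, 2),
    ("2013-01-01", 9, 2), ("2013-01-01", 10, 9),
    ("2013-01-02", 1, 1), ("2013-01-02", 9, 1),
]

# Nearest-rank percentile per date: the first row whose rank rn satisfies
# rn >= p * cnt. Integer comparisons (rn * 10 >= cnt) avoid float rounding.
QUERY = """
SELECT Date,
       MIN(CASE WHEN rn * 10 >= cnt     THEN Value END) AS Pct10,
       MIN(CASE WHEN rn * 10 >= 9 * cnt THEN Value END) AS Pct90
FROM (
    SELECT Date, Value,
           ROW_NUMBER() OVER (PARTITION BY Date ORDER BY Value) AS rn,
           COUNT(*)     OVER (PARTITION BY Date)                AS cnt
    FROM mydata
) ranked
GROUP BY Date
ORDER BY Date
"""


def percentiles_by_date_sql(rows):
    """Load rows into an in-memory table and run the window-function query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE mydata (Date TEXT, UserId INTEGER, Value REAL)")
        conn.executemany("INSERT INTO mydata VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in percentiles_by_date_sql(ROWS):
        print(row)
```

On the sample data this returns 1 and 2 for 2013-01-01 and 1 and 1 for 2013-01-02, matching the expected result in the question. Because each row is ranked once per date, the query reads the table a single time instead of joining several grouped copies of it.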

Regarding "mysql - Percentiles grouped by date in MySQL", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/19501132/
