In a MySQL query, when using the DISTINCT option, does ORDER BY apply after the duplicates are removed? If not, is there any way to make it do so? I think it's causing some issues with my code.
EDIT:
Here's some more information about what's causing my problem. I understand that, at first glance, this order would not be important, since I am dealing with duplicate rows. However, this is not entirely the case, since I am using an INNER JOIN to sort the rows.
Say I have a table of forum threads, containing this data:
+----+--------+-------------+
| id | userid | title       |
+----+--------+-------------+
|  1 |      1 | Information |
|  2 |      1 | FAQ         |
|  3 |      2 | Support     |
+----+--------+-------------+
I also have a set of posts in another table like this:
+----+----------+--------+---------+
| id | threadid | userid | content |
+----+----------+--------+---------+
|  1 |        1 |      1 | Lorem   |
|  2 |        1 |      2 | Ipsum   |
|  3 |        2 |      2 | Test    |
|  4 |        3 |      1 | Foo     |
|  5 |        2 |      3 | Bar     |
|  6 |        3 |      5 | Bob     |
|  7 |        1 |      2 | Joe     |
+----+----------+--------+---------+
I am using the following MySQL query to get all threads, then sort them based on the latest post (assuming that posts with higher ids are more recent):
SELECT t.*
FROM Threads t
INNER JOIN Posts p ON t.id = p.threadid
ORDER BY p.id DESC
This works, and generates something like this:
+----+--------+-------------+
| id | userid | title       |
+----+--------+-------------+
|  1 |      1 | Information |
|  3 |      2 | Support     |
|  2 |      1 | FAQ         |
|  3 |      2 | Support     |
|  2 |      1 | FAQ         |
|  1 |      1 | Information |
|  1 |      1 | Information |
+----+--------+-------------+
However, as you can see, the information is correct, but there are duplicate rows. I'd like to remove such duplicates, so I used SELECT DISTINCT instead. However, this yielded the following:
+----+--------+-------------+
| id | userid | title       |
+----+--------+-------------+
|  3 |      2 | Support     |
|  2 |      1 | FAQ         |
|  1 |      1 | Information |
+----+--------+-------------+
This is obviously wrong, since the "Information" thread should be on top. It would seem that using DISTINCT causes the duplicates to be removed from the top to the bottom, so only the final rows are left. This causes some issues in the sorting.
Is this the case, or am I analyzing things incorrectly?
What issue do you think it's causing? What difference would it make?
Why would it matter? Before or after applying DISTINCT, the order should be the same.
Can you show us a sample query of what you are trying and the actual problem you are running into?
@bfrohs - Doesn't make any sense to me. You would get the same results if you sort the rows first then remove the duplicates as opposed to removing the duplicates first and then sorting what remains.
@bfrohs, but with DISTINCT you'd get (1:a;1:c;2:b).
Two things to understand:
1. Generally speaking, resultsets are unordered unless you specify an ORDER BY clause; to the extent that you specify a non-strict order (i.e. ORDER BY over non-unique columns), the order in which records that are equal under that ordering appear within the resultset is undefined.
   I suspect you may be specifying such a non-strict order, which is the root of your problems: ensure that your ordering is strict by specifying ORDER BY over a set of columns that is sufficient to uniquely identify each record whose final position in the resultset you care about.
2. DISTINCT may use GROUP BY, which (in MySQL versions before 8.0, where GROUP BY sorted implicitly) causes the results to be ordered by the grouped columns; that is, SELECT DISTINCT a, b, c FROM t will produce a resultset that appears as though ORDER BY a, b, c has been applied. Again, specifying a sufficiently strict order to meet your needs will override this effect.
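A minimal sketch of the first point, with loud assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server, and it loads the Posts rows from the question. The variable names loose_rows and strict_rows are only illustrative. Ordering by threadid alone leaves the relative order of posts within a thread up to the engine, while adding the unique id as a tie-breaker pins down every row's position.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the demo is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Posts (id INTEGER PRIMARY KEY, threadid INTEGER, "
    "userid INTEGER, content TEXT)"
)
conn.executemany(
    "INSERT INTO Posts VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "Lorem"),
        (2, 1, 2, "Ipsum"),
        (3, 2, 2, "Test"),
        (4, 3, 1, "Foo"),
        (5, 2, 3, "Bar"),
        (6, 3, 5, "Bob"),
        (7, 1, 2, "Joe"),
    ],
)

# Non-strict order: threadid is not unique, so rows that share a threadid
# may come back in any relative order.
loose_rows = conn.execute(
    "SELECT id, threadid FROM Posts ORDER BY threadid"
).fetchall()

# Strict order: the unique id breaks every tie, so the result is fully determined.
strict_rows = conn.execute(
    "SELECT id, threadid FROM Posts ORDER BY threadid, id DESC"
).fetchall()

print(strict_rows)
```

Both queries return the same set of rows; only the strict one guarantees where each row ends up.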
Following your update, bearing in mind my point #2 above, it is clear that the effect of grouping the results to achieve DISTINCT makes it impossible to then order by the non-grouped column p.id; instead, you want:
SELECT t.*
FROM Threads t INNER JOIN Posts p ON t.id = p.threadid
GROUP BY t.id
ORDER BY MAX(p.id) DESC
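To check the query above against the sample data from the question, here is a rough sketch that again assumes Python's sqlite3 module in place of MySQL. The extra latest_post column is not part of the original query; it is added only to make visible the per-thread value that MAX(p.id) sorts on.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the data is the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Threads (id INTEGER PRIMARY KEY, userid INTEGER, title TEXT)")
conn.execute(
    "CREATE TABLE Posts (id INTEGER PRIMARY KEY, threadid INTEGER, "
    "userid INTEGER, content TEXT)"
)
conn.executemany(
    "INSERT INTO Threads VALUES (?, ?, ?)",
    [(1, 1, "Information"), (2, 1, "FAQ"), (3, 2, "Support")],
)
conn.executemany(
    "INSERT INTO Posts VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "Lorem"),
        (2, 1, 2, "Ipsum"),
        (3, 2, 2, "Test"),
        (4, 3, 1, "Foo"),
        (5, 2, 3, "Bar"),
        (6, 3, 5, "Bob"),
        (7, 1, 2, "Joe"),
    ],
)

# One row per thread, sorted by the highest (most recent) post id in that thread.
# latest_post is an illustrative extra column showing the sort key.
rows = conn.execute(
    """
    SELECT t.*, MAX(p.id) AS latest_post
    FROM Threads t INNER JOIN Posts p ON t.id = p.threadid
    GROUP BY t.id
    ORDER BY MAX(p.id) DESC
    """
).fetchall()

for row in rows:
    print(row)
```

Each thread appears once, and "Information" comes first because its newest post (id 7) is the most recent overall, followed by "Support" (6) and "FAQ" (5).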
DISTINCT informs MySQL how to build a rowset for you; ORDER BY gives a hint as to how this rowset should be presented. So the answer is: DISTINCT first, ORDER BY last.
The order in which DISTINCT and ORDER BY are applied, in most cases, will not affect the final output.
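A small Python sketch of why that is, under the assumption that the sort key consists only of the selected columns. It models the two steps on plain tuples rather than running MySQL, and dedupe is a hypothetical helper written for this illustration.

```python
# Rows as returned by the join in the question (duplicates included).
rows = [
    (1, 1, "Information"),
    (3, 2, "Support"),
    (2, 1, "FAQ"),
    (3, 2, "Support"),
    (2, 1, "FAQ"),
    (1, 1, "Information"),
    (1, 1, "Information"),
]


def dedupe(seq):
    """Drop duplicate rows, keeping the first occurrence of each."""
    return list(dict.fromkeys(seq))


# DISTINCT first, then ORDER BY on the selected columns.
distinct_then_sort = sorted(dedupe(rows))

# ORDER BY first, then DISTINCT.
sort_then_distinct = dedupe(sorted(rows))

print(distinct_then_sort == sort_then_distinct)
```

The two results agree because identical rows have identical sort keys. The question's case differs because it sorts on p.id, a column that is not part of the selected rows.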
However, if you also use GROUP BY, this will affect the final output. In this case, the ORDER BY is performed after the GROUP BY, which will return unexpected results (assuming you expect the sort to be performed before the grouping).
In MySQL, DISTINCT runs first, and then ORDER BY runs on the rows selected by DISTINCT.
For a better understanding, see LeetCode: Nth Highest Salary.
Awesome, thanks, that works. So, just to confirm, MAX() compares using the max value of p.id in each group?
But, in reality, DISTINCT is implemented by sorting the results... so perhaps not if the optimiser uses the same ordering for both tasks.
In this case, as eggyal points out, there is an exception. When DISTINCT is combined with ORDER BY, it does the sorting (filesort) first.
DISTINCT may use GROUP BY. What would performing ordering before grouping accomplish that performing it afterwards doesn't (bearing in mind that selecting ungrouped columns without an aggregation function results in indeterminate results - not relevant in this case anyway as DISTINCT ensures no such columns exist)?
@eggyal, the issue isn't with DISTINCT, but with GROUP BY and ORDER BY. If rows are grouped, but not selected, DISTINCT doesn't help anything, and the query could return the "wrong" row values (e.g. an id that is later used to retrieve values).