sql - Aggregating SQL rows with precedence

Reposted by: 行者123 | Updated: 2023-12-01 00:33:34

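I have a table full of items from different sources. Some sources may share the same location (in my example, different BBC news feeds would be different sources, but they all come from the BBC). Each item has a "unique" ID that identifies it among other items from the same location. This means that items relating to the same news story on a site, but published under different feeds, will have the same "unique ID"; that ID is not necessarily globally unique.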

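The problem is that I want to eliminate duplicates at display time, so that (depending on which feeds you see) you get at most one version of each story, even though two or three of your feeds might contain a link to it.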

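I have a sources table containing information about each source, along with location_id and location_priority fields. I then have an items table containing each item, its unique_id, source_id and content. Items with the same unique_id and source location_id should appear at most once, with the highest source location_priority winning.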

I would have thought that something like:

SELECT `sources`.`name` AS `source`,
       `items`.`content`,
       `items`.`published`
FROM `items` INNER JOIN `sources`
  ON `items`.`source_id` = `sources`.`id` AND `sources`.`active` = 1
GROUP BY `items`.`unique_id`, `sources`.`location_id`
ORDER BY `sources`.`location_priority` DESC

would do the trick, but it seems to ignore the location_priority field. What have I missed?

Example data:

CREATE TABLE IF NOT EXISTS `sources` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `location_id` int(10) unsigned NOT NULL,
  `location_priority` int(11) NOT NULL,
  `active` tinyint(1) unsigned NOT NULL default '1',
  `name` varchar(150) NOT NULL,
  `url` text NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `active` (`active`)
);

INSERT INTO `sources` (`id`, `location_id`, `location_priority`, `active`, `name`, `url`) VALUES
(1, 1, 25, 1, 'BBC News Front Page', 'http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml'),
(2, 1, 10, 1, 'BBC News England', 'http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/england/rss.xml'),
(3, 1, 15, 1, 'BBC Technology News', 'http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/technology/rss.xml'),
(4, 2, 0, 1, 'Slashdot', 'http://rss.slashdot.org/Slashdot/slashdot'),
(5, 3, 0, 1, 'The Daily WTF', 'http://syndication.thedailywtf.com/TheDailyWtf');

CREATE TABLE IF NOT EXISTS `items` (
  `id` bigint(20) unsigned NOT NULL auto_increment,
  `source_id` int(10) unsigned NOT NULL,
  `published` datetime NOT NULL,
  `content` text NOT NULL,
  `unique_id` varchar(255) NOT NULL,
  PRIMARY KEY  (`id`),
  UNIQUE KEY `unique_id` (`unique_id`,`source_id`),
  KEY `published` (`published`),
  KEY `source_id` (`source_id`)
);

INSERT INTO `items` (`id`, `source_id`, `published`, `content`, `unique_id`) VALUES
(1,  1, '2009-12-01 16:25:53', 'Story about Subject One',                     'abc'),
(2,  2, '2009-12-01 16:21:31', 'Subject One in story',                        'abc'),
(3,  3, '2009-12-01 16:17:20', 'Techy goodness',                              'def'),
(4,  2, '2009-12-01 16:05:57', 'Further updates on Foo case',                 'ghi'),
(5,  3, '2009-12-01 15:53:39', 'Foo, Bar and Quux in court battle',           'ghi'),
(6,  2, '2009-12-01 15:52:02', 'Anti-Fubar protests cause disquiet',          'mno'),
(7,  4, '2009-12-01 15:39:00', 'Microsoft Bleh meets lukewarm reception',     'pqr'),
(8,  5, '2009-12-01 15:13:45', 'Ever thought about doing it in VB?',          'pqr'),
(9,  1, '2009-12-01 15:13:15', 'Celebrity has &#039;new friend&#039;',        'pqr'),
(10, 1, '2009-12-01 15:09:57', 'Microsoft launches Bleh worldwide',           'stu'),
(11, 2, '2009-12-01 14:57:22', 'Microsoft launches Bleh in UK',               'stu'),
(12, 3, '2009-12-01 14:57:22', 'Microsoft launches Bleh',                     'stu'),
(13, 3, '2009-12-01 14:42:15', 'Tech round-up',                               'vwx'),
(14, 2, '2009-12-01 14:36:26', 'Estates &#039;old news&#039; say government', 'yza'),
(15, 1, '2009-12-01 14:15:21', 'Iranian doctor &#039;was poisoned&#039;',     'bcd'),
(16, 4, '2009-12-01 14:14:02', 'Apple fans overjoyed by iBlah',               'axf');

Expected content after the query:

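Story about Subject One
Techy goodness
Foo, Bar and Quux in court battle
Anti-Fubar protests cause disquiet
Microsoft Bleh meets lukewarm reception
Ever thought about doing it in VB?
Celebrity has &#039;new friend&#039;
Microsoft launches Bleh worldwide
Tech round-up
Estates &#039;old news&#039; say government
Iranian doctor &#039;was poisoned&#039;
Apple fans overjoyed by iBlah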

I tried a variation on Andomar's solution, with some success:

SELECT      s.`name` AS `source`,
            i.`content`,
            i.`published`
FROM        `items` i
INNER JOIN  `sources` s
ON          i.`source_id` = s.`id`
AND         s.`active` = 1
INNER JOIN (
  SELECT `unique_id`, `source_id`, MAX(`location_priority`) AS `prio` 
  FROM `items` i
  INNER JOIN `sources` s ON s.`id` = i.`source_id` AND s.`active` = 1
  GROUP BY `location_id`, `unique_id`
) `filter`
ON          i.`unique_id` = `filter`.`unique_id`
AND         s.`location_priority` = `filter`.`prio`
ORDER BY    i.`published` DESC
LIMIT 50

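With AND s.location_priority = filter.prio, things work almost as I want. Because an item can come from several sources with the same priority, items can still be repeated. In that case an extra GROUP BY i.unique_id on the outer query does the job, and I suppose it doesn't matter which source "wins" when the priorities are equal.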

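I tried using AND i.source_id = filter.source_id instead, which almost works (i.e. it removes the need for the extra GROUP BY) but does not return results from the correct source. In the example above, it gives me "Further updates on Foo case" (source "BBC News England") rather than "Foo, Bar and Quux in court battle" (source "BBC Technology News"). Looking at the results of the inner query, I get: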

unique_id: 'ghi'
source_id: 2
prio: 15

Note that the source ID is incorrect (it should be 3).
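The mismatch is consistent with how MySQL treats a selected column that is neither aggregated nor listed in GROUP BY: when the ONLY_FULL_GROUP_BY SQL mode is disabled, the value is taken from an arbitrary row of the group, so source_id need not come from the row that supplied MAX(location_priority). A minimal sketch, assuming a session where the SQL mode may be changed, that makes the server reject the ambiguous inner query instead of silently picking a row:

```sql
-- Assumption: replacing the session's SQL mode is acceptable for this test.
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- i.source_id is neither aggregated nor part of GROUP BY, so with
-- ONLY_FULL_GROUP_BY enabled MySQL should refuse this query (error 1055)
-- rather than return a source_id from an arbitrary row in each group.
SELECT   i.unique_id,
         i.source_id,
         MAX(s.location_priority) AS prio
FROM     items i
INNER JOIN sources s ON s.id = i.source_id AND s.active = 1
GROUP BY s.location_id, i.unique_id;
```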

Best answer

ORDER BY only sorts the rows; it does not choose among them.
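To see which groups actually contain competing rows, it may help to aggregate first and inspect the result. A minimal sketch against the sample tables from the question (the aliases candidates and best_priority are illustrative, not from the original post):

```sql
-- For every (location, story) pair with more than one active candidate,
-- show how many rows compete and the highest priority among them.
SELECT   s.location_id,
         i.unique_id,
         COUNT(*)                 AS candidates,
         MAX(s.location_priority) AS best_priority
FROM     items i
INNER JOIN sources s ON s.id = i.source_id AND s.active = 1
GROUP BY s.location_id, i.unique_id
HAVING   COUNT(*) > 1;
```

With the sample data, only location 1 has competing rows (for the stories 'abc', 'ghi' and 'stu'). The aggregate is computed per group, so an ORDER BY on the outer query has no influence on which row's columns end up representing a group.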

One way to filter out the rows with a lower location_priority is to use an inner join as a filter:

SELECT     s.name, i.content, i.published
FROM       items i 
INNER JOIN sources s
ON         i.source_id = s.id
AND        s.active = 1
INNER JOIN (
    SELECT unique_id, max(location_priority) as prio
    FROM items i
    INNER JOIN sources s ON s.id = i.source_id AND s.active = 1
    GROUP BY unique_id) filter
ON         i.unique_id = filter.unique_id
AND        s.location_priority = filter.prio;
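Note that the query above groups by unique_id alone, whereas the question asks for at most one row per unique_id within each location_id. A sketch of a location-aware variant follows; it is an adaptation rather than part of the original answer, and the alias best is illustrative:

```sql
SELECT     s.name AS source, i.content, i.published
FROM       items i
INNER JOIN sources s
ON         i.source_id = s.id
AND        s.active = 1
INNER JOIN (
    -- highest priority per (location, story) pair
    SELECT   s2.location_id,
             i2.unique_id,
             MAX(s2.location_priority) AS prio
    FROM     items i2
    INNER JOIN sources s2 ON s2.id = i2.source_id AND s2.active = 1
    GROUP BY s2.location_id, i2.unique_id
) best
ON         i.unique_id         = best.unique_id
AND        s.location_id       = best.location_id   -- keep locations separate
AND        s.location_priority = best.prio
ORDER BY   i.published DESC;
```

Joining on location_id as well keeps a story such as 'pqr', which appears at three different locations, as three separate rows. If two sources at the same location share a priority, this variant can still return duplicates.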

Another option is a WHERE ... IN <subquery> clause, for example:

SELECT     s.name, i.content, i.published
FROM       items i 
INNER JOIN sources s
ON         i.source_id = s.id
AND        s.active = 1
WHERE      (i.unique_id, s.location_priority) IN (
    SELECT unique_id, max(location_priority)
    FROM items i
    INNER JOIN sources s ON s.id = i.source_id AND s.active = 1
    GROUP BY unique_id
);

This problem is also known as "selecting records holding the group-wise maximum". Quassnoi has written a nice article on it.

Edit: one way to break ties between multiple sources with the same priority is a WHERE clause with a subquery. This example breaks ties by i.id DESC:

SELECT     s.name, i.unique_id, i.content, i.published
FROM       (
           SELECT unique_id, max(location_priority) as prio
           FROM items i
           INNER JOIN sources s ON s.id = i.source_id AND s.active = 1
           GROUP BY unique_id
           ) filter
JOIN       items i
JOIN       sources s
ON         s.id = i.source_id 
           AND s.active = 1
WHERE      i.id =
           (
           SELECT   i.id
           FROM     items i
           JOIN     sources s 
           ON       s.id = i.source_id 
                    AND s.active = 1
           WHERE    i.unique_id = filter.unique_id
           AND      s.location_priority = filter.prio
           ORDER BY i.id DESC
           LIMIT 1
           )

Quassnoi also has an article on selecting records holding group-wise maximum (resolving ties) :)
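Where window functions are available, the ranking and the tie-break can be expressed in one pass. A sketch, assuming MySQL 8.0 or later; the names ranked and rn are illustrative, and partitioning by location_id is added here to match the requirement stated in the question:

```sql
SELECT   source, content, published
FROM (
    SELECT s.name AS source,
           i.content,
           i.published,
           -- rank the candidates of each (location, story) pair:
           -- highest priority first, then highest item id as tie-break
           ROW_NUMBER() OVER (
               PARTITION BY s.location_id, i.unique_id
               ORDER BY     s.location_priority DESC, i.id DESC
           ) AS rn
    FROM   items i
    INNER JOIN sources s ON s.id = i.source_id AND s.active = 1
) ranked
WHERE    rn = 1
ORDER BY published DESC;
```

Because ROW_NUMBER() assigns exactly one row the value 1 in each partition, no extra GROUP BY is needed to remove ties.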

Regarding sql - Aggregating SQL rows with precedence, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/1855303/
