mysql - 有效地选择相关表中存在行的行-6ren

mysql - 有效地选择相关表中存在行的行

转载作者：行者123 更新时间：2023-11-29 10:29:23

我当前正在开发的系统中有一个重复出现的模式，例如，我需要选择在可能的公司列表下有订单的所有用户。或者如果存在被标记的用户的记录，则需要选择用户。

我的 users 表包含 430,825 条记录，因此处理起来应该不那么困难。现在我已经很接近了，我有一个查询可以得到我想要的 0.047s 执行时间，但是如果我再添加一个片段，它会变得非常慢。

这是我当前的查询，最快的查询:

select`UserID`
from`users`
where(`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
    or`UserID`in(select*
        from(select`UserID`
            from`invoices`
            where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
            and`__Active`=1)`a`)
    or`UserID`in(select*
        from(select`UserID`
            from`quoterequests`
            where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
            and`__Active`=1)`a`))
and(`UserID`in(select*
        from(select`UserID`
            from`userassociations`
            where`_Email`='brian@yeet.com'
            and`__Active`=1)`a`))
and(`UserID`in(select*
        from(select`UserID`
            from`usercustomerflags`
            where`CustomerFlagID`in(10,27,17,1,2,3,4,5,6)
            and`__Active`=1)`a`)
    or not exists(select 1 
        from`usercustomerflags`
        where`__Active`=1 
        and`users`.`UserID`=`UserID`))
and`Deleted`=0 
order by`DateTimeAdded`desc 
limit 50;

(额外的 select*from(...) 是因为这个 https://stackoverflow.com/a/1434712/728236 )

在中间，我通过电子邮件地址进一步拉取用户，同时检查其他相关表中是否有可能与该用户相关的电子邮件。例如，下一部分会在向客户发送报价时搜索用户，包括他们的抄送地址。

select`UserID`
from`users`
where(`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
    or`UserID`in(select*
        from(select`UserID`
            from`invoices`
            where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
            and`__Active`=1)`a`)
    or`UserID`in(select*
        from(select`UserID`
            from`quoterequests`
            where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
            and`__Active`=1)`a`))
and(`UserID`in(select*
        from(select`UserID`
            from`userassociations`
            where`_Email`='brian@yeet.com'
            and`__Active`=1)`a`)
    or`UserID`in(select*
        from(select`UserID`
            from`userquotesemails`
            where`Email`='brian@yeet.com'
            and`__Active`=1)`a`))
and(`UserID`in(select*
        from(select`UserID`
            from`usercustomerflags`
            where`CustomerFlagID`in(10,27,17,1,2,3,4,5,6)
            and`__Active`=1)`a`)
    or not exists(select 1 
        from`usercustomerflags`
        where`__Active`=1 
        and`users`.`UserID`=`UserID`))
and`Deleted`=0 
order by`DateTimeAdded`desc 
limit 50;

我添加了备用表来搜索电子邮件，但现在查询需要 3.016 秒，这要慢得多。奇怪的是，当我构建这个查询时，最后一部分似乎是这里性能的临界点，这是什么原因？

第一个和第二个分别解释

+----+--------------------+-------------------+--+----------------+---------------------------------------------------------------------------------------------+------------------------------+------+-----------------------+---+-------+---------------------------------+
|  1 | PRIMARY            | <subquery6>       |  | ALL            |                                                                                             |                              |      |                       |   | 0.00  | Using temporary; Using filesort |
|  1 | PRIMARY            | users             |  | eq_ref         | PRIMARY,UserID_UNIQUE,fk_users_1_idx,users_Customers                                        | PRIMARY                      |  144 | <subquery6>.UserID    | 1 | 50.00 | Using where                     |
|  6 | MATERIALIZED       | userassociations  |  | ref            | userassociations_UserID,userassociations__Email                                             | userassociations__Email      | 1026 | const                 | 3 | 10.00 | Using where                     |
| 10 | DEPENDENT SUBQUERY | usercustomerflags |  | ref            | usercustomerflags_UserID_idx                                                                | usercustomerflags_UserID_idx |  144 | sterling.users.UserID | 1 | 10.00 | Using where                     |
|  8 | DEPENDENT SUBQUERY | usercustomerflags |  | index_subquery | usercustomerflags_CustomerFlagID_idx,usercustomerflags_UserID_idx                           | usercustomerflags_UserID_idx |  144 | func                  | 1 | 4.95  | Using where                     |
|  4 | DEPENDENT SUBQUERY | quoterequests     |  | index_subquery | quoterequests_CompanyID,quoterequests_UserID,quoterequests__Latest,quoterequests_UserQuotes | quoterequests__Latest        |  145 | func                  | 2 | 5.00  | Using where                     |
|  2 | DEPENDENT SUBQUERY | invoices          |  | index_subquery | Invoice_UserID_idx,Invoice_CompanyID_idx,invoices_SampleRequests,invoices_LateOrdersBubble  | Invoice_UserID_idx           |  145 | func                  | 1 | 3.33  | Using where                     |
+----+--------------------+-------------------+--+----------------+---------------------------------------------------------------------------------------------+------------------------------+------+-----------------------+---+-------+---------------------------------+

+----+--------------------+-------------------+--+-----+---------------------------------------------------------------------------------------------+--------------------------------+------+-----------------------+--------+--------+-------------+
|  1 | PRIMARY            | users             |  | ref | fk_users_1_idx,users_Customers                                                              | users_Customers                |    4 | const                 | 227515 | 100.00 | Using where |
| 12 | DEPENDENT SUBQUERY | usercustomerflags |  | ref | usercustomerflags_UserID_idx                                                                | usercustomerflags_UserID_idx   |  144 | sterling.users.UserID |      1 | 10.00  | Using where |
| 10 | SUBQUERY           | usercustomerflags |  | ALL | usercustomerflags_CustomerFlagID_idx,usercustomerflags_UserID_idx                           |                                |      |                       |   3509 | 4.94   | Using where |
|  8 | SUBQUERY           | userquotesemails  |  | ref | userquotesemails_Email__Active,userquotesemails_UserID                                      | userquotesemails_Email__Active | 1027 | const,const           |      1 | 100.00 |             |
|  6 | SUBQUERY           | userassociations  |  | ref | userassociations_UserID,userassociations__Email                                             | userassociations__Email        | 1026 | const                 |      3 | 10.00  | Using where |
|  4 | SUBQUERY           | quoterequests     |  | ref | quoterequests_CompanyID,quoterequests_UserID,quoterequests__Latest,quoterequests_UserQuotes | quoterequests_CompanyID        |  144 | const                 |  16702 | 10.00  | Using where |
|  2 | SUBQUERY           | invoices          |  | ref | Invoice_UserID_idx,Invoice_CompanyID_idx,invoices_SampleRequests,invoices_LateOrdersBubble  | Invoice_CompanyID_idx          |  144 | const                 |  17678 | 10.00  | Using where |
+----+--------------------+-------------------+--+-----+---------------------------------------------------------------------------------------------+--------------------------------+------+-----------------------+--------+--------+-------------+

此外，我尝试过使用联接，例如加入 invoices 表等，但随后我遇到了连接接收的每个 invoice 或 quoterrequest 都有重复用户行的问题，并进行分组/distinct & 排序结果数据变得非常慢，只有几分钟。

我还尝试了第一个查询的“存在”版本，正如文档 https://dev.mysql.com/doc/refman/5.7/en/subquery-optimization-with-exists.html 所建议的那样像这样

select`UserID`
from`users`
where(`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
    or exists(select 1 
        from`invoices`
        where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
        and`__Active`=1 
        and`users`.`UserID`=`UserID`)
    or exists(select 1 
        from`quoterequests`
        where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
        and`__Active`=1 
        and`users`.`UserID`=`UserID`))
and(exists(select 1 
        from`userassociations`
        where`_Email`='brian@yeet.com'
        and`__Active`=1 
        and`users`.`UserID`=`UserID`))
and(exists(select 1 
        from`usercustomerflags`
        where`CustomerFlagID`in(10,27,17,1,2,3,4,5,6)
        and`__Active`=1 
        and`users`.`UserID`=`UserID`)
    or not exists(select 1 
        from`usercustomerflags`
        where`__Active`=1 
        and`users`.`UserID`=`UserID`))
and`Deleted`=0 
order by`DateTimeAdded`desc 
limit 50;

但这让我达到了 5.516 秒，所以这绝对不是正确的方向。

按照我尝试的方式选择数据的最有效方法是什么？或者我是否需要重组一些表以获得我想要的性能？

<小时/>

我已经隔离了我认为存在的最小的子问题和瓶颈。这是我的较简单的查询

select`users`.`UserID`,`users`.`_Customer`
from`users`
left join`userassociations`on`userassociations`.`UserID`=`users`.`UserID`
and`userassociations`.`__Active`=1 
where(`users`.`Email`='brian@stumpyinc.com'
    or`userassociations`.`_Email`='brian@stumpyinc.com')
and`users`.`Deleted`=0 
order by`users`.`DateTimeAdded`desc 
limit 50;

以及解释

+---+--------+------------------+--+-----+--------------------------------------------------------+-------------------------+-----+-----------------------+--------+--------+-------------+
| 1 | SIMPLE | users            |  | ref | users_getemail_INDEX,unify_email_INDEX,users_Customers | users_Customers         |   4 | const                 | 221463 | 100.00 | Using where |
| 1 | SIMPLE | userassociations |  | ref | userassociations_UserID                                | userassociations_UserID | 144 | sterling.users.UserID |      1 | 100.00 | Using where |
+---+--------+------------------+--+-----+--------------------------------------------------------+-------------------------+-----+-----------------------+--------+--------+-------------+

此查询执行大约需要 1.5 秒

<小时/>

CREATE TABLE `users` (
  `UserID` char(36) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
    ...
  `Email` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
    ...
  `DateTimeAdded` datetime DEFAULT NULL,
    ...
  `Deleted` int(1) NOT NULL DEFAULT '0',
    ...
  `_LatestInvoiceDateTimeAdded` datetime DEFAULT NULL,
  `_InvoiceCount` int(11) NOT NULL DEFAULT '0',
  `_Customer` varchar(512) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
    ...
  PRIMARY KEY (`UserID`),
  UNIQUE KEY `UserID_UNIQUE` (`UserID`),
    ...
  KEY `users_getemail_INDEX` (`Email`(191),`_InvoiceCount`,`_LatestInvoiceDateTimeAdded`,`DateTimeAdded`),
  KEY `unify_email_INDEX` (`Email`(191),`UserID`),
    ...
  KEY `users_Customers` (`Deleted`,`DateTimeAdded`),
    ...
  KEY `users_DateTimeAdded` (`DateTimeAdded`,`UserID`),
  FULLTEXT KEY `users_FULLTEXT__Customer` (`_Customer`),
    ...
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;


CREATE TABLE `userassociations` (
   `UserAssociationID` binary(16) NOT NULL,
   `UserID` char(36) COLLATE utf8mb4_unicode_ci NOT NULL,
   `AssociatedUserID` char(36) COLLATE utf8mb4_unicode_ci NOT NULL,
   `_Email` varchar(256) COLLATE utf8mb4_unicode_ci NOT NULL,
   `__UserID` char(36) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
   `__Active` tinyint(1) NOT NULL DEFAULT '1',
   `__Added` timestamp(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
   `__Updated` timestamp(6) NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP(6),
   PRIMARY KEY (`UserAssociationID`),
   KEY `userassociations_UserID` (`UserID`),
   KEY `userassociations_AssociatedUserID` (`AssociatedUserID`),
   KEY `userassociations___UserID` (`__UserID`),
   KEY `userassociations__Email` (`_Email`),
   CONSTRAINT `userassociations_AssociatedUserID` FOREIGN KEY (`AssociatedUserID`) REFERENCES `users` (`UserID`) ON DELETE NO ACTION ON UPDATE NO ACTION,
   CONSTRAINT `userassociations_UserID` FOREIGN KEY (`UserID`) REFERENCES `users` (`UserID`) ON DELETE NO ACTION ON UPDATE NO ACTION,
   CONSTRAINT `userassociations___UserID` FOREIGN KEY (`__UserID`) REFERENCES `users` (`UserID`) ON DELETE NO ACTION ON UPDATE NO ACTION
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

<小时/>

嗯...看起来它正在工作，但我发现了一对似乎效率不高的表，这是我的 users 和 发票表。

我有这些索引:

users:    INDEX(`CompanyID`, `Deleted`, `DateTimeAdded`) 
invoices: INDEX(`UserID`, `__Active`) 
invoices: INDEX(`CompanyID`)
users:    INDEX(`UserID`, `Deleted`)

和查询

select`users`.`UserID`,`users`.`DateTimeAdded`
from`users`
join`invoices`on`invoices`.`UserID`=`users`.`UserID`
and`invoices`.`__Active`=1 
where`invoices`.`CompanyID`='3e55c8b4-d8b6-11e4-b38f-b8ca3a83b4c8'
and`users`.`Deleted`=0 
order by`DateTimeAdded`desc 
limit 200;

仅此查询就需要 0.3 秒，这对我来说感觉很慢，就像它没有充分利用索引一样，特别是因为 users 只有 430,997 行和 invoices 只有 194,180，这看起来应该是一个非常简单的查询。

编辑:实际上比这更糟糕，如果给定的 CompanyID 仅包含 ~4 行，则此查询需要 3.5 秒

+---+--------+----------+--+-----+------------------------------------------------+-----------------------+-----+--------------------------------+------+--------+----------------------------------------------+
| 1 | SIMPLE | invoices |  | ref | Invoice_CompanyID_idx,invoices_UserID___Active | Invoice_CompanyID_idx | 144 | const                          | 7750 | 10.00  | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | users    |  | ref | users_UserID_Deleted                           | users_UserID_Deleted  | 148 | sterling.invoices.UserID,const |    1 | 100.00 |                                              |
+---+--------+----------+--+-----+------------------------------------------------+-----------------------+-----+--------------------------------+------+--------+----------------------------------------------+

最佳答案

对于那个较小的问题:

( select u.`UserID`, u.`_Customer`, u.DateTimeAdded
    from  `users` AS u
    where  u.`Email` = 'brian@stumpyinc.com'
      and  u.`Deleted` = 0
      AND EXISTS ( SELECT * FROM `userassociations`
                       WHERE UserId = u.UserID
                         AND __Active = 1 )
    order by  u.`DateTimeAdded` desc
    limit  50
)
UNION DISTINCT
( select u.`UserID`, u.`_Customer`, u.DateTimeAdded
    from  `users` AS u
    JOIN  `userassociations` AS ua
         ON  ua.`UserID` = u.`UserID`
        and  ua.`__Active` = 1
    where  ua.`_Email` = 'brian@stumpyinc.com' 
      and  u.`Deleted`=0
    order by  u.`DateTimeAdded` desc
    limit  50
)
order by `DateTimeAdded` desc
limit  50

需要这些:

u:  INDEX(Email, Deleted, DateTimeAdded)  -- date last
ua: INDEX(UserId, __Active)   -- either order
ua: INDEX(_Email)
u:  INDEX(UserID, Deleted)

(如果您遇到语法错误，请告诉我。如果速度太慢，请提供EXPLAIN。)

索引前缀(Email(191))通常是无用的。摆脱的话。以下是避免这种情况的 5 种方法:http://mysql.rjweb.org/doc.php/limits#767_limit_in_innodb_indexes

PK 是唯一的 key ，因此请删除第二个:

PRIMARY KEY (`UserID`),
UNIQUE KEY `UserID_UNIQUE` (`UserID`),

闻起来像 UUID；使用 ascii (ascii_general_ci)，而不是 utf8mb4:

... char(36) COLLATE utf8mb4_unicode_ci

INT(1) 占用 4 个字节；使用 TINYINT 作为标志。

关于mysql - 有效地选择相关表中存在行的行，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/47704116/

文章推荐： javascript - 在 EmberJS 中重定向

文章推荐： javascript - 为什么 Javascript 中的日期值不同

文章推荐： python - Flask/Heroku ClearDB 未更新

mysql 'as' 相关
第一段代码工作正常，并给出了我需要的结果。我现在想做的是让它在 'as num' 上返回 3 个数字值对于“as num”上的 3 个不同值，对于同一列上的 3 个不同位置 SELEC
algorithm - 具有三个变量的数学函数(相关)
我想分析一些数据以编写定价算法。以下日期可用: 我需要三个变量/维度的函数/相关因子，它显示三个维度(pers_capacity、卧室数量、浴室数量)增长时中位数(价格)的变化。例如Y(#pers_c
后台崩溃 - Sprite-Kit 相关
正如标题所说 - 我的 Sprite Kit 游戏时不时地在后台崩溃，总是出现此错误 - Exception Type: EXC_BAD_ACCESS (SIGSEGV) Exception Sub
php - 相关(别名)模型上的Phalcon验证消息
假设我尝试保存以下数据，并且Songs模型的name属性上设置了Phalcon \ Mvc \ Model \ Validator \ PresenceOf验证器 // Get an existing
c# - 相关 if 条件由 && 运算符控制
我有一个 if 控件，如下所示； if (Directory.Exists(System.IO.Path.Combine(systemPath, "Reports", companyName))
javascript - Jscript ReadLine() 相关
有人可以告诉我我们使用 ReadLine() 从文件 (.txt) 中读取特定行吗？现在我想读取文件的全部内容(不仅仅是第一行)。为此我需要使用什么方法。我用谷歌搜索了很多，但找不到解决方案。我的代
fpga - 语言如何与 FPGA 相关？
我相信在大学时我用从 C 派生的语言为 FPGA 编写了一个程序。我了解 VHDL 和 verilog 等语言。但是，我不明白的是程序员在使用哪个方面有多少选择？它依赖于FPGA吗？我将使用 Xili
c# - 相关 if 条件由 && 运算符控制
我有一个 if 控件，如下所示； if (Directory.Exists(System.IO.Path.Combine(systemPath, "Reports", companyName))
javascript - 为图像对象设置源 - Dashcode 相关
如何在运行时更改 Dashcode (Javascript) 中图像对象的源？我试过: var image = document.getElementById("image").object;ima
c++ - 相关 C++ 类中多态性的最佳实践？
我有几个相互关联的类，它们将被多种不同的算法使用例子: struct B; struct A { B* parent; }; struct B { std::vector child
mysql - 与 mySQL 相关
我正在开发一个网站，用户在客户收到的表中输入金额，如果任何客户没有提供分期付款(金额)，则用户不会在表中输入任何金额，并且用户希望获取违约者的信息客户以10天为基础。所以我的问题是应该定义什么表和属性
mysql - 从之前的重力形式条目中选择一个值(SQL 相关)
我试图从上一个条目中选择一个值，并每次将该数字加一。我让它工作到选择当前条目值(默认 1000)并递增 1 并重新插入该值(因此每次最终都是 1001)。我需要它来选择该字段的最后一个条目，这样它将变
mysql - 如何从表中选择一行并从另一个(相关)表中选择多行
我不擅长“制作”查询。假设这是我的数据库: artist pics ------------------- -
PHP:长轮询和 Comet 相关
最近，我要为我的网站做一个即时通知系统。我听说 COMET 在这种情况下必不可少。我已经搜索 PHP 和 Comet 一段时间了，但是，我发现的指南和文章似乎只是循环中的 ajax 请求。例如，有一
ios - 为什么这行不通？ ( SpriteKit 相关)
我正在开发一款 iOS 游戏，我希望 clown 在场景外生成，然后向下移动。我的想法是全部创建它们，并将它们以 360 像素的距离放置在不可见的场景中。像这样: SKSpriteNode *clo
Mysql Group by 子句与聚合结果上的 where 相关
我有以下子订单表。 mysql> select * from suborder; +-------------+------------------+ | order_state | bookin
java - 什么时候编码与 Java 相关？
这可能是一个有点初学者的问题，但考虑到在 Java 中调试编码是相当相关的:什么时候编码与 String 对象相关？假设我有一个要保存到文件中的字符串对象。 String 对象本身是否使用某种我应该
c++ - 如何管理对象之间的一对多关系？ ( move 相关)
首先我想说我是 CPP 的新手(我从 cpp11 开始):)考虑以下实体:学生(名字+姓氏)和组(描述+更多学生)。我在 C++ 中创建了以下 2 个类: class Student { privat
javascript - 无法调用函数/AJAX 相关
我正在尝试在单击该复选框时同步更新我的数据库。我决定使用 aJax，但它似乎无法识别 ajax。代码:将成为 Switch_Active(this.id) 函数的元素 ... Deactivat
jQuery - 单击并添加类触发另一个(相关)单击？
我正在创建一个菜单。菜单如下。 $('.category').mouseover(function() { $(this).removeClass('category').addClass('cate

行者123

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

mysql - 有效地选择相关表中存在行的行