mysql - Optimizing a full-text search across multiple tables

Reposted. Author: 行者123. Updated: 2023-11-29 21:30:21

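I want to search for the requested term ($q) in the title and keywords of the content table, and also in the actor names, which live in another table and are linked through an intermediate table. I also need to fetch the view count from yet another table.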

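This is the query I have been working with so far. The results are fine, but it is too slow (about 0.6 seconds on average when I run it in PhpMyAdmin, and we get millions of visitors a month):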

SELECT DISTINCT SQL_CALC_FOUND_ROWS
c.*,
cv.views,
(MATCH (c.title) AGAINST ('{$q}') * 3) Relevance1,
MATCH (c.keywords) AGAINST ('{$q}') Relevance2,
(MATCH (a.`name`) AGAINST ('{$q}') * 2) Relevance3
FROM
content AS c
LEFT JOIN
content_actors AS ca ON ca.content = c.record_num
LEFT JOIN
actors AS a ON a.record_num = ca.actor
LEFT JOIN
content_views AS cv ON cv.content = c.record_num
WHERE
c.enabled = 1
GROUP BY c.title, c.length
HAVING (Relevance1 + Relevance2 + Relevance3) > 0
ORDER BY (Relevance1 + Relevance2 + Relevance3) DESC

The table schemas look like this:

content
record_num title keywords
1 Video1 Comedy, Action, Supercool
2 Video2 Comet

content_actors
content actor
1 1
1 2
2 1

actors
record_num name
1 Jennifer Lopez
2 Bruce Willis

content_views
content views
1 160
2 312

These are the indexes I found by running SHOW INDEX FROM tablename:

Table            Column_name   Seq_in_index   Key_name      Index_type
----------------------------------------------------------------------
content          record_num    1              PRIMARY       BTREE
content          keywords      1              keywords      FULLTEXT
content          keywords      2              title         FULLTEXT
content          title         1              title         FULLTEXT
content          description   1              description   FULLTEXT
content          keywords      1              keywords_2    FULLTEXT

content_actors   content       1              content       BTREE
content_actors   actor         2              content       BTREE
content_actors   actor         1              actor         BTREE

actors           record_num    1              PRIMARY       BTREE
actors           name          1              name          BTREE
actors           name          1              name_2        FULLTEXT

content_views    content       1              PRIMARY       BTREE
content_views    views         1              views         BTREE

This is the EXPLAIN output for the query:

ID   SELECT_TYPE   TABLE   TYPE     POSSIBLE_KEYS        KEY       ROWS    EXTRA
1    SIMPLE        c       ref      enabled_2, enabled   enabled   29210   Using where; Using temporary; Using filesort
1    SIMPLE        ca      ref      content              content   1       Using index
1    SIMPLE        a       eq_ref   PRIMARY              PRIMARY   1
1    SIMPLE        cv      eq_ref   PRIMARY              PRIMARY   1

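I use GROUP BY to avoid duplicate content, but that GROUP BY alone seems to double the time needed to process the query.

Edit: OK, after playing with the query a bit, I realized that if I remove the GROUP BY I get duplicates, and if I leave the GROUP BY in, it does not pick up the correct Relevance3 value (without the GROUP BY, one of the duplicate rows returns a value for Relevance3 and the other does not...).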

Best answer

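Add the MATCH conditions (ORed together) to the WHERE clause. This significantly reduces the number of rows that have to be processed for SQL_CALC_FOUND_ROWS and removes the need for HAVING ....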

Instead of

cv.views,
...
LEFT JOIN content_views AS cv ON cv.content = c.record_num

use a correlated subquery:

( SELECT views FROM content_views WHERE content = c.record_num ) AS views,
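
As a rough sketch, the two changes above might be applied to the query from the question like this. The tables, aliases, and the '{$q}' placeholder are taken from the question. The sketch still relies on the question's GROUP BY, so it assumes ONLY_FULL_GROUP_BY is disabled (as it must have been for the original query to run), and it has not been run against the real data:

```sql
-- Sketch only: MATCH conditions ORed into WHERE, HAVING dropped,
-- and views fetched by a correlated subquery instead of a LEFT JOIN.
SELECT SQL_CALC_FOUND_ROWS
    c.*,
    ( SELECT views FROM content_views WHERE content = c.record_num ) AS views,
    (MATCH (c.title) AGAINST ('{$q}') * 3) AS Relevance1,
    MATCH (c.keywords) AGAINST ('{$q}') AS Relevance2,
    (MATCH (a.`name`) AGAINST ('{$q}') * 2) AS Relevance3
FROM content AS c
LEFT JOIN content_actors AS ca ON ca.content = c.record_num
LEFT JOIN actors AS a ON a.record_num = ca.actor
WHERE c.enabled = 1
  AND (   MATCH (c.title) AGAINST ('{$q}')
       OR MATCH (c.keywords) AGAINST ('{$q}')
       OR MATCH (a.`name`) AGAINST ('{$q}') )
GROUP BY c.title, c.length
ORDER BY (Relevance1 + Relevance2 + Relevance3) DESC;
```

Rows that match nothing are now discarded in WHERE before grouping and sorting, which is why HAVING is no longer needed. The GROUP BY problem with Relevance3 described in the question is not solved by this step.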

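Edit

The LEFT and the GROUP BY are needed because actors are optional and there can be more than one actor per content row. Since you do not need the actor name in the output at all, you can get rid of both by doing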

WHERE ... AND EXISTS ( SELECT *
FROM content_actors AS ca
JOIN actors AS a ON ...
WHERE MATCH (a.`name`) AGAINST ('{$q}')
AND ca...
)
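
Filled in, the EXISTS variant might look roughly like the following. The join and correlation conditions are assumptions inferred from the question's schema (content_actors.content, content_actors.actor, actors.record_num), since the snippet above elides them:

```sql
-- Sketch only: content rows that have at least one matching actor.
-- No join to actors in the outer query, so no duplicates and no GROUP BY.
SELECT c.*,
       ( SELECT views FROM content_views WHERE content = c.record_num ) AS views
FROM content AS c
WHERE c.enabled = 1
  AND EXISTS ( SELECT 1
               FROM content_actors AS ca
               JOIN actors AS a ON a.record_num = ca.actor
               WHERE ca.content = c.record_num              -- assumed correlation
                 AND MATCH (a.`name`) AGAINST ('{$q}') );
```

Each content row appears at most once because EXISTS only tests for a match; the actor's MATCH score never leaves the subquery.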

But that does not let you include the actor relevance in the ORDER BY.

So you need to build a subquery using UNION DISTINCT. There will be 2 SELECTs:

SELECT #1:

SELECT c.record_num AS id,
3 * MATCH(c.title) AGAINST ('{$q}')
+ MATCH(c.keywords) AGAINST ('{$q}') AS relevance
FROM content AS c
WHERE MATCH(c.title, c.keywords) AGAINST ('{$q}')

(This assumes you have FULLTEXT(title, keywords).) It efficiently gets the ids of the useful content rows.

SELECT #2:

SELECT c.record_num AS id,
2 * MAX(MATCH(a.`name`) AGAINST ('{$q}')) AS relevance
FROM content AS c
JOIN content_actors ca ON ca.content = c.record_num
JOIN actors a ON a.record_num = ca.actor
WHERE MATCH(a.`name`) AGAINST ('{$q}')
GROUP BY c.record_num

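Make sure you have content_actors: INDEX(actor) and content: INDEX(record_num). This SELECT will efficiently start from actors and work back to content. Note that when two actors match, it does something different from your code; hopefully my MAX is a better solution.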
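
The indexes these two SELECTs lean on could be created roughly as follows. The index names are made up for illustration, and the SHOW INDEX listing in the question suggests that several of them already exist under other names, so check first and skip the ones you have:

```sql
-- Check what is already there before adding anything.
SHOW INDEX FROM content;
SHOW INDEX FROM content_actors;
SHOW INDEX FROM actors;

-- Used by WHERE MATCH(c.title, c.keywords) in SELECT #1.
ALTER TABLE content ADD FULLTEXT INDEX ft_title_keywords (title, keywords);

-- In natural-language mode, MATCH(c.title) and MATCH(c.keywords) on their
-- own each need a FULLTEXT index on exactly that column list.
ALTER TABLE content ADD FULLTEXT INDEX ft_title (title);
ALTER TABLE content ADD FULLTEXT INDEX ft_keywords (keywords);

-- Used by the MATCH on the actor name in SELECT #2.
ALTER TABLE actors ADD FULLTEXT INDEX ft_name (`name`);

-- Lets SELECT #2 go from a matching actor to its content rows.
ALTER TABLE content_actors ADD INDEX idx_actor (actor);
```

content.record_num is already the PRIMARY key according to the listing, which covers the INDEX(record_num) requirement.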

Now, let's put things together...

SELECT #3:

SELECT id, SUM(relevance) AS relevance
FROM (
( ... select #1 ... )
UNION ALL
( ... select #2 ... )
) AS x
GROUP BY id

But that's not all...

SELECT #4:

SELECT c.*,
( ... views ... ) AS views
FROM ( ... select #3 ... ) AS u
JOIN content c ON c.record_num = u.id

I suggest you run each step by hand to verify it, gradually putting all the pieces together. Yes, it is complex, but it should be quite fast.
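
For orientation, the four steps might come together roughly as follows. This is an untested sketch: it assumes record_num is the id column of content (as in the question's schema), that the FULLTEXT indexes matching each MATCH column list exist, and it adds the enabled filter and the final ORDER BY from the question, which the steps above do not spell out. The aliases c1, c2, x, and u are arbitrary.

```sql
SELECT c.*,
       ( SELECT views FROM content_views WHERE content = c.record_num ) AS views,
       u.relevance
FROM (
    -- SELECT #3: one row per content id, title/keyword and actor scores added up
    SELECT id, SUM(relevance) AS relevance
    FROM (
        -- SELECT #1: title and keyword matches
        ( SELECT c1.record_num AS id,
                 3 * MATCH (c1.title) AGAINST ('{$q}')
                   + MATCH (c1.keywords) AGAINST ('{$q}') AS relevance
          FROM content AS c1
          WHERE MATCH (c1.title, c1.keywords) AGAINST ('{$q}') )
        UNION ALL
        -- SELECT #2: best-matching actor per content row
        ( SELECT c2.record_num AS id,
                 2 * MAX(MATCH (a.`name`) AGAINST ('{$q}')) AS relevance
          FROM content AS c2
          JOIN content_actors AS ca ON ca.content = c2.record_num
          JOIN actors AS a ON a.record_num = ca.actor
          WHERE MATCH (a.`name`) AGAINST ('{$q}')
          GROUP BY c2.record_num )
    ) AS x
    GROUP BY id
) AS u
JOIN content AS c ON c.record_num = u.id   -- SELECT #4: fetch the full rows
WHERE c.enabled = 1                        -- filter carried over from the question
ORDER BY u.relevance DESC;
```

UNION ALL is used inside the derived table so that a content row matched by both branches keeps both scores for SUM to add.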

Regarding mysql - optimizing a full-text search across multiple tables, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35317501/
