
php - MySQL/PHP : Find similar/related items by tag/taxonomy

Reposted. Author: IT老高. Last updated: 2023-10-29 00:06:02

I have a cities table that looks like this.

|id| Name     |
|1 | Paris    |
|2 | London   |
|3 | New York |

I have a tags table that looks like this.

|id| tag           |
|1 | Europe        |
|2 | North America |
|3 | River         |

And a cities_tags table:

|id| city_id | tag_id |
|1 | 1       | 1      |
|2 | 1       | 3      |
|3 | 2       | 1      |
|4 | 2       | 3      |
|5 | 3       | 2      |
|6 | 3       | 3      |

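How do I calculate which cities are most closely related? For example, if I am looking at city 1 (Paris), the results should be: London (2), New York (3).

I found the Jaccard index, but I am not sure how best to implement it.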

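Best answer

Regarding your question of how to calculate the most closely related cities (for example, when looking at city 1 (Paris) the results should be London (2), New York (3)): in the data set you provided, the only thing that relates cities to each other is the tags they have in common, so the cities that share tags are the closest ones. Below is a query with subqueries that finds the cities sharing at least one tag with the given city, excluding that city itself: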

SELECT * FROM `cities`  WHERE id IN (
SELECT city_id FROM `cities_tags` WHERE tag_id IN (
SELECT tag_id FROM `cities_tags` WHERE city_id=1) AND city_id !=1 )

How it works

I assume you will pass in a city id or name to find its closest cities; in my example "Paris" has id 1.

 SELECT tag_id FROM `cities_tags` WHERE city_id=1

This finds all the tag ids that Paris has.

SELECT city_id FROM `cities_tags` WHERE tag_id IN (
SELECT tag_id FROM `cities_tags` WHERE city_id=1) AND city_id !=1

This fetches every city other than Paris that has at least one tag in common with Paris.

Here is a Fiddle for you.
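The `IN` query returns the related cities but does not order them. As an illustration only, the sketch below counts how many tags each city shares with the given one and sorts by that count. It runs on an in-memory SQLite database through Python's `sqlite3` module rather than on MySQL, and the names `SHARED_TAGS_SQL` and `related_by_shared_tags` are hypothetical, not part of the original answer.

```python
import sqlite3

# In-memory SQLite database holding the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cities_tags (id INTEGER PRIMARY KEY, city_id INTEGER, tag_id INTEGER);
    INSERT INTO cities VALUES (1, 'Paris'), (2, 'London'), (3, 'New York');
    INSERT INTO cities_tags VALUES
        (1, 1, 1), (2, 1, 3), (3, 2, 1), (4, 2, 3), (5, 3, 2), (6, 3, 3);
""")

# Self-join cities_tags on tag_id: one row per tag that another city
# shares with the given city. Assumes (city_id, tag_id) pairs are unique.
SHARED_TAGS_SQL = """
    SELECT c.id, c.name, COUNT(*) AS shared_tags
    FROM cities_tags AS other
    JOIN cities_tags AS mine ON mine.tag_id = other.tag_id
    JOIN cities AS c ON c.id = other.city_id
    WHERE mine.city_id = ? AND other.city_id != ?
    GROUP BY c.id, c.name
    ORDER BY shared_tags DESC, c.id
"""

def related_by_shared_tags(conn, city_id):
    """Return (id, name, shared_tags) rows, most shared tags first."""
    return conn.execute(SHARED_TAGS_SQL, (city_id, city_id)).fetchall()

for row in related_by_shared_tags(conn, 1):
    print(row)
```

For Paris this puts London (two shared tags) ahead of New York (one shared tag). The SQL itself uses only joins and `GROUP BY`, so it should carry over to MySQL with its own placeholder syntax.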

While reading about the Jaccard similarity/index I found some material that explains what the term actually means. Let's take an example: we have two sets, A and B.

Set A={A, B, C, D, E}

Set B={I, H, G, F, E, D}

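The formula for the Jaccard similarity divides the size of the intersection by the size of the union: JS = |A intersect B| / |A union B|

A intersect B = {D, E}, which has 2 elements

A union B = {A, B, C, D, E, I, H, G, F}, which has 9 elements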

JS=2/9 =0.2222222222222222
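The arithmetic above can be checked with a few lines of code. This is a minimal Python sketch; the function name `jaccard` and the choice to return 0.0 for two empty sets are assumptions made here, not something the answer specifies.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)

set_a = {"A", "B", "C", "D", "E"}
set_b = {"I", "H", "G", "F", "E", "D"}

# The intersection is {D, E} and the union has 9 elements, so this is 2/9.
print(jaccard(set_a, set_b))
```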

Now, turning to your scenario:

Paris has tag_ids 1 and 3, so its set is P = {Europe, River}

London has tag_ids 1 and 3, so its set is L = {Europe, River}

New York has tag_ids 2 and 3, so its set is NW = {North America, River}

Calculating the JS of Paris with London: JSPL = |P intersect L| / |P union L| = 2/2 = 1

Calculating the JS of Paris with New York: JSPNW = |P intersect NW| / |P union NW| = 1/3 = 0.3333333333
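The same per-city calculation can also be done in application code once the rows of `cities_tags` have been fetched. The Python sketch below assumes the rows arrive as `(city_id, tag_id)` pairs; `rank_by_jaccard` is a hypothetical helper name used only for illustration.

```python
from collections import defaultdict

# (city_id, tag_id) pairs copied from the cities_tags table in the question.
cities_tags = [(1, 1), (1, 3), (2, 1), (2, 3), (3, 2), (3, 3)]

def rank_by_jaccard(city_id, rows):
    """Return (other_city_id, jaccard_index) pairs, most similar first."""
    tags = defaultdict(set)
    for cid, tid in rows:
        tags[cid].add(tid)

    target = tags.get(city_id, set())
    scores = []
    for cid, other in tags.items():
        if cid == city_id:
            continue  # skip the city being compared against itself
        union = target | other
        score = len(target & other) / len(union) if union else 0.0
        scores.append((cid, score))
    # Highest index first; ties are broken by the smaller city id.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))

for cid, score in rank_by_jaccard(1, cities_tags):
    print(cid, round(score, 4))
```

For city 1 (Paris) this lists London (2) with an index of 1 before New York (3) with an index of 1/3, matching the hand calculation.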

Here is the query that calculates the Jaccard index; you can also see it in the fiddle example below.

SELECT a.*,
       (a.`intersect` / a.`union`) AS jaccard_index
FROM (
    SELECT q.*,
           -- |A union B| = |A| + |B| - |A intersect B|
           (q.tag_count + q.paris_count - q.`intersect`) AS `union`
    FROM (
        SELECT cities.`id`, cities.`name`,
               COUNT(DISTINCT ct.`tag_id`) AS tag_count,   -- tags of this city
               COUNT(DISTINCT p.`tag_id`) AS `intersect`,  -- tags shared with Paris
               (SELECT COUNT(DISTINCT tag_id) FROM `cities_tags` WHERE city_id = 1) AS paris_count
        FROM `cities_tags` ct
        JOIN `cities` ON (ct.`city_id` = cities.`id`)
        LEFT JOIN `cities_tags` p ON (p.`city_id` = 1 AND p.`tag_id` = ct.`tag_id`)
        GROUP BY cities.`id`, cities.`name`
    ) q
) a ORDER BY jaccard_index DESC

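In the query above I derive the result set through two nested sub-selects so that the calculated aliases can be reused in the outer levels.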


You can add a filter to the query above so that it does not calculate the similarity of the city with itself:

SELECT a.*,
       (a.`intersect` / a.`union`) AS jaccard_index
FROM (
    SELECT q.*,
           -- |A union B| = |A| + |B| - |A intersect B|
           (q.tag_count + q.paris_count - q.`intersect`) AS `union`
    FROM (
        SELECT cities.`id`, cities.`name`,
               COUNT(DISTINCT ct.`tag_id`) AS tag_count,   -- tags of this city
               COUNT(DISTINCT p.`tag_id`) AS `intersect`,  -- tags shared with Paris
               (SELECT COUNT(DISTINCT tag_id) FROM `cities_tags` WHERE city_id = 1) AS paris_count
        FROM `cities_tags` ct
        JOIN `cities` ON (ct.`city_id` = cities.`id`)
        LEFT JOIN `cities_tags` p ON (p.`city_id` = 1 AND p.`tag_id` = ct.`tag_id`)
        WHERE cities.`id` != 1
        GROUP BY cities.`id`, cities.`name`
    ) q
) a ORDER BY jaccard_index DESC

So the results show that Paris is most closely related to London, and then to New York.

Jaccard Similarity Fiddle
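A Jaccard query of this kind can also be tried against the sample data without a MySQL server. The sketch below is only an illustration under stated assumptions: it uses Python's `sqlite3` module with an in-memory SQLite database instead of MySQL, it assumes each (city_id, tag_id) pair appears only once in `cities_tags`, and the names `JACCARD_SQL` and `jaccard_ranking` are hypothetical. The `* 1.0` forces floating-point division in SQLite; MySQL's `/` operator already returns a decimal.

```python
import sqlite3

# In-memory SQLite database holding the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cities_tags (id INTEGER PRIMARY KEY, city_id INTEGER, tag_id INTEGER);
    INSERT INTO cities VALUES (1, 'Paris'), (2, 'London'), (3, 'New York');
    INSERT INTO cities_tags VALUES
        (1, 1, 1), (2, 1, 3), (3, 2, 1), (4, 2, 3), (5, 3, 2), (6, 3, 3);
""")

# COUNT(mine.tag_id) counts the tags shared with the given city (the
# intersection); the denominator is |A| + |B| - |A intersect B| (the union).
JACCARD_SQL = """
    SELECT c.id,
           c.name,
           COUNT(mine.tag_id) * 1.0 / (
               COUNT(other.tag_id)
               + (SELECT COUNT(*) FROM cities_tags WHERE city_id = :city_id)
               - COUNT(mine.tag_id)
           ) AS jaccard_index
    FROM cities_tags AS other
    JOIN cities AS c ON c.id = other.city_id
    LEFT JOIN cities_tags AS mine
           ON mine.city_id = :city_id AND mine.tag_id = other.tag_id
    WHERE other.city_id != :city_id
    GROUP BY c.id, c.name
    ORDER BY jaccard_index DESC, c.id
"""

def jaccard_ranking(conn, city_id):
    """Return (id, name, jaccard_index) rows, most similar city first."""
    return conn.execute(JACCARD_SQL, {"city_id": city_id}).fetchall()

for row in jaccard_ranking(conn, 1):
    print(row)
```

Counting rows rather than concatenating tag ids keeps the intersection and union as plain integers, so the index does not depend on how the database converts strings to numbers.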

Regarding php - MySQL/PHP : Find similar/related items by tag/taxonomy, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18019976/
