gpt4 book ai didi

ruby - 如何在 postgresql 中创建多列推荐引擎?

转载 作者:数据小太阳 更新时间:2023-10-29 07:04:59 26 4
gpt4 key购买 nike

我在 postgresql 中有一个表,其中包含一些汽车 +1000000 条记录:

+----+--------+------+---------+-----------+-------------+------------+------------+
| id | price | year | mileage | fuel_type | body_type | brand | model |
+----+--------+------+---------+-----------+-------------+------------+------------+
| 1 | 4894 | 2011 | 121842 | "Benzin" | "Sedan" | "Toyota" | "Yaris" |
| 2 | 4989 | 2012 | 33901 | "Benzin" | "Hatchback" | "Renault" | "Twingo" |
| 3 | 4990 | 2013 | 55105 | "Benzin" | "Hatchback" | "Renault" | "Twingo" |
| 3 | 5290 | 2013 | 20967 | "Benzin" | "Hatchback" | "Renault" | "Twingo" |
| 5 | 5594 | 2008 | 121281 | "Benzin" | "Hatchback" | "Mercedes" | "A170" |
| 6 | 4690 | 2012 | 71303 | "Benzin" | "Hatchback" | "Renault" | "Twingo" |
| 7 | 5290 | 2013 | 58300 | "Benzin" | "Hatchback" | "Renault" | "Twingo" |
| 8 | 5890 | 2013 | 35732 | "Benzin" | "Hatchback" | "Renault" | "Twingo" |
| 9 | 5990 | 2013 | 38777 | "Benzin" | "Hatchback" | "Renault" | "Twingo" |
| 10 | 6180 | 2013 | 69491 | "Benzin" | "Hatchback" | "VW" | "up!" |
| 11 | 6490 | 2012 | 72900 | "Benzin" | "Sedan" | "Renault" | "Clio III" |
| 12 | 6790 | 2012 | 49541 | "Benzin" | "Hatchback" | "Renault" | "Clio III" |
| 13 | 6790 | 2012 | 46377 | "Benzin" | "Hatchback" | "Renault" | "Clio III" |
| 14 | 6790 | 2012 | 45200 | "Benzin" | "Hatchback" | "Renault" | "Clio III" |
| 15 | 6894 | 2007 | 108840 | "Benzin" | "Sedan" | "VW" | "Golf V" |
| 16 | 6990 | 2009 | 54200 | "Benzin" | "Sedan" | "Renault" | "Mégane" |
| 17 | 6990 | 2012 | 40652 | "Benzin" | "Hatchback" | "Renault" | "Clio III" |
| 18 | 6990 | 2012 | 38080 | "Benzin" | "Sedan" | "Renault" | "Clio III" |
| 19 | 7290 | 2012 | 28600 | "Benzin" | "Hatchback" | "Renault" | "Clio III" |
| 20 | 7290 | 2013 | 52800 | "Benzin" | "Hatchback" | "Renault" | "Twingo" |
+----+--------+------+---------+-----------+-------------+------------+------------+

我想创建一个推荐引擎,它可以根据一些不同的标准返回 20 个最“相似”的匹配项,例如当用户搜索:brand = 'Renault' AND price < 60000 AND year > 2010 , 我想在搜索结果之外展示一些其他的、与其他汽车更宽松的结果,这是相似的,但不一定完全匹配所有搜索条件。

我试过用 ruby​​ 编写一些基于规则的代码,它的作用如下:

  • 如果您搜索“Renault Clio”,那么“Renault Twingo”也是一个接近的匹配项
  • 如果您的最高价格为 8000,则由最接近该价格的人订购
  • 等等

基于这段代码,我生成了一个带有 where 和 order by 子句的 SQL 查询。

但问题是,事情变得很庞大,因为我有 20 个不同的列,我想根据初始标准有选择地考虑。此外,我希望建议能够向后兼容,因为我不想只进行简单的过滤 (WHERE) 查询,在某些情况下可能最终会返回零匹配。相反,我希望它做一些类似于使用文本相似性算法的事情,在这种算法中,您可以将一个短语与所有短语进行比较,并获得所有短语的比较分数,然后您可以根据该分数进行排序。

我对如何实现这一点感到非常困惑,在一种方法中,这不是定义 1000 条规则和 if/then 语句来生成 SQL 查询。有没有我可以使用的其他技术,或者除了 postgresql 之外的其他技术?

最佳答案

计算每个数值属性的加权偏差:

deviation = abs(actual_value- expected_value)* property_weight

对文本属性应用简化计算:

deviation = (actual_value <> expected_value)::int* property_weight

按照偏差总和的升序推荐位置。

例子。我们正在寻找 2012 年的 Renault Twingo Hatchback,里程数 50000,价格 6000:

select *, 
abs(price- 6000)* 100+
abs(year- 2012)* 10000+
abs(mileage- 50000)* 1+
(body_type <> 'Hatchback')::int* 40000+
(brand <> 'Renault')::int* 100000+
(model <> 'Twingo')::int* 50000
as recommendation
from cars
order by recommendation
limit 10;

id | price | year | mileage | fuel_type | body_type | brand | model | recommendation
----+-------+------+---------+-----------+-----------+---------+----------+----------------
9 | 5990 | 2013 | 38777 | Benzin | Hatchback | Renault | Twingo | 22223
8 | 5890 | 2013 | 35732 | Benzin | Hatchback | Renault | Twingo | 35268
7 | 5290 | 2013 | 58300 | Benzin | Hatchback | Renault | Twingo | 89300
4 | 5290 | 2013 | 20967 | Benzin | Hatchback | Renault | Twingo | 110033
3 | 4990 | 2013 | 55105 | Benzin | Hatchback | Renault | Twingo | 116105
2 | 4989 | 2012 | 33901 | Benzin | Hatchback | Renault | Twingo | 117199
12 | 6790 | 2012 | 49541 | Benzin | Hatchback | Renault | Clio III | 129459
13 | 6790 | 2012 | 46377 | Benzin | Hatchback | Renault | Clio III | 132623
14 | 6790 | 2012 | 45200 | Benzin | Hatchback | Renault | Clio III | 133800
20 | 7290 | 2013 | 52800 | Benzin | Hatchback | Renault | Twingo | 141800
(10 rows)

您可以通过更改属性的权重轻松地校准算法。

要获得更复杂的文本属性近似值,您可以将数值分配给辅助表中的属性,如下所示:

create table models(id serial primary key, model text, value integer);
insert into models (model, value) values
('Twingo', 10),
('Clio III', 11), -- Clio is more similar to Twingo than to Laguna
('Laguna', 18)
--- etc

并在主查询中将这些值用作数字属性,例如:

select cars.*, 
abs(price- 6000)* 100+
abs(year- 2012)* 10000+
abs(mileage- 50000)* 1+
(body_type <> 'Hatchback')::int* 40000+
(brand <> 'Renault')::int* 100000+
abs(models.value- 10)* 50000 -- 10 is a numeric value for Twingo
as recommendation
from cars
join models using(model)
order by recommendation
limit 10;

关于优化的说明。如果您可以严格定义任何属性的范围,请将其放在 WHERE 子句中以获得更好的性能。例如,如果查询不能返回所需品牌以外的品牌,那么计算此属性的偏差就没有意义:

select *, 
abs(price- 6000)* 100+
abs(year- 2012)* 10000+
abs(mileage- 50000)* 1+
(body_type <> 'Hatchback')::int* 40000+
(model <> 'Twingo')::int* 50000
as recommendation
from cars
where brand = 'Renault' -- !
order by recommendation
limit 10;

关于ruby - 如何在 postgresql 中创建多列推荐引擎?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/43340625/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com