gpt4 book ai didi

sql - 复杂用户匹配算法性能

转载 作者:行者123 更新时间:2023-11-29 12:24:24 26 4
gpt4 key购买 nike

我正在为拥有超过 400,000 名用户的应用程序编写算法。

表格

User <- holds users and their latitude, longitude, and the last time they were active.
Swipes <- when a user has already seen someone, a record is inserted in here.

我的匹配算法应该获取与请求用户一定距离内的用户,这些用户以前没有见过,并且还应该获取活跃用户和一段时间内不活跃用户的组合。

我已尽力将实现尽可能多地记录下来,以便于理解。这是我当前的实现:

SELECT id,
distance
FROM
(

-- This is done so that the users that returned can be numbered 1 through x partitioned by the buckets and ordered by the distance away.
SELECT id,
distance,
row_number() OVER (PARTITION BY buckets
ORDER BY distance) AS bucket_interval

FROM
(

-- This inner query will fetch all of the users and create a column called "buckets" that will separate users depending on how active they are
SELECT id,


-- This is the "buckets" column
CASE
WHEN now() - last_active_at < interval '1' DAY THEN 1
WHEN now() - last_active_at < interval '2' DAY THEN 2
WHEN now() - last_active_at < interval '5' DAY THEN 3
WHEN now() - last_active_at < interval '10' DAY THEN 4
ELSE 5
END AS buckets,


-- This column gets the distance of the current user to the other users
3958.755864232 * 2 * ASIN(SQRT(POWER(SIN((51.6900092 - users.latitude) * PI() / 180 / 2), 2) + COS(51.6900092 * PI() / 180) * COS(users.latitude * PI() / 180) * POWER(SIN((-8.14594 - users.longitude) * PI() / 180 / 2), 2))) AS distance,

FROM "users"


-- This first condition will make sure people are within the desired distance (0 to 100 miles away in this case)

WHERE (users.latitude BETWEEN -31.15177391077796 AND 134.31179231344798
AND users.longitude BETWEEN -252.66535853758625 AND 163.78387854758624
AND (3958.755864232 * 2 * ASIN(SQRT(POWER(SIN((51.6900092 - users.latitude) * PI() / 180 / 2), 2) + COS(51.6900092 * PI() / 180) * COS(users.latitude * PI() / 180) * POWER(SIN((-8.14594 - users.longitude) * PI() / 180 / 2), 2)))) BETWEEN 0.0 AND 100)


-- This second condition will make sure the current user hasn't swiped through this person already

AND users.id NOT IN
(SELECT "swipes"."connection_id"
FROM "swipes"
WHERE user_id = currentUserId)
) x
) xx

-- This is done because I only want to fetch 50 matches at a time. Since there are "5" buckets... each of them will have 10 people ordered by distance
WHERE bucket_interval <= 10
ORDER BY distance ASC
LIMIT 50

这是在 ruby​​ on rails 中实现的,我使用的是一个名为 geocoder 的 gem以获得实际距离和所有这些。

但是,ruby 实现并不重要,重要的是如何编写这个查询才能尽可能优化。

它的运行速度并不快。

谢谢

编辑:这是查询计划

    [
{
"Plan": {
"Startup Cost": 80238.35,
"Plans": [
{
"Startup Cost": 80238.35,
"Plans": [
{
"Startup Cost": 80238.24,
"Plans": [
{
"Startup Cost": 80238.24,
"Plans": [
{
"Startup Cost": 80238.24,
"Plans": [
{
"Filter": "(((NOT archived) OR (archived IS NULL)) AND (NOT is_suspended) AND (image_urls IS NOT NULL) AND (latitude >= (-90.151773910848)::double precision) AND (latitude <= 199.311792310848::double precision) AND (longitude >= (-255.665358277586)::double precision) AND (longitude <= 243.783878277586::double precision) AND (NOT (hashed SubPlan 1)) AND (NOT (hashed SubPlan 2)) AND ((lower((gender)::text) = 'f'::text) OR (lower((gender)::text) = 'female'::text)) AND ((7917.511728464::double precision * asin(sqrt((power(sin(((((54.5800092::double precision - latitude) * 3.14159265358979::double precision) / 180::double precision) / 2::double precision)), 2::double precision) + ((0.579565539469435::double precision * cos(((latitude * 3.14159265358979::double precision) / 180::double precision))) * power(sin((((((-44.9774)::double precision - longitude) * 3.14159265358979::double precision) / 180::double precision) / 2::double precision)), 2::double precision)))))) >= 0::double precision) AND ((7917.511728464::double precision * asin(sqrt((power(sin(((((54.5800092::double precision - latitude) * 3.14159265358979::double precision) / 180::double precision) / 2::double precision)), 2::double precision) + ((0.579565539469435::double precision * cos(((latitude * 3.14159265358979::double precision) / 180::double precision))) * power(sin((((((-44.94074)::double precision - longitude) * 3.14159265358979::double precision) / 180::double precision) / 2::double precision)), 2::double precision)))))) <= 10000::double precision))",
"Startup Cost": 1722.06,
"Plans": [
{
"Startup Cost": 0.11,
"Scan Direction": "Forward",
"Plan Width": 4,
"Node Type": "Index Scan",
"Index Cond": "(user_id = 231415)",
"Plan Rows": 1955,
"Relation Name": "meets",
"Alias": "meets",
"Parent Relationship": "SubPlan",
"Total Cost": 1713.44,
"Subplan Name": "SubPlan 2",
"Index Name": "index_meets_on_user_id"
}
],
"Node Type": "Seq Scan",
"Plan Rows": 4,
"Relation Name": "users",
"Alias": "users",
"Parent Relationship": "Outer",
"Plan Width": 28,
"Total Cost": 80238.23
}
],
"Sort Key": [
"(CASE WHEN ((now() - (users.last_active_at)::timestamp with time zone) < '1 day'::interval day) THEN 1 WHEN ((now() - (users.last_active_at)::timestamp with time zone) < '2 days'::interval day) THEN 2 WHEN ((now() - (users.last_active_at)::timestamp with time zone) < '5 days'::interval day) THEN 3 WHEN ((now() - (users.last_active_at)::timestamp with time zone) < '10 days'::interval day) THEN 4 ELSE 5 END)",
"((7917.511728464::double precision * asin(sqrt((power(sin(((((54.5800092::double precision - users.latitude) * 3.14159265358979::double precision) / 180::double precision) / 2::double precision)), 2::double precision) + ((0.579565539469435::double precision * cos(((users.latitude * 3.14159265358979::double precision) / 180::double precision))) * power(sin((((((-5.94074)::double precision - users.longitude) * 3.14159265358979::double precision) / 180::double precision) / 2::double precision)), 2::double precision)))))))"
],
"Plan Rows": 4,
"Node Type": "Sort",
"Parent Relationship": "Outer",
"Plan Width": 28,
"Total Cost": 80238.24
}
],
"Plan Rows": 4,
"Node Type": "WindowAgg",
"Parent Relationship": "Subquery",
"Plan Width": 28,
"Total Cost": 80238.33
}
],
"Node Type": "Subquery Scan",
"Plan Rows": 1,
"Filter": "(xx.bucket_interval <= 10)",
"Alias": "xx",
"Parent Relationship": "Outer",
"Plan Width": 12,
"Total Cost": 80238.35
}
],
"Sort Key": [
"xx.distance"
],
"Plan Rows": 1,
"Node Type": "Sort",
"Parent Relationship": "Outer",
"Plan Width": 12,
"Total Cost": 80238.35
}
],
"Plan Rows": 1,
"Node Type": "Limit",
"Plan Width": 12,
"Total Cost": 80238.35
}
}
]

最佳答案

要尝试的一件事是替换您的“NOT IN”声明,因为这些声明是 extremely slow .

代替

FROM "users"
...
AND users.id NOT IN
(SELECT "swipes"."connection_id"
FROM "swipes"
WHERE user_id = currentUserId)

使用

FROM "users" u
LEFT JOIN "swipes" s
ON s.user_id = currentUserId

...

WHERE s.user_id IS NULL

关于sql - 复杂用户匹配算法性能,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/48550748/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com