
PostgreSQL: Sum of values related via LATERAL join

Reposted. Author: 行者123. Updated: 2023-11-29 11:35:49

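I'm trying to clean up data in a PostgreSQL table where some records have a large number of profanities in the email_address column (the records in question were entered by agitated users, frustrated by a bug that has since been fixed):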

    ┌───────────────────┐
    │   email_address   │
    ├───────────────────┤
    │ foo@go.bar.me.net │
    │ foo@foo.com       │
    │ foo@example.com   │
    │ baz@example.com   │
    │ barred@qux.com    │
    └───────────────────┘

Desired query output

I'd like to build a query that annotates each row from the data table with a profanity score, and orders the records by the score, so that a human can go through the annotated data (presented in a web app) and take necessary action:

    ┌───────────────────┬───────┐
    │ email_address     │ score │
    ├───────────────────┼───────┤
    │ foo@foo.com       │    18 │
    │ foo@go.bar.me.net │    14 │
    │ foo@example.com   │     9 │
    │ baz@example.com   │     3 │
    │ barred@qux.com    │     0 │
    └───────────────────┴───────┘

Attempt #1

The approach I'm taking is to build a list of regular expressions (now I have 2 problems...) and scores, whereby very profane words will contribute a large profanity score if that word is found in the email_address column. My profanities table looks something like this:

    ┌──────────────────┬───────┐
    │ profanity_regexp │ score │
    ├──────────────────┼───────┤
    │ foo              │     9 │
    │ bar(?!red)       │     5 │
    │ baz              │     3 │
    └──────────────────┴───────┘
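To try the queries below locally, the two tables can be recreated with a short setup script. This is a minimal sketch: the table and column names come from the question, but the column types (text and integer) and the NOT NULL constraints are assumptions, since the original schema is only available through the fiddle.

```sql
-- Assumed schema: the question shows only column names, not types.
CREATE TABLE data (
    email_address text NOT NULL
);

CREATE TABLE profanities (
    profanity_regexp text    NOT NULL,  -- PostgreSQL regular expression
    score            integer NOT NULL   -- weight added per match
);

INSERT INTO data (email_address) VALUES
    ('foo@go.bar.me.net'),
    ('foo@foo.com'),
    ('foo@example.com'),
    ('baz@example.com'),
    ('barred@qux.com');

INSERT INTO profanities (profanity_regexp, score) VALUES
    ('foo',        9),
    ('bar(?!red)', 5),  -- negative lookahead: "bar" not followed by "red"
    ('baz',        3);
```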

LATERAL JOIN

I've found that I can use a LATERAL join over the regexp_matches function to extract all profanities from each email_address (but records with no profanities are discarded):

    SELECT
        data.email_address,
        array_agg(matches) AS profanities_found
    FROM
        data,
        profanities p,
        LATERAL regexp_matches(data.email_address, p.profanity_regexp, 'gi') matches
    GROUP BY
        data.email_address;

This produces the following result:

    ┌───────────────────┬───────────────────┐
    │   email_address   │ profanities_found │
    ├───────────────────┼───────────────────┤
    │ foo@foo.com       │ {{foo},{foo}}     │
    │ foo@example.com   │ {{foo}}           │
    │ foo@go.bar.me.net │ {{foo},{bar}}     │
    │ baz@example.com   │ {{baz}}           │
    └───────────────────┴───────────────────┘
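The row for barred@qux.com disappears because a comma-style LATERAL item behaves like an inner join: when regexp_matches returns no rows for an address/pattern pair, that pair is dropped. One way to keep such rows, sketched here under the assumption that the pattern column is named profanity_regexp as in the table above, is LEFT JOIN LATERAL ... ON true. The aliases d, m, and hit are illustrative names, not part of the original query.

```sql
SELECT
    d.email_address,
    -- FILTER skips the NULL placeholder produced for non-matching pairs;
    -- an address with no matches at all gets NULL instead of an array.
    array_agg(m.hit) FILTER (WHERE m.hit IS NOT NULL) AS profanities_found
FROM data d
CROSS JOIN profanities p
LEFT JOIN LATERAL
    regexp_matches(d.email_address, p.profanity_regexp, 'gi') AS m(hit)
    ON true
GROUP BY d.email_address;
```

With this form every email_address appears once in the result, whether or not any pattern matched.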

SUB-SELECT

I also figured out how to get an array of profanity score subtotals for each record with this SQL:

    SELECT
        data.email_address,
        array(
            SELECT score * (
                SELECT COUNT(*)
                FROM (SELECT
                    regexp_matches(data.email_address, p.profanity_regexp, 'gi')
                ) matches
            )
            FROM profanities p
        ) prof
    FROM data;

This correctly produces all rows (including those without profanities):

    ┌───────────────────┬──────────┐
    │   email_address   │   prof   │
    ├───────────────────┼──────────┤
    │ foo@go.bar.me.net │ {9,5,0}  │
    │ foo@foo.com       │ {18,0,0} │
    │ foo@example.com   │ {9,0,0}  │
    │ baz@example.com   │ {0,0,3}  │
    │ barred@qux.com    │ {0,0,0}  │
    └───────────────────┴──────────┘

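Question

How can I sum the results of the lateral join to get the desired output?

Is there another strategy I could use to get the desired result?

I've posted a live code fiddle for this question at http://sqlfiddle.com/#!17/6685c/4.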

Best answer

Wrap your query in another SELECT. The current query is fine; you only need to sum the array.

    SELECT
        email_address,
        (
            SELECT SUM(s)
            FROM UNNEST(prof.profanity_score_subtotals) s
        ) AS sum_prof
    FROM (
        SELECT
            data.email_address,
            array(
                SELECT score * (
                    SELECT COUNT(*)
                    FROM (SELECT
                        regexp_matches(data.email_address, p.profanity_regexp, 'gi')
                    ) matches
                )
                FROM profanities p
            ) profanity_score_subtotals
        FROM data
    ) prof;
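As for the second part of the question (another strategy), the intermediate array can be skipped by counting matches inside a lateral subquery and summing directly. This is a sketch rather than part of the accepted answer; the aliases d, m, and hits are illustrative. Because COUNT(*) without GROUP BY always returns exactly one row, the lateral subquery never drops an address, so a plain CROSS JOIN LATERAL is enough.

```sql
SELECT
    d.email_address,
    SUM(p.score * m.hits) AS score
FROM data d
CROSS JOIN profanities p
CROSS JOIN LATERAL (
    -- One row per (address, pattern) pair, with 0 when nothing matches.
    SELECT COUNT(*) AS hits
    FROM regexp_matches(d.email_address, p.profanity_regexp, 'gi')
) m
GROUP BY d.email_address
-- A bare name in ORDER BY resolves to the output column, i.e. the summed score.
ORDER BY score DESC;
```

Against the sample data this should reproduce the desired output, including the ordering from the highest score down to barred@qux.com with 0. Note that the result depends on profanities containing at least one row; with an empty pattern table the cross join would return no rows at all.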

The original question, "PostgreSQL: Sum of values related via LATERAL join", can be found on Stack Overflow: https://stackoverflow.com/questions/49146516/
