sql - PostgreSQL: How to supply all missing pairs?


Reposted by: 行者123  Last updated: 2023-11-29 14:34:54

Given a table that holds pairs of "factors" together with an existence flag:

create table pairs (
  factor_1  text,
  factor_2  text,
  exists    boolean
  );

and the following data (separator lines added for readability):

 factor_1 | factor_2 | exists
----------+----------+--------
foo       | one      | t
foo       | two      | t
-----------------------------
bar       | three    | t
-----------------------------
baz       | four     | t
baz       | five     | t
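
To try the queries that follow, the sample rows can be loaded with a plain insert. This is only a sketch that transcribes the table shown above; the values are supplied positionally, in the column order of the create table statement.

```sql
-- Sample data transcribed from the table above (factor_1, factor_2, exists).
insert into pairs values
  ( 'foo', 'one',   true ),
  ( 'foo', 'two',   true ),
  ( 'bar', 'three', true ),
  ( 'baz', 'four',  true ),
  ( 'baz', 'five',  true );
```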

How can I create a view that shows all possible pairs for the given set of factors:

 factor_1 | factor_2 | exists
----------+----------+--------
foo       | one      | t
foo       | two      | t
foo       | three    | f
foo       | four     | f
foo       | five     | f
-----------------------------
bar       | one      | f
bar       | two      | f
bar       | three    | t
bar       | four     | f
bar       | five     | f
-----------------------------
baz       | one      | f
baz       | two      | f
baz       | three    | f
baz       | four     | t
baz       | five     | t
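
A quick sanity check for any candidate solution is the row count: the result should contain one row per combination of a distinct factor_1 with a distinct factor_2. A minimal sketch (the alias expected_rows is an illustrative name, not from the original post):

```sql
-- Number of rows the completed view should return:
-- (distinct factor_1 values) times (distinct factor_2 values).
select count( distinct factor_1 ) * count( distinct factor_2 ) as expected_rows
  from pairs;
```

For the sample data this is 3 × 5 = 15, which matches the 15 rows listed above.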

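I suppose I could define one view with all distinct values of factor_1 and another with all distinct values of factor_2, then take the cross product and set exists to true for every pair that is found in the table pairs. Is there a more elegant / efficient / idiomatic way to achieve the same result?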

Edit: discussion of the solutions:

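In the short time between asking the question and receiving two answers, I went ahead and implemented the solution I sketched above. This is what it looks like; it has 3 CTEs and an implicit cross join: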

with
  p1 as ( select distinct factor_1 from pairs  ),
  p2 as ( select distinct factor_2 from pairs  ),
  p3 as ( select *                 from p1, p2 )
  select
      p3.factor_1 as factor_1,
      p3.factor_2 as factor_2,
      ( case when p.exists then true else false end ) as exists
    from p3
    left join pairs as p on ( p3.factor_1 = p.factor_1 and p3.factor_2 = p.factor_2 )
    order by p3.factor_1, p3.factor_2;

Now let's compare this with the answers. I did some reformatting and renaming so that the solutions differ only where it matters.

Gordon Linoff's solution A is quite short and uses no CTEs:

select
    p1.factor_1                 as factor_1,
    p2.factor_2                 as factor_2,
    coalesce( p.exists, false ) as exists
  from        ( select distinct factor_1 from pairs ) as p1
  cross join  ( select distinct factor_2 from pairs ) as p2
  left  join  pairs p
    on p.factor_1 = p1.factor_1 and p.factor_2 = p2.factor_2
    order by p1.factor_1, p2.factor_2;

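Valli's solution B is even shorter; its insight is that what has to be unique is the set of combinations coming out of the cross join, so the distinct keyword can be factored out into the top-level select: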

select distinct
    p1.factor_1                 as factor_1,
    p2.factor_2                 as factor_2,
    coalesce( p.exists, false ) as exists
  from        pairs as p1
  cross join  pairs as p2
  left  join  pairs as p
    on p1.factor_1 = p.factor_1 and p2.factor_2 = p.factor_2
    order by p1.factor_1, p2.factor_2;

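My worry here is that the database planner has to work harder to keep the cross join from being inflated with many duplicate pairs that are then filtered out again. So I ran explain analyze on all three solutions (note: I removed the order by clauses); as it turns out, the results are somewhat contradictory. My solution, the one with the CTEs, fares unfavorably because of the CTEs. I do use them a lot in my SQL because they are very convenient, but they are also known to act as optimization fences in PostgreSQL (similar to separate views), and it shows.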

                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=4770.47..5085.69 rows=40000 width=65) (actual time=0.167..0.189 rows=15 loops=1)
   Merge Cond: ((v3.factor_1 = p.factor_1) AND (v3.factor_2 = p.factor_2))
   CTE v1
     ->  HashAggregate  (cost=20.88..22.88 rows=200 width=32) (actual time=0.026..0.028 rows=3 loops=1)
           Group Key: pairs.factor_1
           ->  Seq Scan on pairs  (cost=0.00..18.70 rows=870 width=32) (actual time=0.010..0.012 rows=5 loops=1)
   CTE v2
     ->  HashAggregate  (cost=20.88..22.88 rows=200 width=32) (actual time=0.011..0.012 rows=5 loops=1)
           Group Key: pairs_1.factor_2
           ->  Seq Scan on pairs pairs_1  (cost=0.00..18.70 rows=870 width=32) (actual time=0.003..0.005 rows=5 loops=1)
   CTE v3
     ->  Nested Loop  (cost=0.00..806.00 rows=40000 width=64) (actual time=0.044..0.062 rows=15 loops=1)
           ->  CTE Scan on v1  (cost=0.00..4.00 rows=200 width=32) (actual time=0.028..0.030 rows=3 loops=1)
           ->  CTE Scan on v2  (cost=0.00..4.00 rows=200 width=32) (actual time=0.005..0.007 rows=5 loops=3)
   ->  Sort  (cost=3857.54..3957.54 rows=40000 width=64) (actual time=0.118..0.123 rows=15 loops=1)
         Sort Key: v3.factor_1, v3.factor_2
         Sort Method: quicksort  Memory: 25kB
         ->  CTE Scan on v3  (cost=0.00..800.00 rows=40000 width=64) (actual time=0.046..0.074 rows=15 loops=1)
   ->  Sort  (cost=61.18..63.35 rows=870 width=65) (actual time=0.042..0.042 rows=5 loops=1)
         Sort Key: p.factor_1, p.factor_2
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on pairs p  (cost=0.00..18.70 rows=870 width=65) (actual time=0.005..0.008 rows=5 loops=1)
 Planning time: 0.368 ms
 Execution time: 0.421 ms
(24 rows)

Observe that there are two sorts in this plan.
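
As a side note on the CTE observation: the CTE Scan nodes in the plan above show that each CTE was planned and materialized on its own. Assuming PostgreSQL 12 or later (an assumption; the post does not state the server version), a CTE can be marked NOT MATERIALIZED so that the planner is allowed to fold it into the outer query. A minimal sketch of the CTE solution written that way:

```sql
-- Assumes PostgreSQL 12+; earlier versions reject the NOT MATERIALIZED keyword.
with
  p1 as not materialized ( select distinct factor_1 from pairs ),
  p2 as not materialized ( select distinct factor_2 from pairs ),
  p3 as not materialized ( select * from p1 cross join p2 )
select
    p3.factor_1                 as factor_1,
    p3.factor_2                 as factor_2,
    coalesce( p.exists, false ) as exists
  from p3
  left join pairs as p
    on p3.factor_1 = p.factor_1 and p3.factor_2 = p.factor_2
  order by p3.factor_1, p3.factor_2;
```

On those versions, CTEs that are non-recursive, free of side effects and referenced only once are inlined by default anyway, so the keyword mainly documents intent here. Whether the resulting plan then resembles that of the subquery variant is something to confirm with explain analyze rather than assume.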

The plan for solution A is much shorter (and its execution time is surprisingly long):

                                                               QUERY PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=1580.25..2499.00 rows=40000 width=65) (actual time=1.048..2.197 rows=15 loops=1)
   Hash Cond: ((p.factor_1 = pairs.factor_1) AND (p.factor_2 = pairs_1.factor_2))
   ->  Seq Scan on pairs p  (cost=0.00..18.70 rows=870 width=65) (actual time=0.010..0.015 rows=5 loops=1)
   ->  Hash  (cost=550.25..550.25 rows=40000 width=64) (actual time=0.649..0.649 rows=15 loops=1)
         Buckets: 65536  Batches: 2  Memory Usage: 513kB
         ->  Nested Loop  (cost=41.75..550.25 rows=40000 width=64) (actual time=0.058..0.077 rows=15 loops=1)
               ->  HashAggregate  (cost=20.88..22.88 rows=200 width=32) (actual time=0.033..0.036 rows=3 loops=1)
                     Group Key: pairs.factor_1
                     ->  Seq Scan on pairs  (cost=0.00..18.70 rows=870 width=32) (actual time=0.017..0.018 rows=5 loops=1)
               ->  Materialize  (cost=20.88..25.88 rows=200 width=32) (actual time=0.008..0.011 rows=5 loops=3)
                     ->  HashAggregate  (cost=20.88..22.88 rows=200 width=32) (actual time=0.013..0.016 rows=5 loops=1)
                           Group Key: pairs_1.factor_2
                           ->  Seq Scan on pairs pairs_1  (cost=0.00..18.70 rows=870 width=32) (actual time=0.004..0.006 rows=5 loops=1)
 Planning time: 0.258 ms
 Execution time: 2.342 ms
(15 rows)

The execution plan for solution B is much longer than the one for solution A, with several implicit sorts:

                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=282354.48..289923.48 rows=80000 width=65) (actual time=0.230..0.251 rows=15 loops=1)
   ->  Sort  (cost=282354.48..284246.73 rows=756900 width=65) (actual time=0.229..0.233 rows=25 loops=1)
         Sort Key: p1.factor_1, p2.factor_2, (COALESCE(p."exists", false))
         Sort Method: quicksort  Memory: 26kB
         ->  Merge Left Join  (cost=140389.32..146354.17 rows=756900 width=65) (actual time=0.122..0.157 rows=25 loops=1)
               Merge Cond: ((p1.factor_1 = p.factor_1) AND (p2.factor_2 = p.factor_2))
               ->  Sort  (cost=140328.14..142220.39 rows=756900 width=64) (actual time=0.095..0.100 rows=25 loops=1)
                     Sort Key: p1.factor_1, p2.factor_2
                     Sort Method: quicksort  Memory: 26kB
                     ->  Nested Loop  (cost=0.00..9500.83 rows=756900 width=64) (actual time=0.027..0.043 rows=25 loops=1)
                           ->  Seq Scan on pairs p1  (cost=0.00..18.70 rows=870 width=32) (actual time=0.010..0.011 rows=5 loops=1)
                           ->  Materialize  (cost=0.00..23.05 rows=870 width=32) (actual time=0.003..0.005 rows=5 loops=5)
                                 ->  Seq Scan on pairs p2  (cost=0.00..18.70 rows=870 width=32) (actual time=0.005..0.008 rows=5 loops=1)
               ->  Sort  (cost=61.18..63.35 rows=870 width=65) (actual time=0.021..0.023 rows=8 loops=1)
                     Sort Key: p.factor_1, p.factor_2
                     Sort Method: quicksort  Memory: 25kB
                     ->  Seq Scan on pairs p  (cost=0.00..18.70 rows=870 width=65) (actual time=0.004..0.004 rows=5 loops=1)
 Planning time: 0.260 ms
 Execution time: 0.333 ms
(19 rows)

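I think we can disregard the execution times for this tiny sample without indexes; only with realistic data can we say anything definite about them.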
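
One way to get closer to realistic timings is to generate a larger table and repeat the measurement. The following is a sketch under stated assumptions: the table name pairs_big, the value prefixes, the 300 × 300 size and the one-in-seven filter are all made up for illustration and do not come from the post.

```sql
-- Hypothetical benchmark table: 300 x 300 possible combinations,
-- of which roughly one in seven is stored as an existing pair.
create table pairs_big as
  select
      'f' || a::text as factor_1,
      'g' || b::text as factor_2,
      true           as "exists"
    from       generate_series( 1, 300 ) as a
    cross join generate_series( 1, 300 ) as b
    where ( a + b ) % 7 = 0;

-- Index the join columns and refresh planner statistics.
create index on pairs_big ( factor_1, factor_2 );
analyze pairs_big;

-- Solution A against the larger table; swap in the other variants to compare.
explain analyze
select
    p1.factor_1                 as factor_1,
    p2.factor_2                 as factor_2,
    coalesce( p.exists, false ) as exists
  from        ( select distinct factor_1 from pairs_big ) as p1
  cross join  ( select distinct factor_2 from pairs_big ) as p2
  left  join  pairs_big as p
    on p.factor_1 = p1.factor_1 and p.factor_2 = p2.factor_2;
```

Running each variant several times and comparing the reported execution times gives a fairer picture than a single run on five rows.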

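Based on these results, I prefer Gordon Linoff's solution A, because its SQL is fairly short and its execution plan is the most concise. I am a bit worried about the opportunities for poor performance in the plan for solution B. My guess is also that, while factoring out the distinct clause to the top level is elegant, it is not the most precise way to state the intent: I do not want to cross-join and then filter for unique pairs, I want to cross-join unique values. Needless to say, if the ratio of execution times (A: 2.3 ms / B: 0.3 ms) should hold up with realistic data volumes, that would change my decision.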

Best answer

Use a cross join to generate the rows and a left join to compute the boolean expression:

select f1.factor_1, f2.factor_2, coalesce(p.exists, false) as exists
from (select distinct factor_1 from pairs) f1 cross join
     (select distinct factor_2 from pairs) f2 left join
     pairs p
     on p.factor_1 = f1.factor_1 and p.factor_2 = f2.factor_2;

Note: although Postgres accepts exists as a column alias, I think it is a poor name because it conflicts with a SQL keyword.
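
Since the question asks for a view, the query can be wrapped in one. This is a minimal sketch, not part of the original answer; the view name all_pairs and the output column pair_exists are illustrative names, the latter chosen to sidestep the keyword clash mentioned above.

```sql
-- all_pairs and pair_exists are hypothetical names, not from the original post.
create or replace view all_pairs as
select
    f1.factor_1,
    f2.factor_2,
    coalesce( p.exists, false ) as pair_exists
  from        ( select distinct factor_1 from pairs ) as f1
  cross join  ( select distinct factor_2 from pairs ) as f2
  left  join  pairs as p
    on p.factor_1 = f1.factor_1 and p.factor_2 = f2.factor_2;

-- Usage: list the combinations that are absent from pairs or flagged false.
select factor_1, factor_2
  from all_pairs
  where not pair_exists
  order by factor_1, factor_2;
```

The sketch assumes each (factor_1, factor_2) pair occurs at most once in pairs; duplicate rows in the base table would show up as duplicate rows in the view.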

Regarding "sql - PostgreSQL: How to supply all missing pairs?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/46575102/
