postgresql - How can I update 1.3 billion rows in this table more efficiently? - 6ren


Reposted by: 行者123, updated: 2023-11-29 13:47:30

I have 1.3 billion rows in a PostgreSQL table, sku_comparison, that looks like this:

id1 (INTEGER) | id2 (INTEGER) | (10 SMALLINT columns) | length1 (SMALLINT) | length2 (SMALLINT) | length_difference (SMALLINT)

The id1 and id2 columns reference a table named sku, which has about 300,000 rows, each with an associated varchar(25) value in the code column.

sku_comparison has a btree index on id1, another on id2, and a composite index on (id1, id2). The id column of sku also has a btree index.

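My goal is to fill the length1 and length2 columns with the length of the corresponding code value from the sku table. However, I ran the following statement for more than 20 hours and it still had not finished: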

UPDATE sku_comparison SET length1=length(sku.code) FROM sku 
WHERE sku_comparison.id1=sku.id;
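The statement above only fills length1, while the stated goal covers both length1 and length2. A minimal sketch that sets both columns in one pass, joining sku twice under two aliases, so each sku_comparison row gets one new row version instead of one per separate UPDATE. It assumes the string column is sku.code as written in the question; the \d output lists that column as sku, so substitute the real name.

```sql
-- Sketch: fill length1 and length2 in a single UPDATE.
-- Assumption: the string column is sku.code (the \d output calls it sku.sku).
UPDATE sku_comparison AS c
SET    length1 = length(s1.code),   -- length of the code referenced by id1
       length2 = length(s2.code)    -- length of the code referenced by id2
FROM   sku AS s1, sku AS s2
WHERE  s1.id = c.id1
  AND  s2.id = c.id2;
```

Because id1 and id2 are NOT NULL foreign keys into sku, every sku_comparison row finds both join partners and is updated.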

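All of the data is stored on a single hard drive in a local machine, and the processor is fairly modern. Building this table required more complex string comparisons in Python and took only about 30 hours, so I'm not sure why something like this would take so long.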

Edit: here are the formatted table definitions:

                                     Table "public.sku"
   Column   |         Type          |                    Modifiers                     
------------+-----------------------+--------------------------------------------------
 id         | integer               | not null default nextval('sku_id_seq'::regclass)
 sku        | character varying(25) | 
 pattern    | character varying(25) | 
 pattern_an | character varying(25) | 
 firsttwo   | character(2)          | default '  '::bpchar
 reference  | character varying(25) | 
Indexes:
    "sku_pkey" PRIMARY KEY, btree (id)
    "sku_sku_idx" UNIQUE, btree (sku)
    "sku_firstwo_idx" btree (firsttwo)
Referenced by:
    TABLE "sku_comparison" CONSTRAINT "sku_comparison_id1_fkey" FOREIGN KEY (id1) REFERENCES sku(id)
    TABLE "sku_comparison" CONSTRAINT "sku_comparison_id2_fkey" FOREIGN KEY (id2) REFERENCES sku(id)


            Table "public.sku_comparison"
          Column           |   Type   |        Modifiers        
---------------------------+----------+-------------------------
 id1                       | integer  | not null
 id2                       | integer  | not null
 consec_charmatch          | smallint | 
 consec_groupmatch         | smallint | 
 consec_fieldtypematch     | smallint | 
 consec_groupmatch_an      | smallint | 
 consec_fieldtypematch_an  | smallint | 
 general_charmatch         | smallint | 
 general_groupmatch        | smallint | 
 general_fieldtypematch    | smallint | 
 general_groupmatch_an     | smallint | 
 general_fieldtypematch_an | smallint | 
 length1                   | smallint | default 0
 length2                   | smallint | default 0
 length_difference         | smallint | default '-999'::integer
Indexes:
    "sku_comparison_pkey" PRIMARY KEY, btree (id1, id2)
    "ssd_id1_idx" btree (id1)
    "ssd_id2_idx" btree (id2)
Foreign-key constraints:
    "sku_comparison_id1_fkey" FOREIGN KEY (id1) REFERENCES sku(id)
    "sku_comparison_id2_fkey" FOREIGN KEY (id2) REFERENCES sku(id)

Best answer

Would you consider using an anonymous code block (a DO block)?

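Something along these lines, written as PL/pgSQL:

DO $$
DECLARE
    r record;
BEGIN
    -- Loop over the ~300,000 sku rows; length(code) is computed once per sku,
    -- not once per sku_comparison row. (The question calls the column code;
    -- the \d output lists it as sku.)
    FOR r IN SELECT id, length(code) AS code_length FROM sku LOOP
        UPDATE sku_comparison
        SET    length1 = r.code_length   -- SET targets cannot be table-qualified
        WHERE  id1 = r.id;
        COMMIT;  -- PostgreSQL 11+, top-level DO only: one transaction per sku id
    END LOOP;
END
$$;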

This splits the whole job into smaller transactions, and the length of sku.code is no longer computed for every sku_comparison row.
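The same smaller-transactions idea can also be applied per range of ids rather than per single id, which reduces the number of commits from one per sku row to one per slice. A sketch assuming PostgreSQL 11 or later (transaction control inside DO blocks), run outside an explicit transaction block; batch_size = 10000 and the variable names are hypothetical, and the column is again assumed to be sku.code.

```sql
-- Sketch: set-based UPDATE per slice of id1, committing after each slice.
-- Assumptions: PostgreSQL 11+, top-level DO, column sku.code,
-- hypothetical batch_size of 10000 sku ids per transaction.
DO $$
DECLARE
    batch_size constant integer := 10000;
    lo     integer;
    max_id integer;
BEGIN
    SELECT min(id), max(id) INTO lo, max_id FROM sku;
    -- If sku is empty, lo is NULL and the loop body never runs.
    WHILE lo <= max_id LOOP
        UPDATE sku_comparison AS c
        SET    length1 = length(s.code)
        FROM   sku AS s
        WHERE  s.id = c.id1
          AND  c.id1 >= lo
          AND  c.id1 <  lo + batch_size;
        COMMIT;  -- end this slice's transaction; PL/pgSQL starts a new one
        lo := lo + batch_size;
    END LOOP;
END
$$;
```

A smaller batch_size keeps each transaction shorter at the cost of more commits; a larger one does the opposite.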

Regarding "postgresql - How can I update 1.3 billion rows in this table more efficiently?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46018747/