gpt4 book ai didi

json - PostgreSQL:有没有办法提高使用 JSONB 或 HSTORE 键的 SELECT 查询的性能?

转载 作者:行者123 更新时间:2023-11-29 13:21:04 25 4
gpt4 key购买 nike

我有一个包含许多行(数百万)的大表,其中有一列类型为 JSONB/HSTORE,其中包含许多字段(数百个)。为了便于说明,我使用了以下较小且不太复杂的表格:

-- table with HSTORE column
CREATE TABLE test_hstore (id BIGSERIAL PRIMARY KEY, data HSTORE);
INSERT INTO test_hstore (data)
SELECT hstore(
' key_1=>' || trunc(2 * random()) ||
', key_2=>' || trunc(2 * random()) ||
', key_3=>' || trunc(2 * random()))
FROM generate_series(0, 9999999) i;

-- table with JSONB column
CREATE TABLE test_jsonb (id BIGSERIAL PRIMARY KEY, data JSONB);
INSERT INTO test_jsonb (data)
SELECT (
'{ "key_1":' || trunc(2 * random()) ||
', "key_2":' || trunc(2 * random()) ||
', "key_3":' || trunc(2 * random()) || '}')::JSONB
FROM generate_series(0, 9999999) i;

我想简单地SELECT data 列中的一个或多个字段,而不使用WHERE 子句。随着所选字段数量的增加,我的性能有所下降:

EXPLAIN ANALYSE
SELECT id FROM test_hstore;
--Seq Scan on test_hstore (cost=0.00..213637.56 rows=10000056 width=8) (actual time=0.049..3705.852 rows=10000000 loops=1)
--Planning time: 0.419 ms
--Execution time: 5445.654 ms

EXPLAIN ANALYSE
SELECT data FROM test_hstore;
--Seq Scan on test_hstore (cost=0.00..213637.56 rows=10000056 width=56) (actual time=0.083..2424.334 rows=10000000 loops=1)
--Planning time: 0.082 ms
--Execution time: 3856.972 ms

EXPLAIN ANALYSE
SELECT data->'key_1' FROM test_hstore;
--Seq Scan on test_hstore (cost=0.00..238637.70 rows=10000056 width=32) (actual time=0.122..3263.937 rows=10000000 loops=1)
--Planning time: 0.052 ms
--Execution time: 5390.803 ms


EXPLAIN ANALYSE
SELECT data->'key_1', data->'key_2' FROM test_hstore;
--Seq Scan on test_hstore (cost=0.00..263637.84 rows=10000056 width=64) (actual time=0.089..3621.768 rows=10000000 loops=1)
--Planning time: 0.051 ms
--Execution time: 5334.452 ms

EXPLAIN ANALYSE
SELECT data->'key_1', data->'key_2', data->'key_3' FROM test_hstore;
--Seq Scan on test_hstore (cost=0.00..288637.98 rows=10000056 width=96) (actual time=0.086..4291.111 rows=10000000 loops=1)
--Planning time: 0.067 ms
--Execution time: 6375.229 ms

JSONB 列类型的趋势相同(甚至更明显):

EXPLAIN ANALYSE
SELECT id FROM test_jsonb;
--Seq Scan on test_jsonb (cost=0.00..233332.28 rows=9999828 width=8) (actual time=0.028..4009.841 rows=10000000 loops=1)
--Planning time: 0.878 ms
--Execution time: 5867.604 ms

EXPLAIN ANALYSE
SELECT data FROM test_jsonb;
--Seq Scan on test_jsonb (cost=0.00..233332.28 rows=9999828 width=68) (actual time=0.074..2371.212 rows=10000000 loops=1)
--Planning time: 0.061 ms
--Execution time: 3787.308 ms

EXPLAIN ANALYSE
SELECT data->'key_1' FROM test_jsonb;
--Seq Scan on test_jsonb (cost=0.00..258331.85 rows=9999828 width=32) (actual time=0.106..4677.026 rows=10000000 loops=1)
--Planning time: 0.066 ms
--Execution time: 6382.469 ms

EXPLAIN ANALYSE
SELECT data->'key_1', data->'key_2' FROM test_jsonb;
--Seq Scan on test_jsonb (cost=0.00..283331.42 rows=9999828 width=64) (actual time=0.094..6888.904 rows=10000000 loops=1)
--Planning time: 0.047 ms
--Execution time: 8593.060 ms

EXPLAIN ANALYSE
SELECT data->'key_1', data->'key_2', data->'key_3' FROM test_jsonb;
--Seq Scan on test_jsonb (cost=0.00..308330.99 rows=9999828 width=96) (actual time=0.173..9567.699 rows=10000000 loops=1)
--Planning time: 0.171 ms
--Execution time: 11262.135 ms

当表格包含更多字段时,这会变得更加明显。有解决方法吗?

添加 GIN INDEX 似乎没有帮助:

CREATE INDEX ix_test_hstore ON test_hstore USING GIN (data);
EXPLAIN ANALYSE
SELECT data->'key_1', data->'key_2', data->'key_3' FROM test_hstore;
--Seq Scan on test_hstore (cost=0.00..288637.00 rows=10000000 width=96) (actual time=0.045..4650.447 rows=10000000 loops=1)
--Planning time: 2.100 ms
--Execution time: 6746.631 ms

CREATE INDEX ix_test_jsonb ON test_jsonb USING GIN (data);
EXPLAIN ANALYSE
SELECT data->'key_1', data->'key_2', data->'key_3' FROM test_jsonb;
--Seq Scan on test_jsonb (cost=0.00..308334.00 rows=10000000 width=96) (actual time=0.149..9807.012 rows=10000000 loops=1)
--Planning time: 0.131 ms
--Execution time: 11739.948 ms

最佳答案

实际上,要提高对一个 key 的访问,您无能为力。在数据存储中,或 property一段 JSON 数据(可能是数组,或字符串数字;这可能是检索它更容易的原因比从 hstore 中检索它更困难)。

如果您需要使用 data->key_1,索引可以帮助您在 WHERE 子句中,但它不会使从数据中检索属性变得更容易。

最好的做法是,如果您总是(或经常)使用某个 key_1 , 将规范化您的数据并创建一个名为 key_1 .如果您的数据源使您很容易存储 data , 但不太容易储存 key_1 ,您可以让触发函数注意(在插入或更新时)填充 column key_1来自 data 的值:

CREATE TABLE test_jsonb 
(
id BIGSERIAL PRIMARY KEY,
data JSONB,
key_1 integer
);

CREATE OR REPLACE FUNCTION ins_upd_test_data()
RETURNS trigger AS
$$
BEGIN
new.key_1 = (new.data->>'key_1')::integer ;
RETURN new ;
END ;
$$
LANGUAGE plpgsql VOLATILE LEAKPROOF;

CREATE TRIGGER ins_upd_test_jsonb_trigger
BEFORE INSERT OR UPDATE OF data
ON test_jsonb FOR EACH ROW
EXECUTE PROCEDURE ins_upd_test_data();

通过这种方式,您可以检索 key_1与检索 id 的效率相同.

关于json - PostgreSQL:有没有办法提高使用 JSONB 或 HSTORE 键的 SELECT 查询的性能?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/41726189/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com