
json - Inserting a large amount of data into PostgreSQL

Reposted. Author: 行者123. Updated: 2023-11-29 13:21:58

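I am running into performance problems when inserting millions of rows into a PostgreSQL database.

I am sending a JSON object that has an array containing millions of rows.

For each row, I create a record in the database table. I have also tried inserting several rows at once, but the problem remains.

I am not sure how to handle this. I have read that the COPY command is the fastest.

How can I improve performance?

My JSON object carries the log as an array; the log array has millions of rows.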

{"type":"monitoring","log":[
["2016-10-12T20:33:21","0.00","0.00","0.00","0.00","0.0","24.00","1.83","-0.00","1","1","-100.00"],
["2016-10-12T20:33:23","0.00","0.00","0.00","0.00","0.0","24.00","1.52","-0.61","1","1","-100.00"]]}

My current code (I am building a dynamic statement so that I can execute multiple rows at once):

IF(NOT b_first_line) THEN
s_insert_query_values = right(s_insert_query_values, -1); -- remove the leading comma

EXECUTE format('INSERT INTO log_rlda
(record_node_id, log_line, log_value, timestamp, record_log_id)
VALUES %s;', s_insert_query_values);

s_insert_query_values = '';
i_num_lines_buffered = 0;
END IF;
END IF;

s_insert_query_values contains:

Each value in the array inside "log" needs to be inserted into its own row (in the column log_value). This is what the INSERT looks like (see s_insert_query_values):

INSERT INTO log_rlda
(record_node_id, log_line, log_value, timestamp, record_log_id)
VALUES
(806, 1, 0.00, '2016-10-12 20:33:21', 386),
(807, 1, 0.00, '2016-10-12 20:33:21', 386),
(808, 1, 0.00, '2016-10-12 20:33:21', 386),
(809, 1, 0.00, '2016-10-12 20:33:21', 386),
(810, 1, 0.0, '2016-10-12 20:33:21', 386),
(811, 1, 24.00, '2016-10-12 20:33:21', 386),
(768, 1, 1.83, '2016-10-12 20:33:21', 386),
(769, 1, 0.00, '2016-10-12 20:33:21', 386),
(728, 1, 1, '2016-10-12 20:33:21', 386),
(771, 1, 1, '2016-10-12 20:33:21', 386),
(729, 1, -100.00, '2016-10-12 20:33:21', 386),
(806, 2, 0.00, '2016-10-12 20:33:23', 386),
(807, 2, 0.00, '2016-10-12 20:33:23', 386),
(808, 2, 0.00, '2016-10-12 20:33:23', 386),
(809, 2, 0.00, '2016-10-12 20:33:23', 386),
(810, 2, 0.0, '2016-10-12 20:33:23', 386),
(811, 2, 24.00, '2016-10-12 20:33:23', 386),
(768, 2, 1.52, '2016-10-12 20:33:23', 386),
(769, 2, -0.61, '2016-10-12 20:33:23', 386),
(728, 2, 1, '2016-10-12 20:33:23', 386),
(771, 2, 1, '2016-10-12 20:33:23', 386),
(729, 2, -100.00, '2016-10-12 20:33:23', 386)

Solution (i_node_id_list contains the IDs I selected before this query):

SELECT i_node_id_list[log_value_index] AS record_node_id,
e.log_line-1 AS log_line,
items.log_value::double precision as log_value,
to_timestamp((e.line->>0)::text, 'YYYY-MM-DD HH24:MI:SS') as "timestamp",
i_log_id as record_log_id
FROM (VALUES (log_data::json)) as data (doc),
json_array_elements(doc->'log') with ordinality as e(line, log_line),
json_array_elements_text(e.line) with ordinality as items(log_value, log_value_index)
WHERE log_value_index > 1 -- don't include the timestamp value (it shouldn't be written as log_value)
AND log_line > 1

Best answer

You need two levels of unnesting.

select e.log_line, items.log_value, e.line -> 0 as timestamp
from (
values ('{"type":"monitoring","log":[
["2016-10-12T20:33:21","0.00","0.00","0.00","0.00","0.0","24.00","1.83","-0.00","1","1","-100.00"],
["2016-10-12T20:33:23","0.00","0.00","0.00","0.00","0.0","24.00","1.52","-0.61","1","1","-100.00"]]}'::json)
) as data (doc),
json_array_elements(doc->'log') with ordinality as e(line, log_line),
json_array_elements(e.line) with ordinality as items(log_value, log_value_index)
where log_value_index > 1;

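The first call to json_array_elements() extracts all array elements from the log attribute. with ordinality lets us identify each row in that array. The second call then takes each element out of a row, and with ordinality again lets us find its position within the array.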
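To see what with ordinality contributes on its own, here is a minimal sketch on a tiny literal array. The literal and the aliases t, elem and idx are made up for illustration and are not part of the original query.

```sql
-- with ordinality appends a 1-based position column to the function's output
select t.elem, t.idx
from json_array_elements('["a","b","c"]'::json) with ordinality as t(elem, idx);
```

Each element comes back as a json value alongside its position (1, 2, 3), which is what makes it possible to filter out the first element of every log line.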

The query above returns this:

log_line | log_value | timestamp            
---------+-----------+----------------------
1 | "0.00" | "2016-10-12T20:33:21"
1 | "0.00" | "2016-10-12T20:33:21"
1 | "0.00" | "2016-10-12T20:33:21"
1 | "0.00" | "2016-10-12T20:33:21"
1 | "0.0" | "2016-10-12T20:33:21"
1 | "24.00" | "2016-10-12T20:33:21"
1 | "1.83" | "2016-10-12T20:33:21"
1 | "-0.00" | "2016-10-12T20:33:21"
1 | "1" | "2016-10-12T20:33:21"
1 | "1" | "2016-10-12T20:33:21"
1 | "-100.00" | "2016-10-12T20:33:21"
2 | "0.00" | "2016-10-12T20:33:23"
2 | "0.00" | "2016-10-12T20:33:23"
2 | "0.00" | "2016-10-12T20:33:23"
2 | "0.00" | "2016-10-12T20:33:23"
2 | "0.0" | "2016-10-12T20:33:23"
2 | "24.00" | "2016-10-12T20:33:23"
2 | "1.52" | "2016-10-12T20:33:23"
2 | "-0.61" | "2016-10-12T20:33:23"
2 | "1" | "2016-10-12T20:33:23"
2 | "1" | "2016-10-12T20:33:23"
2 | "-100.00" | "2016-10-12T20:33:23"

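The result of the statement above can then be used to insert the data directly, without a loop. This should be much faster than doing many individual inserts.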
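As a rough sketch of that idea, the unnesting query can feed an INSERT ... SELECT directly. The table log_rlda_demo and its column types are assumptions made up for this example (the real log_rlda definition is not shown), and the JSON is shortened to two value columns per line. The sketch also swaps in json_array_elements_text and ->> so the values arrive as plain text without the JSON quotes seen in the output above, which lets them be cast.

```sql
-- Hypothetical stand-in for the real table; the column types are assumptions.
create temporary table log_rlda_demo (
    log_line    bigint,
    log_value   double precision,
    "timestamp" timestamp
);

-- One statement inserts every value of every log line; no loop is needed.
insert into log_rlda_demo (log_line, log_value, "timestamp")
select e.log_line,
       items.log_value::double precision,
       (e.line ->> 0)::timestamp              -- ->> returns text, so the cast applies directly
from (
  values ('{"type":"monitoring","log":[
    ["2016-10-12T20:33:21","0.00","1.83"],
    ["2016-10-12T20:33:23","0.00","1.52"]]}'::json)
) as data (doc),
  json_array_elements(doc->'log') with ordinality as e(line, log_line),
  json_array_elements_text(e.line) with ordinality as items(log_value, log_value_index)
where items.log_value_index > 1;              -- position 1 is the timestamp, not a log value
```

In a function, the JSON literal would be replaced by the parameter that carries the document.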

However, I am not sure how to integrate the correct record_node_id or record_log_id into the above result.
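One possible way, in the spirit of the solution quoted in the question, is to look the node ID up in an array by the element's position. The sketch below is only an illustration: the ID array and the log ID 386 are taken from the question's sample INSERT, the JSON is shortened to three value columns, and it assumes the array holds exactly one ID per value column, in order.

```sql
select node_ids.ids[(items.log_value_index - 1)::int] as record_node_id, -- position 1 is the timestamp, so shift by one
       e.log_line,
       items.log_value::double precision as log_value,
       (e.line ->> 0)::timestamp as "timestamp",
       386 as record_log_id                      -- assumed constant for the whole document
from (
  values ('{"type":"monitoring","log":[
    ["2016-10-12T20:33:21","0.00","0.00","0.00"],
    ["2016-10-12T20:33:23","0.00","0.00","0.00"]]}'::json)
) as data (doc),
  (values (array[806, 807, 808])) as node_ids (ids),   -- illustrative IDs, one per value column
  json_array_elements(doc->'log') with ordinality as e(line, log_line),
  json_array_elements_text(e.line) with ordinality as items(log_value, log_value_index)
where items.log_value_index > 1;
```

Whether the index needs the shift by one depends on how the ID array is built; the question's own solution indexes it without the shift.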

Regarding json - Inserting a large amount of data into PostgreSQL, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40171985/
