
postgresql - pgloader - Fast data loading for PostgreSQL

Reposted. Author: 行者123. Updated: 2023-11-29 11:47:48

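I want to speed up loading data into PostgreSQL. I started using pgloader (https://github.com/dimitri/pgloader) and want to take advantage of parallel loading. I have been changing various parameters, but I cannot get more than two cores active on my machine (which has 32). I found the documentation at https://github.com/dimitri/pgloader/blob/master/pgloader.1.md and tried setting the batch options described there. Currently, I have these settings: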

LOAD CSV
FROM '/home/data1_1.csv'
--FROM 'data/data.csv'
INTO postgresql://:postgres@localhost:5432/test?test

WITH truncate,
skip header = 0,
fields optionally enclosed by '"',
fields escaped by double-quote,
fields terminated by ',',
batch rows = 100,
batch size = 1MB,
batch concurrency = 64

SET client_encoding to 'utf-8',
work_mem to '10000MB',
maintenance_work_mem to '20000 MB'

Best answer

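I ran into this problem too. pgloader does not yet appear to support parallel loading through the batch options you mention. It is a bit confusing, but the official documentation explains that these settings are about memory management, not parallelism: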

batch concurrency
Takes a numeric value as argument, defaults to 10. That's the number of batches that pgloader is allows to build in memory, even when only a single batch at a time might be sent to PostgreSQL.

Supporting more than a single batch being sent at a time is on the TODO list of pgloader, but is not implemented yet. This option is about controlling the memory needs of pgloader as a trade-off to the performances characteristics, and not about parallel activity of pgloader.
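Since the batch options do not parallelize the load, one possible workaround (an assumption on my part, not something the pgloader documentation quoted above describes) is to split the input file into several parts and start a separate pgloader process for each part. A minimal sketch of the splitting step, assuming a header-less, comma-separated file with double-quoted fields as in the question; `split_csv` and the `part_N.csv` names are hypothetical:

```python
import csv
from pathlib import Path


def split_csv(src, out_dir, parts):
    """Split `src` into `parts` CSV files, distributing rows round-robin.

    Assumes no header row (the question uses `skip header = 0`) and the
    default csv dialect: ',' delimiter, '"' quote, doubled-quote escaping.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"part_{i}.csv" for i in range(parts)]
    files = [p.open("w", newline="", encoding="utf-8") for p in paths]
    try:
        writers = [csv.writer(f) for f in files]
        with open(src, newline="", encoding="utf-8") as fh:
            # csv.reader keeps quoted fields (including embedded commas
            # or newlines) intact, so rows are never cut in half.
            for n, row in enumerate(csv.reader(fh)):
                writers[n % parts].writerow(row)
    finally:
        for f in files:
            f.close()
    return paths
```

Each resulting file would then need its own command file and its own pgloader process. If you try this, the `truncate` option should probably be dropped from those command files and the table emptied once beforehand, since otherwise each process would truncate the table the others are loading into. Whether this actually uses more cores on your machine is something to measure rather than assume.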

For more on postgresql - pgloader - fast data loading for PostgreSQL, see the similar question on Stack Overflow: https://stackoverflow.com/questions/27937842/
