
ClickHouse import data from CSV: DB::NetException: Connection reset by peer, while writing to socket


I am trying to load *.gz files into ClickHouse like this:

clickhouse-client --max_memory_usage=15323460608 --format_csv_delimiter="|" --query="INSERT INTO tmp1.my_test FORMAT CSV"
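Since the query reads the CSV from standard input, the gzipped file presumably reaches it through a pipe; a minimal sketch of the full pipeline, assuming the file is named my_test.csv.gz:

# Hypothetical file name; decompress on the fly and stream the CSV into clickhouse-client.
zcat my_test.csv.gz | clickhouse-client \
  --max_memory_usage=15323460608 \
  --format_csv_delimiter="|" \
  --query="INSERT INTO tmp1.my_test FORMAT CSV"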

I get the error: Code: 210. DB::NetException: Connection reset by peer, while writing to socket (127.0.0.1:9000).

There are no errors in clickhouse-server.log, clickhouse-server.err.log, or zookeeper.log.

When I run the insert command I can see memory usage climb nearly to the server's limit (32 GB), which is why I tried to cap it with max_memory_usage, but I get the same error.

Any ideas?
Thanks in advance.

Best Answer

The problem may be that you are partitioning the data by day and your bulk-insert CSV contains too many distinct days. Try removing the PARTITION BY toYYYYMMDD(business_ts) specification from your table creation. I noticed a similar problem when inserting into one of my tables. Before adding the --max_memory_usage parameter, I got exactly the same error you report here: Code: 210. DB::NetException: Connection reset by peer, while writing to socket (127.0.0.1:9000)
I then added --max_memory_usage=15000000000 and received a more useful error message:

Received exception from server (version 20.11.5):
Code: 252. DB::Exception: Received from localhost:9000. DB::Exception: Too many partitions for single INSERT block (more than 100). The limit is controlled by 'max_partitions_per_insert_block' setting. Large number of partitions is a common misconception. It will lead to severe negative performance impact, including slow server startup, slow INSERT queries and slow SELECT queries. Recommended total number of partitions for a table is under 1000..10000. Please note, that partitioning is not intended to speed up SELECT queries (ORDER BY key is sufficient to make range queries fast). Partitions are intended for data manipulation (DROP PARTITION, etc)..
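If the daily partitioning genuinely has to stay and the load really does span more than 100 days, the setting named in that message can instead be raised for the one bulk load, though the performance warning it carries still applies; a sketch, with the limit value chosen arbitrarily:

# Raise the per-INSERT partition limit for this invocation only (1000 is an arbitrary example).
zcat my_test.csv.gz | clickhouse-client \
  --max_partitions_per_insert_block=1000 \
  --format_csv_delimiter="|" \
  --query="INSERT INTO tmp1.my_test FORMAT CSV"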


As the more helpful error message points out, PARTITION does not help SELECT performance; it exists to make non-query data manipulation (DROP PARTITION and the like) more efficient. I don't know all the details of the use case here, but it might make sense to ORDER BY spin_ts and business_ts and drop the PARTITION on business_ts.
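A minimal sketch of that suggestion, assuming a MergeTree engine; apart from spin_ts and business_ts, every column below is a placeholder, since the real schema is not shown here:

-- Hypothetical schema: only spin_ts and business_ts appear in the answer above.
CREATE TABLE tmp1.my_test
(
    spin_ts     DateTime,
    business_ts DateTime,
    payload     String
)
ENGINE = MergeTree
-- No PARTITION BY toYYYYMMDD(business_ts): the sort key alone keeps range queries fast.
ORDER BY (spin_ts, business_ts);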

Regarding "ClickHouse import data from CSV: DB::NetException: Connection reset by peer, while writing to socket", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56156682/
