
sql - Fastest way to insert into a single table in parallel

Reposted · Author: 行者123 · Updated: 2023-12-02 07:10:29

My company is cursed by a symbiotic partnership that has turned parasitic. To get our data from the parasite, we have to use a painfully slow ODBC connection. I recently noticed that I can get more throughput by running queries in parallel (even against the same table).

There is one particularly large table that I want to extract data from and move into our local table. Running queries in parallel pulls the data down faster, but I also imagine that it could cause problems when multiple queries try to write their data into the same table at once.

What advice can you give me on how best to handle this situation, so that I can take advantage of the speedup gained from running the queries in parallel?

Edit: I've gotten some great feedback here, but I don't think I was entirely clear that I'm pulling the data via a linked server (using the ODBC driver). In other words, this means I can run ordinary INSERT statements, and I believe those will give better performance than SqlBulkCopy or BULK INSERT (in fact, I don't believe BULK INSERT is even an option here).
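One common way to parallelize a linked-server pull like this is to split the source table's key range into disjoint slices and run one `INSERT ... SELECT` per slice from separate connections, so the writers never fetch overlapping rows. The sketch below assumes the `pyodbc` driver; the linked-server name `LINKEDSRV`, the tables `RemoteTable`/`LocalTable`, and the integer key column `Id` are all placeholders for your own objects:

```python
# Split an inclusive key range [lo, hi] into n disjoint slices, then copy
# each slice on its own connection so the INSERTs run in parallel.
from concurrent.futures import ThreadPoolExecutor


def split_range(lo, hi, n):
    """Return n contiguous (lo, hi) slices covering [lo, hi] with no overlap."""
    step = (hi - lo + 1) // n
    bounds = [lo + i * step for i in range(n)] + [hi + 1]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(n)]


def copy_slice(conn_str, lo, hi):
    """Pull one key slice across the linked server into the local table."""
    import pyodbc  # assumes pyodbc; swap in your ODBC library of choice
    with pyodbc.connect(conn_str, autocommit=True) as cn:
        cn.execute(
            "INSERT INTO dbo.LocalTable (Id, Payload) "
            "SELECT Id, Payload FROM LINKEDSRV.RemoteDb.dbo.RemoteTable "
            "WHERE Id BETWEEN ? AND ?",
            lo, hi,
        )


def parallel_copy(conn_str, lo, hi, workers=8):
    """Run one copy_slice per worker; slices are disjoint, so no row is copied twice."""
    with ThreadPoolExecutor(max_workers=workers) as ex:
        for a, b in split_range(lo, hi, workers):
            ex.submit(copy_slice, conn_str, a, b)
```

Because each writer inserts a disjoint slice, ordinary row-level locking is enough; the worker count would then be tuned per tip 1 of the accepted answer below (one load per available CPU).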

Best Answer

Have you read Load 1TB in less than 1 hour?

  1. Run as many load processes as you have available CPUs. If you have 32 CPUs, run 32 parallel loads. If you have 8 CPUs, run 8 parallel loads.
  2. If you have control over the creation of your input files, make them of a size that is evenly divisible by the number of load threads you want to run in parallel. Also make sure all records belong to one partition if you want to use the switch partition strategy.
  3. Use BULK insert instead of BCP if you are running the process on the SQL Server machine.
  4. Use table partitioning to gain another 8-10%, but only if your input files are GUARANTEED to match your partitioning function, meaning that all records in one file must be in the same partition.
  5. Use TABLOCK to avoid row at a time locking.
  6. Use ROWS PER BATCH = 2500, or something near this if you are importing multiple streams into one table.
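For readers who can stage data in local files (the original poster notes BULK INSERT may not be an option over a linked server), tips 3, 5, and 6 combine into something like the following sketch, where the file path, table name, and terminators are placeholders:

```sql
-- One of several parallel load streams into the same table.
BULK INSERT dbo.LocalTable
FROM 'D:\loads\chunk01.dat'          -- each parallel stream loads its own file
WITH (
    TABLOCK,                          -- tip 5: bulk-load lock instead of row-at-a-time locks
    ROWS_PER_BATCH = 2500,            -- tip 6: batch-size hint for multiple streams into one table
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
);
```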

For SQL Server 2008, in certain circumstances you can use minimal logging for a standard INSERT SELECT:

SQL Server 2008 enhances the methods that it can handle with minimal logging. It supports minimally logged regular INSERT SELECT statements. In addition, turning on trace flag 610 lets SQL Server 2008 support minimal logging against a nonempty B-tree for new key ranges that cause allocations of new pages.
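As a sketch of what that looks like (table and linked-server names are placeholders; minimal logging also requires the database to be in the SIMPLE or BULK_LOGGED recovery model):

```sql
-- Optional: trace flag 610 extends minimal logging to new key ranges
-- in a nonempty B-tree (SQL Server 2008+).
DBCC TRACEON (610);

-- The TABLOCK hint is required for a minimally logged INSERT ... SELECT.
INSERT INTO dbo.LocalTable WITH (TABLOCK) (Id, Payload)
SELECT Id, Payload
FROM LINKEDSRV.RemoteDb.dbo.RemoteTable;
```

Note that TABLOCK takes a table-level lock, so this variant trades away concurrent writers in exchange for reduced logging.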

Regarding "sql - Fastest way to insert into a single table in parallel", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/11093744/
