
c# - What's the best way to do bulk inserts? + Help me completely understand my current findings


So I saw this post on here and read through it, and it seems like bulk copy might be the way to go.

What’s the best way to bulk database inserts from c#?

I still have some questions and want to know how things actually work.

So I found 2 tutorials.

http://www.codeproject.com/KB/cs/MultipleInsertsIn1dbTrip.aspx#_Toc196622241

http://www.codeproject.com/KB/linq/BulkOperations_LinqToSQL.aspx

The first way uses 2 ado.net 2.0 features: batched inserts and bulk copy. The second uses linq to sql and OpenXML.

The second kind appeals to me since I am already using linq to sql and prefer it over ado.net. However, as one person pointed out in the posts, he just worked around the issue at the expense of performance (nothing wrong with that in my opinion).

First I will talk about the 2 ways from the first tutorial.

I am using VS2010 Express, .net 4.0, MVC 2.0, SQL Server 2005 (to test the tutorials I used VS2008; not sure which .net version, I just loaded their sample files and ran them).

  • Is ado.net 2.0 the most current version?
  • Based on the technologies I am using, are there updates to what I am going to show that would improve it somehow?
  • Is there anything I should know about these tutorials?

  • Batch insert

I use this table for all the examples.
    CREATE TABLE [dbo].[TBL_TEST_TEST]
    (
        ID INT IDENTITY(1,1) PRIMARY KEY,
        [NAME] [varchar](50)
    )

Stored procedure code:
    USE [Test]
    GO
    /****** Object: StoredProcedure [dbo].[sp_BatchInsert] Script Date: 05/19/2010 15:12:47 ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    ALTER PROCEDURE [dbo].[sp_BatchInsert] (@Name VARCHAR(50))
    AS
    BEGIN
        INSERT INTO TBL_TEST_TEST VALUES (@Name);
    END

C# code:
    /// <summary>
    /// Another ado.net 2.0 way that uses a stored procedure to do a bulk insert.
    /// Seems slower than the "BatchBulkCopy" way and it crashes when you try to insert 500,000 records in one go.
    /// http://www.codeproject.com/KB/cs/MultipleInsertsIn1dbTrip.aspx#_Toc196622241
    /// </summary>
    private static void BatchInsert()
    {
        // Get the DataTable with Rows State as RowState.Added
        DataTable dtInsertRows = GetDataTable();

        SqlConnection connection = new SqlConnection(connectionString);
        SqlCommand command = new SqlCommand("sp_BatchInsert", connection);
        command.CommandType = CommandType.StoredProcedure;
        command.UpdatedRowSource = UpdateRowSource.None;

        // Set the parameter with the appropriate source column name
        command.Parameters.Add("@Name", SqlDbType.VarChar, 50, dtInsertRows.Columns[0].ColumnName);

        SqlDataAdapter adpt = new SqlDataAdapter();
        adpt.InsertCommand = command;
        // Specify the number of records to be inserted/updated in one go. Default is 1.
        adpt.UpdateBatchSize = 1000;

        connection.Open();
        int recordsInserted = adpt.Update(dtInsertRows);
        connection.Close();
    }

So the first thing is the batch size. Why would you set a batch size to anything other than the number of records you are sending? Like I was sending 500,000 records, so I made the batch size 500,000.

Next, why does it crash when I do that? If I set the batch size to 1000 it works just fine.
    System.Data.SqlClient.SqlException was unhandled
    Message="A transport-level error has occurred when sending the request to the server. (provider: Shared Memory Provider, error: 0 - No process is on the other end of the pipe.)"
    Source=".Net SqlClient Data Provider"
    ErrorCode=-2146232060
    Class=20
    LineNumber=0
    Number=233
    Server=""
    State=0
    StackTrace:
    at System.Data.Common.DbDataAdapter.UpdatedRowStatusErrors(RowUpdatedEventArgs rowUpdatedEvent, BatchCommandInfo[] batchCommands, Int32 commandCount)
    at System.Data.Common.DbDataAdapter.UpdatedRowStatus(RowUpdatedEventArgs rowUpdatedEvent, BatchCommandInfo[] batchCommands, Int32 commandCount)
    at System.Data.Common.DbDataAdapter.Update(DataRow[] dataRows, DataTableMapping tableMapping)
    at System.Data.Common.DbDataAdapter.UpdateFromDataTable(DataTable dataTable, DataTableMapping tableMapping)
    at System.Data.Common.DbDataAdapter.Update(DataTable dataTable)
    at TestIQueryable.Program.BatchInsert() in C:\Users\a\Downloads\TestIQueryable\TestIQueryable\TestIQueryable\Program.cs:line 124
    at TestIQueryable.Program.Main(String[] args) in C:\Users\a\Downloads\TestIQueryable\TestIQueryable\TestIQueryable\Program.cs:line 16
    InnerException:

Time taken to insert 500,000 records with a batch size of 1000: 2 mins and 54 seconds.

Of course that is no official time, just me sitting there with a stopwatch (I am sure there are better ways but was too lazy to go look up what they are; something like the sketch below would probably do).
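For instance, a minimal sketch of the Main method using System.Diagnostics.Stopwatch around the call (BatchInsert is the method defined in the full listing further down):

    using System;
    using System.Diagnostics;

    // A sketch of Main with Stopwatch timing around the call,
    // instead of a wall clock.
    static void Main(string[] args)
    {
        Stopwatch watch = Stopwatch.StartNew();
        BatchInsert();   // or any of the other insert methods below
        watch.Stop();
        Console.WriteLine("done in " + watch.Elapsed);
    }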

So I find that a bit slow compared to all my other ones (except the linq to sql insert-everything one), and I am not really sure why.

Next I looked at bulk copy.
    /// <summary>
    /// An ado.net 2.0 way to mass insert records. This seems to be the fastest.
    /// http://www.codeproject.com/KB/cs/MultipleInsertsIn1dbTrip.aspx#_Toc196622241
    /// </summary>
    private static void BatchBulkCopy()
    {
        // Get the DataTable
        DataTable dtInsertRows = GetDataTable();

        using (SqlBulkCopy sbc = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.KeepIdentity))
        {
            sbc.DestinationTableName = "TBL_TEST_TEST";

            // Number of records to be processed in one go
            sbc.BatchSize = 500000;

            // Map the source column from the DataTable to the destination column in the SQL Server 2005 table
            // sbc.ColumnMappings.Add("ID", "ID");
            sbc.ColumnMappings.Add("NAME", "NAME");

            // Number of records after which the client has to be notified about its status
            sbc.NotifyAfter = dtInsertRows.Rows.Count;

            // Event that gets fired when NotifyAfter number of records are processed.
            sbc.SqlRowsCopied += new SqlRowsCopiedEventHandler(sbc_SqlRowsCopied);

            // Finally write to server
            sbc.WriteToServer(dtInsertRows);
            sbc.Close();
        }
    }

This one seems to run really fast, and it does not even need a SP (can you use a SP with bulk copy? And if you can, would that be better?).

BatchCopy had no problem with a 500,000 batch size. So, again, why make it smaller than the number of records you want to send?

I found that with BatchCopy and a 500,000 batch size it took only 5 seconds to complete. I then tried a batch size of 1,000 and it took 8 seconds.

So much faster than the batch insert above.

Now I tried the other tutorial.
    USE [Test]
    GO
    /****** Object: StoredProcedure [dbo].[spTEST_InsertXMLTEST_TEST] Script Date: 05/19/2010 15:39:03 ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    ALTER PROCEDURE [dbo].[spTEST_InsertXMLTEST_TEST](@UpdatedProdData nText)
    AS
        DECLARE @hDoc int

        EXEC sp_xml_preparedocument @hDoc OUTPUT, @UpdatedProdData

        INSERT INTO TBL_TEST_TEST(NAME)
        SELECT XMLProdTable.NAME
        FROM OPENXML(@hDoc, 'ArrayOfTBL_TEST_TEST/TBL_TEST_TEST', 2)
        WITH (
            ID   int,
            NAME varchar(100)
        ) XMLProdTable

        EXEC sp_xml_removedocument @hDoc

C# code:
    /// <summary>
    /// This is using linq to sql to make the table objects.
    /// They are then serialized to an xml document and sent to a stored procedure
    /// that then does a bulk insert (I think with OpenXML).
    /// http://www.codeproject.com/KB/linq/BulkOperations_LinqToSQL.aspx
    /// </summary>
    private static void LinqInsertXMLBatch()
    {
        using (TestDataContext db = new TestDataContext())
        {
            TBL_TEST_TEST[] testRecords = new TBL_TEST_TEST[500000];
            for (int count = 0; count < 500000; count++)
            {
                TBL_TEST_TEST testRecord = new TBL_TEST_TEST();
                testRecord.NAME = "Name : " + count;
                testRecords[count] = testRecord;
            }

            StringBuilder sBuilder = new StringBuilder();
            System.IO.StringWriter sWriter = new System.IO.StringWriter(sBuilder);
            XmlSerializer serializer = new XmlSerializer(typeof(TBL_TEST_TEST[]));
            serializer.Serialize(sWriter, testRecords);
            db.insertTestData(sBuilder.ToString());
        }
    }

So I like this one because I get to use objects, even if it is kind of redundant. What I don't get is how the SP works; like, I don't understand the whole thing. I don't know if OPENXML has some bulk-insert machinery under the hood, and I don't even know how to take this example SP and change it to fit my tables, since like I said I don't know what is going on.
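One thing that demystifies the SP a little is looking at the XML that XmlSerializer actually produces, since that is what OPENXML is parsing. A minimal sketch (the TBL_TEST_TEST class here is a hypothetical stand-in for the linq to sql generated entity):

    using System;
    using System.IO;
    using System.Xml.Serialization;

    // Hypothetical stand-in for the linq to sql generated entity.
    public class TBL_TEST_TEST
    {
        public int ID { get; set; }
        public string NAME { get; set; }
    }

    class XmlShapeDemo
    {
        static void Main()
        {
            TBL_TEST_TEST[] records =
            {
                new TBL_TEST_TEST { NAME = "Name : 0" },
                new TBL_TEST_TEST { NAME = "Name : 1" }
            };

            // XmlSerializer wraps an array of T in a root element named "ArrayOf" + T,
            // which matches the rowpattern the SP passes to OPENXML:
            //   'ArrayOfTBL_TEST_TEST/TBL_TEST_TEST'
            XmlSerializer serializer = new XmlSerializer(typeof(TBL_TEST_TEST[]));
            using (StringWriter writer = new StringWriter())
            {
                serializer.Serialize(writer, records);
                Console.WriteLine(writer.ToString());
                // Prints roughly:
                // <ArrayOfTBL_TEST_TEST ...>
                //   <TBL_TEST_TEST><ID>0</ID><NAME>Name : 0</NAME></TBL_TEST_TEST>
                //   <TBL_TEST_TEST><ID>0</ID><NAME>Name : 1</NAME></TBL_TEST_TEST>
                // </ArrayOfTBL_TEST_TEST>
            }
        }
    }

With flag 2 (element-centric mapping), OPENXML turns each <TBL_TEST_TEST> element into a row and maps its child elements onto the columns declared in the WITH clause, which the INSERT ... SELECT then reads like an ordinary table. So adapting the SP to another table should mostly mean changing the rowpattern to 'ArrayOfYourClass/YourClass' and listing that class's properties in the WITH clause.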

I also don't know what happens if you have more than one table in the object. Like say I have a ProductName table that has a relationship to a Product table or something like that.

In linq to sql you can grab the ProductName object and make changes to the Product table through that same object. So I am not sure how to take that into account. I am not sure if I would have to do separate inserts or what.
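For what it's worth, one common way to handle related tables with bulk copy (this is not from either tutorial) is to do one WriteToServer per table, parents first, which only works cleanly if the foreign key values are known on the client. A rough sketch with hypothetical Product/ProductName tables where ProductID is assigned client-side rather than by an identity column:

    using System.Data;
    using System.Data.SqlClient;

    class RelatedBulkCopy
    {
        // Hypothetical schema: Product(ProductID, ...) and ProductName(ProductID, NAME),
        // with ProductID assigned client-side so child rows know their FK values up front.
        static void BulkCopyProductsAndNames(string connectionString,
                                             DataTable products, DataTable productNames)
        {
            using (SqlConnection conn = new SqlConnection(connectionString))
            {
                conn.Open();

                // Parents first, so the child rows' foreign keys have something to point at.
                using (SqlBulkCopy sbc = new SqlBulkCopy(conn))
                {
                    sbc.DestinationTableName = "Product";
                    sbc.WriteToServer(products);
                }

                // Then the children, as a separate bulk copy.
                using (SqlBulkCopy sbc = new SqlBulkCopy(conn))
                {
                    sbc.DestinationTableName = "ProductName";
                    sbc.WriteToServer(productNames);
                }
            }
        }
    }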

The time for 500,000 records was pretty good at 52 seconds.

The last way, of course, was just using linq to do it all, and it was pretty bad.
    /// <summary>
    /// This is using linq to sql to insert lots of records.
    /// This way is slow as it uses no mass insert.
    /// Only tried to insert 50,000 records as I did not want to sit around till it did 500,000 records.
    /// http://www.codeproject.com/KB/linq/BulkOperations_LinqToSQL.aspx
    /// </summary>
    private static void LinqInsertAll()
    {
        using (TestDataContext db = new TestDataContext())
        {
            db.CommandTimeout = 600;
            for (int count = 0; count < 50000; count++)
            {
                TBL_TEST_TEST testRecord = new TBL_TEST_TEST();
                testRecord.NAME = "Name : " + count;
                db.TBL_TEST_TESTs.InsertOnSubmit(testRecord);
            }
            db.SubmitChanges();
        }
    }

I only did 50,000 records, and that took over a minute.

So I have really narrowed it down to the linq to sql bulk insert way or bulk copy. I am just not sure how to do either one when you have a relationship between tables. And I am not sure how they both stand up when doing updates instead of inserts, as I have not gotten around to trying that yet.

I don't think I will ever need to insert/update more than 50,000 records in one go, but at the same time I know I will have to do validation on the records before inserting, so that will slow it down, and that makes linq to sql nicer since you have objects, especially if you are first parsing data from an xml file before you insert into the database; see the sketch below for one way to combine the two.
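The two approaches can be combined: validate objects in memory, copy the survivors into a DataTable, and hand that to SqlBulkCopy. A minimal sketch (ValidateRecord is a hypothetical placeholder for whatever validation rules actually apply):

    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;

    class BulkInsertWithValidation
    {
        // Hypothetical validation hook; substitute real rules here.
        static bool ValidateRecord(string name)
        {
            return !string.IsNullOrEmpty(name) && name.Length <= 50;
        }

        static void BulkCopyValidated(string connectionString, IEnumerable<string> names)
        {
            // Copy only the records that pass validation into a DataTable.
            DataTable dt = new DataTable();
            dt.Columns.Add("NAME", typeof(string));
            foreach (string name in names)
            {
                if (ValidateRecord(name))
                    dt.Rows.Add(name);
            }

            // Then let SqlBulkCopy do the actual insert, as in BatchBulkCopy above.
            using (SqlBulkCopy sbc = new SqlBulkCopy(connectionString))
            {
                sbc.DestinationTableName = "TBL_TEST_TEST";
                sbc.ColumnMappings.Add("NAME", "NAME");
                sbc.WriteToServer(dt);
            }
        }
    }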

Full C# code:
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Xml.Serialization;
    using System.Data;
    using System.Data.SqlClient;

    namespace TestIQueryable
    {
        class Program
        {
            private static string connectionString = "";

            static void Main(string[] args)
            {
                BatchInsert();
                Console.WriteLine("done");
            }

            /// <summary>
            /// This is using linq to sql to insert lots of records.
            /// This way is slow as it uses no mass insert.
            /// Only tried to insert 50,000 records as I did not want to sit around till it did 500,000 records.
            /// http://www.codeproject.com/KB/linq/BulkOperations_LinqToSQL.aspx
            /// </summary>
            private static void LinqInsertAll()
            {
                using (TestDataContext db = new TestDataContext())
                {
                    db.CommandTimeout = 600;
                    for (int count = 0; count < 50000; count++)
                    {
                        TBL_TEST_TEST testRecord = new TBL_TEST_TEST();
                        testRecord.NAME = "Name : " + count;
                        db.TBL_TEST_TESTs.InsertOnSubmit(testRecord);
                    }
                    db.SubmitChanges();
                }
            }

            /// <summary>
            /// This is using linq to sql to make the table objects.
            /// They are then serialized to an xml document and sent to a stored procedure
            /// that then does a bulk insert (I think with OpenXML).
            /// http://www.codeproject.com/KB/linq/BulkOperations_LinqToSQL.aspx
            /// </summary>
            private static void LinqInsertXMLBatch()
            {
                using (TestDataContext db = new TestDataContext())
                {
                    TBL_TEST_TEST[] testRecords = new TBL_TEST_TEST[500000];
                    for (int count = 0; count < 500000; count++)
                    {
                        TBL_TEST_TEST testRecord = new TBL_TEST_TEST();
                        testRecord.NAME = "Name : " + count;
                        testRecords[count] = testRecord;
                    }

                    StringBuilder sBuilder = new StringBuilder();
                    System.IO.StringWriter sWriter = new System.IO.StringWriter(sBuilder);
                    XmlSerializer serializer = new XmlSerializer(typeof(TBL_TEST_TEST[]));
                    serializer.Serialize(sWriter, testRecords);
                    db.insertTestData(sBuilder.ToString());
                }
            }

            /// <summary>
            /// An ado.net 2.0 way to mass insert records. This seems to be the fastest.
            /// http://www.codeproject.com/KB/cs/MultipleInsertsIn1dbTrip.aspx#_Toc196622241
            /// </summary>
            private static void BatchBulkCopy()
            {
                // Get the DataTable
                DataTable dtInsertRows = GetDataTable();

                using (SqlBulkCopy sbc = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.KeepIdentity))
                {
                    sbc.DestinationTableName = "TBL_TEST_TEST";

                    // Number of records to be processed in one go
                    sbc.BatchSize = 500000;

                    // Map the source column from the DataTable to the destination column in the SQL Server 2005 table
                    // sbc.ColumnMappings.Add("ID", "ID");
                    sbc.ColumnMappings.Add("NAME", "NAME");

                    // Number of records after which the client has to be notified about its status
                    sbc.NotifyAfter = dtInsertRows.Rows.Count;

                    // Event that gets fired when NotifyAfter number of records are processed.
                    sbc.SqlRowsCopied += new SqlRowsCopiedEventHandler(sbc_SqlRowsCopied);

                    // Finally write to server
                    sbc.WriteToServer(dtInsertRows);
                    sbc.Close();
                }
            }

            /// <summary>
            /// Another ado.net 2.0 way that uses a stored procedure to do a bulk insert.
            /// Seems slower than the "BatchBulkCopy" way and it crashes when you try to insert 500,000 records in one go.
            /// http://www.codeproject.com/KB/cs/MultipleInsertsIn1dbTrip.aspx#_Toc196622241
            /// </summary>
            private static void BatchInsert()
            {
                // Get the DataTable with Rows State as RowState.Added
                DataTable dtInsertRows = GetDataTable();

                SqlConnection connection = new SqlConnection(connectionString);
                SqlCommand command = new SqlCommand("sp_BatchInsert", connection);
                command.CommandType = CommandType.StoredProcedure;
                command.UpdatedRowSource = UpdateRowSource.None;

                // Set the parameter with the appropriate source column name
                command.Parameters.Add("@Name", SqlDbType.VarChar, 50, dtInsertRows.Columns[0].ColumnName);

                SqlDataAdapter adpt = new SqlDataAdapter();
                adpt.InsertCommand = command;
                // Specify the number of records to be inserted/updated in one go. Default is 1.
                adpt.UpdateBatchSize = 500000;

                connection.Open();
                int recordsInserted = adpt.Update(dtInsertRows);
                connection.Close();
            }

            private static DataTable GetDataTable()
            {
                // You first need a DataTable holding all the insert values
                DataTable dtInsertRows = new DataTable();
                dtInsertRows.Columns.Add("NAME");

                for (int i = 0; i < 500000; i++)
                {
                    DataRow drInsertRow = dtInsertRows.NewRow();
                    string name = "Name : " + i;
                    drInsertRow["NAME"] = name;
                    dtInsertRows.Rows.Add(drInsertRow);
                }
                return dtInsertRows;
            }

            static void sbc_SqlRowsCopied(object sender, SqlRowsCopiedEventArgs e)
            {
                Console.WriteLine("Number of records affected : " + e.RowsCopied.ToString());
            }
        }
    }

Best Answer

The batch size is there to reduce the impact of network latency. It does not need to be more than a few thousand. Multiple statements are collected together and sent as one unit, so you take the hit of one network trip per every N statements, rather than one per statement.
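In code terms, that just means picking a moderate UpdateBatchSize instead of the full row count. A minimal sketch, identical to the BatchInsert method in the question except for the batch size (2,000 is an arbitrary illustrative value, and connectionString, GetDataTable() and sp_BatchInsert are the ones from the question):

    using System.Data;
    using System.Data.SqlClient;

    private static void BatchInsertChunked()
    {
        DataTable dtInsertRows = GetDataTable();

        using (SqlConnection connection = new SqlConnection(connectionString))
        using (SqlCommand command = new SqlCommand("sp_BatchInsert", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.UpdatedRowSource = UpdateRowSource.None;
            command.Parameters.Add("@Name", SqlDbType.VarChar, 50, dtInsertRows.Columns[0].ColumnName);

            SqlDataAdapter adpt = new SqlDataAdapter();
            adpt.InsertCommand = command;

            // A few thousand is plenty: each batch costs one network round trip, so
            // 500,000 rows at UpdateBatchSize = 2000 means ~250 trips instead of 500,000.
            adpt.UpdateBatchSize = 2000;

            connection.Open();
            int recordsInserted = adpt.Update(dtInsertRows);
        }
    }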

This question comes from Stack Overflow: https://stackoverflow.com/questions/2870022/
