10/16/2015

.NET Bulk Insert into SQL Server

Problem
The SqlBulkCopy class is most helpful when you need to load large amounts of data into your database from within a .NET application. For example, you may need to integrate a .NET custom step into a Visual Studio application installer to perform the initial data import, benefiting from the error handling the .NET platform offers. Another situation where SqlBulkCopy is useful is migrating data to SQL Azure. In this tip I'll walk you through a simple console application that demonstrates the basic usage of this class and some options you should be aware of.
Solution
The functionality of the SqlBulkCopy class is similar to the BULK INSERT statement and to the bcp utility with the "in" argument. As always when choosing a tool, it's useful to know its limitations: SqlBulkCopy can only import data into SQL tables from a source that can be loaded into a DataTable or read through an IDataReader - for example, you can import data directly from a SqlDataReader.
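To illustrate the reader-based scenario, here is a minimal sketch (the Product_src source table and the method's parameters are made up for illustration and are not part of the sample project):
public static void CopyFromReader(string sourceConnStr, string destConnStr)
{
    using (SqlConnection srcConn = new SqlConnection(sourceConnStr))
    using (SqlConnection destConn = new SqlConnection(destConnStr))
    {
        srcConn.Open();
        destConn.Open();
        // hypothetical source table; any SELECT returning the needed columns works
        using (SqlCommand cmd = new SqlCommand(
            "SELECT Item, ProductID, ListPrice, SalePrice FROM dbo.Product_src", srcConn))
        using (SqlDataReader reader = cmd.ExecuteReader())
        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(destConn))
        {
            bulkCopy.DestinationTableName = "dbo.Product_bc";
            // WriteToServer accepts an IDataReader, so the rows are streamed
            // to the destination without being buffered in a DataTable first
            bulkCopy.WriteToServer(reader);
        }
    }
}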
The sample application I'll talk about uses a csv source file with about 42,000 lines of product price data. The data will be loaded into a DataTable, processed, and finally bulk inserted into a SQL table. The underlying database (let's call it simply TestBC) uses the bulk-logged recovery model and contains the Product_bc table, into which we'll bulk insert the data from the csv file. For the purposes of this example Product_bc is a heap.
CREATE TABLE [dbo].[Product_bc](
    [Item] [varchar](50) NOT NULL,
    [ProductID] [int] NOT NULL,
    [ListPrice] [money] NULL,
    [SalePrice] [money] NULL
) ON [PRIMARY]
Please see the Next Steps section for a link to download the Visual Studio project.
The first step is to load the csv file data into a DataTable. The code below creates the DataTable (notice that the appropriate data type is used for each column), reads the data from the csv file using a Microsoft.Jet.OLEDB.4.0 connection string and fills the DataTable. The configurable pieces of information, such as connection strings and file paths, are stored in the App.config file.
public static DataTable LoadCSVFile()
{
    // the SELECT statement that reads from the csv file is kept in App.config
    string selectFromFile = ConfigurationManager.AppSettings["selectFromFile"];
    DataTable dt = new DataTable();
    using (OleDbConnection jetConn =
        new OleDbConnection(ConfigurationManager.AppSettings["JETConn"]))
    using (OleDbCommand oleCmd = new OleDbCommand(selectFromFile, jetConn))
    using (OleDbDataAdapter oleAdapter = new OleDbDataAdapter(oleCmd))
    {
        // define the columns up front so each one gets the appropriate data type
        dt.Columns.Add("Item", typeof(string));
        dt.Columns.Add("ProductID", typeof(int));
        dt.Columns.Add("ListPrice", typeof(decimal));
        dt.Columns.Add("SalePrice", typeof(decimal));
        // Fill opens the connection, runs the SELECT and loads the rows
        oleAdapter.Fill(dt);
    }
    return dt;
}
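For reference, the relevant App.config entries could look like the following sketch (the folder, file and connection string values are placeholders, not the ones from the downloadable project; with the Jet text driver, Data Source points to the folder and the csv file name appears in the SELECT statement):
<configuration>
  <appSettings>
    <!-- placeholder values; adjust the folder, file and table names to your setup -->
    <add key="JETConn" value="Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Data\;Extended Properties=&quot;text;HDR=Yes;FMT=Delimited&quot;"/>
    <add key="selectFromFile" value="SELECT * FROM [products.csv]"/>
    <add key="DestinationTable" value="dbo.Product_bc"/>
  </appSettings>
</configuration>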
To bulk insert the DataTable's content into the SQL table I'm using the SqlBulkCopy class, as you can see in the next piece of code. There are a few things to pay attention to here. First of all, notice the constructor of the SqlBulkCopy class, which includes the TableLock option. As you probably already know, several conditions must be met for a bulk import to be minimally logged; one of them is holding a table lock, which is exactly what the TableLock option requests. The value of the DestinationTableName property is read from the App.config file - in our case the data is copied into the Product_bc table. I use the SqlRowsCopied event in conjunction with the NotifyAfter property to write a message to the console after every 1000 rows have been copied. Finally, the WriteToServer method copies the DataTable's content to the destination table.
public static void CopyData(string connStr, DataTable dt)
{
    // TableLock is one of the prerequisites for minimal logging
    using (SqlBulkCopy bulkCopy =
        new SqlBulkCopy(connStr, SqlBulkCopyOptions.TableLock))
    {
        bulkCopy.DestinationTableName =
            ConfigurationManager.AppSettings["DestinationTable"];
        // report progress to the console after every 1000 copied rows
        bulkCopy.SqlRowsCopied +=
            new SqlRowsCopiedEventHandler(bulkCopy_SqlRowsCopied);
        bulkCopy.NotifyAfter = 1000;
        bulkCopy.WriteToServer(dt);
    }
}
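The bulkCopy_SqlRowsCopied handler referenced above is not shown here; a minimal version that just reports progress could look like this:
// called after every NotifyAfter (here 1000) rows have been copied
private static void bulkCopy_SqlRowsCopied(object sender, SqlRowsCopiedEventArgs e)
{
    Console.WriteLine("Copied {0} rows so far...", e.RowsCopied);
}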
To see that the table lock is acquired, I've set up a server side trace containing the SQL:BatchStarting and SQL:BatchCompleted events. The application has two bulk copy methods: one of them ("CopyData") specifies table locking, the other ("CopyData_NoTableLock") does not. Here is the trace result in the first case:
[Trace screenshot: the bulk insert runs WITH TABLOCK]
Prior to the bulk import, SQL Server checks the table structure by issuing a SELECT with SET FMTONLY ON. It also checks the table and column collations by executing the sys.sp_tablecollations_100 procedure. As you can see, the bulk import is done WITH TABLOCK. In the second case, the trace result looks like this:
[Trace screenshot: the bulk insert runs without the TABLOCK hint]
As you can see, the duration is higher when no table lock is held: 503 ms versus 419 ms with the table lock.
Furthermore, you can check whether the bulk insert is minimally logged by following these steps. First take a full backup of the database, so that it really operates under the bulk-logged recovery model and not under the "pseudo-simple" one (until the first full backup is taken, the database behaves as if it were in the simple recovery model). Then run the CopyData method (the one that uses the table lock) and run this afterwards:
--obtaining a list of page ids
DECLARE @dbid int = DB_ID(), @objid int = OBJECT_ID('dbo.Product_bc')
DBCC EXTENTINFO(@dbid, @objid)
--choose a page id (on my machine for example 8520) 
--and see the details of that page; 
DBCC TRACEON(3604)
DBCC PAGE(@dbid, 1, 8520, 3)

The Allocation status section of the page detail should mention "MIN_LOGGED":
[Screenshot: DBCC PAGE output - the allocation status section shows MIN_LOGGED]
If you run the other method (CopyData_NoTableLock), the corresponding page detail will show "NOT MIN_LOGGED":
[Screenshot: DBCC PAGE output - the allocation status section shows NOT MIN_LOGGED]
Make sure that you review the other available options, enumerated in the SqlBulkCopyOptions documentation.
This example features a very simple case, in which the number of columns and their order are the same in both the csv file and the table. If the data source does not match the destination table, i.e. if the number of columns and/or their ordinal positions differ, the SqlBulkCopy.ColumnMappings property lets you specify the column mapping. Remember the "format file" you use with bcp or with BULK INSERT - this property fulfills part of the format file's job. Another property you may want to be aware of is BatchSize, which controls how many rows are sent to the server at a time.
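As a quick sketch (the source column names below are invented for illustration), mappings are added by column name; note that once the ColumnMappings collection is not empty, only the mapped columns are copied:
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connStr, SqlBulkCopyOptions.TableLock))
{
    bulkCopy.DestinationTableName = "dbo.Product_bc";
    // map each (hypothetical) source column to its destination column by name;
    // an Add(int, int) overload also exists for ordinal-based mappings
    bulkCopy.ColumnMappings.Add("ProdName", "Item");
    bulkCopy.ColumnMappings.Add("ProdID", "ProductID");
    bulkCopy.ColumnMappings.Add("List", "ListPrice");
    bulkCopy.ColumnMappings.Add("Sale", "SalePrice");
    // send the rows to the server in batches of 5000 instead of one single batch
    bulkCopy.BatchSize = 5000;
    bulkCopy.WriteToServer(dt);
}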

Source: mssqltips.com
