Reading structured files into SQL Server Part 2

by Bjørn Bouet Smith 14. September 2010 20:05

My last post presented how you can read a file in a structured format into memory for further processing.

This post will focus on how you can easily move the contents you just imported into SQL Server.

If you want to load data in bulk into SQL Server, then the most efficient way of doing that is to use the class System.Data.SqlClient.SqlBulkCopy.

There are two ways to use SqlBulkCopy: either you give it a DataTable instance with the data laid out in the same format and column order as the table in the database, or you give it an IDataReader instance that provides access to the data in that same format.

Both methods work just fine, but if you want high performance and efficiency you should not use a DataTable: you first have to build the DataTable object and copy all of your data into its rows, which costs both time and memory. The most efficient way is to implement an IDataReader on top of the data that you want to import. Naturally, if you have to write the IDataReader implementation yourself, the DataTable approach is probably quicker to develop, since it's very easy to understand and most people have used a DataTable before. But let's say you want to insert 1 billion rows. Then you face the issue that your DataTable simply cannot hold 1 billion rows, so you would have to create several DataTable instances with chunks of data, which would still use a lot of memory and create a lot of objects for the garbage collector to collect.

By using an IDataReader you only have to provide one row at a time to the SqlBulkCopy class, and you can easily re-use your internal row representation for every row. This makes it very efficient: you create fewer objects, and you hold less data in memory at any one time. Fewer objects also means less garbage collection, which is good, since a collection can pause the application's threads while it runs.
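To make the one-row-at-a-time idea concrete, here is a minimal sketch of an IDataReader that keeps a single row buffer and overwrites it on every Read(). It is not the FileDataReader from this post: the class name LineReader, its constructor, and the choice to expose every field as a string are assumptions made purely for illustration. As far as I know, SqlBulkCopy mostly relies on Read(), FieldCount and GetValue() (plus GetName()/GetOrdinal() for name-based mappings), so the remaining members are kept as short as possible.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Illustrative only: a hypothetical reader over lines of separated text.
public sealed class LineReader : IDataReader
{
    private readonly IEnumerator<string> lines;
    private readonly string[] names;
    private readonly char separator;
    private readonly object[] row; // one buffer, overwritten on every Read()

    public LineReader(IEnumerable<string> source, string[] columnNames, char fieldSeparator)
    {
        lines = source.GetEnumerator();
        names = columnNames;
        separator = fieldSeparator;
        row = new object[columnNames.Length];
    }

    // Advances to the next line and refills the shared row buffer.
    // Split still allocates the field strings; only the row array is reused.
    public bool Read()
    {
        if (!lines.MoveNext()) return false;
        string[] fields = lines.Current.Split(separator);
        for (int i = 0; i < row.Length; i++)
            row[i] = i < fields.Length ? fields[i] : (object)DBNull.Value;
        return true;
    }

    // Members that describe and expose the current row.
    public int FieldCount => row.Length;
    public object GetValue(int i) => row[i];
    public object this[int i] => row[i];
    public object this[string name] => row[GetOrdinal(name)];
    public string GetName(int i) => names[i];
    public int GetOrdinal(string name) => Array.IndexOf(names, name); // -1 if unknown
    public bool IsDBNull(int i) => row[i] is DBNull;
    public Type GetFieldType(int i) => typeof(string);
    public string GetDataTypeName(int i) => "String";
    public int GetValues(object[] values)
    {
        int n = Math.Min(values.Length, row.Length);
        Array.Copy(row, values, n);
        return n;
    }

    // Typed getters simply convert the stored text on demand.
    public string GetString(int i) => (string)row[i];
    public bool GetBoolean(int i) => Convert.ToBoolean(row[i]);
    public byte GetByte(int i) => Convert.ToByte(row[i]);
    public char GetChar(int i) => Convert.ToChar(row[i]);
    public DateTime GetDateTime(int i) => Convert.ToDateTime(row[i]);
    public decimal GetDecimal(int i) => Convert.ToDecimal(row[i]);
    public double GetDouble(int i) => Convert.ToDouble(row[i]);
    public float GetFloat(int i) => Convert.ToSingle(row[i]);
    public Guid GetGuid(int i) => Guid.Parse((string)row[i]);
    public short GetInt16(int i) => Convert.ToInt16(row[i]);
    public int GetInt32(int i) => Convert.ToInt32(row[i]);
    public long GetInt64(int i) => Convert.ToInt64(row[i]);
    public long GetBytes(int i, long fieldOffset, byte[] buffer, int bufferOffset, int length) => throw new NotSupportedException();
    public long GetChars(int i, long fieldOffset, char[] buffer, int bufferOffset, int length) => throw new NotSupportedException();
    public IDataReader GetData(int i) => throw new NotSupportedException();

    // IDataReader housekeeping: a single, forward-only result set.
    public int Depth => 0;
    public bool IsClosed { get; private set; }
    public int RecordsAffected => -1;
    public bool NextResult() => false;
    public DataTable GetSchemaTable() => throw new NotSupportedException();
    public void Close() { IsClosed = true; lines.Dispose(); }
    public void Dispose() => Close();
}
```

Because the source is any IEnumerable of strings, something like File.ReadLines(path) can feed it lazily, so only the current line and the row buffer are alive at any time.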

Now fewer words and more code. I have created a few classes that support my IDataReader implementation:

  • FileDataColumn - A class that describes the format of one field in the records you load into the IDataReader.
  • FileDataRecord - An IDataRecord implementation that also lets you set the values of the record, not only read them.
  • FileDataReader - An IDataReader implementation that uses the FileRecordReader from my last post to provide forward-only access to each record as an IDataRecord.

The FileDataColumn class contains only two properties, ColumnName and ColumnType. Their purpose is fairly obvious, so I will not go into any detail on that class.
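For readers who have not downloaded the source, a class with those two properties could look roughly like the sketch below. Only the names ColumnName and ColumnType come from this post. The helper FileDataColumnDemo.Parse, and the idea that ColumnType drives the conversion of the raw field text, are assumptions added for illustration.

```csharp
using System;
using System.Globalization;

// Hypothetical reconstruction: only the two property names come from the post.
public class FileDataColumn
{
    /// <summary>Name of the column, usable for lookups by name.</summary>
    public string ColumnName { get; set; }

    /// <summary>CLR type of the values in this column.</summary>
    public Type ColumnType { get; set; }
}

public static class FileDataColumnDemo
{
    // Assumed usage: convert one raw text field to the column's declared type.
    public static object Parse(FileDataColumn column, string rawField)
    {
        return Convert.ChangeType(rawField, column.ColumnType, CultureInfo.InvariantCulture);
    }
}
```

With a column declared as typeof(int), Parse turns the text "20" into a boxed int, which is the kind of value a typed getter such as GetInt32 can then hand out.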

The FileDataReader takes a few arguments in its constructor that enable it to read the data and provide a nice interface to it.

/// <summary>
/// Initializes a new instance of the <see cref="FileDataReader"/> class.
/// </summary>
/// <param name="fileStream">The file stream.</param>
/// <param name="columns">The columns describing the format of the stream for a single record.</param>
/// <param name="recordSeparator">The record separator.</param>
/// <param name="fieldSeparator">The field separator.</param>
/// <param name="fileEncoding">The file encoding.</param>
/// <param name="recordManipulator">The record manipulator.</param>
public FileDataReader(
    Stream fileStream,
    FileDataColumn[] columns,
    char recordSeparator,
    char fieldSeparator,
    Encoding fileEncoding,
    Action<FileDataRecord> recordManipulator)

First argument is the stream where the data is located. In real-world scenarios this would typically be a FileStream pointing to the file you want to read. The stream is passed on to the FileRecordReader instance that the constructor creates.

Second argument is an array of FileDataColumn objects that describe the record format of the file. They must be in the same order as the fields in the file.

Third argument is the record separator character, i.e. the character that separates the records from each other in the file.

Fourth argument is the field separator character, i.e. the character that separates the fields in the file.

Fifth argument is the encoding of the file, which is important in particular if you want to read text.

Last argument is an action that will be called before each call to Read returns, which gives you an opportunity to modify the data before it is passed on to whatever reads from the reader.

You use the FileDataReader as you would use any other IDataReader, by invoking the Read() method, which returns a bool indicating whether the reader was positioned at the next record.

For example:

// The constructor shown above takes six arguments, so a manipulator
// that does nothing is passed as the last one.
IDataReader dataReader = new FileDataReader(s, cols, '\n', ',', Encoding.Unicode, record => { });

while (dataReader.Read())
{
    // Fields can be read by column name or by zero-based index.
    string fieldValue = (string)dataReader["field"];
    int fieldValue2 = (int)dataReader[2];
}

And so forth. The beauty of it is that if you do not need any processing, you can hand the FileDataReader instance straight to SqlBulkCopy and you don't have to do any more work whatsoever.

If you need to manipulate each record, you simply provide an Action to the FileDataReader, for example:

Stream s = new MemoryStream(1000);
for (int x = 0; x < 10; x++)
{
    AddRecordToStream(s, string.Format("{0}\n", (x * 10)));
}
s.Position = 0;
FileDataColumn[] cols = new[] 
{ 
    new FileDataColumn { ColumnName = "First", ColumnType = typeof(int) } 
};

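IDataReader dataReader = new FileDataReader(
    s,
    cols,
    '\n',
    ';',
    Encoding.Unicode,
    record =>
    {
        // Double the value of the first column before the reader exposes it.
        int currentValue = record.GetInt32(0);
        record.SetValue(0, currentValue * 2);
    });

for (int x = 0; x < 10; x++)
{
    // Read() must succeed for each of the ten records written above.
    Assert.That(dataReader.Read(), Is.True, x.ToString());
    Assert.That(dataReader[0], Is.EqualTo(x * 10 * 2), x.ToString());
}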

Nice and easy if you ask me. Naturally you could easily extend and improve my FileDataReader implementation, but this should give you an idea of how you can efficiently read a file into SQL Server if you need to.

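To use this reader together with SqlBulkCopy you simply create an instance of the FileDataReader and use it like below:

// "reader" is the FileDataReader created earlier, and "destinationConnection"
// is the connection to the target database.
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(destinationConnection))
{
    bulkCopy.DestinationTableName = "dbo.DestinationTable";

    try
    {
        // Pulls every record from the reader and sends it to the server.
        bulkCopy.WriteToServer(reader);
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
    finally
    {
        // Always close the reader, even if the copy failed.
        reader.Close();
    }
}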
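For large files a few SqlBulkCopy settings are worth knowing about. The sketch below is an assumption-laden variation of the snippet above, not part of the attached project: the class BulkLoad, the method Load, the batch size of 10,000 and the idea that source and destination columns share the same names are all made up for illustration. BatchSize, BulkCopyTimeout and ColumnMappings are existing SqlBulkCopy members. On newer .NET versions the System.Data.SqlClient package has to be referenced separately.

```csharp
using System.Data;
using System.Data.SqlClient;

public static class BulkLoad
{
    // Hypothetical helper: streams all rows from the reader into dbo.DestinationTable.
    public static void Load(string connectionString, IDataReader reader, string[] columnNames)
    {
        using (reader)
        using (var bulkCopy = new SqlBulkCopy(connectionString))
        {
            bulkCopy.DestinationTableName = "dbo.DestinationTable";

            // Send rows to the server in batches instead of one huge batch.
            bulkCopy.BatchSize = 10000;

            // 0 means no time limit, which can be useful for very large loads.
            bulkCopy.BulkCopyTimeout = 0;

            // Map by name so the file's column order need not match the table's.
            // Assumes the destination columns carry the same names as the source.
            foreach (string name in columnNames)
            {
                bulkCopy.ColumnMappings.Add(name, name);
            }

            bulkCopy.WriteToServer(reader);
        }
    }
}
```

Mapping by name relies on the reader returning sensible values from GetOrdinal and GetName, so it only makes sense when the reader implements those members properly.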

 

I have attached the entire source code project for both this post and the previous one, including integration tests that will show how to use the code.

I hope you enjoy using it, I certainly enjoyed writing the code.

Any questions, post a comment or leave feedback.

FileDataReader.zip (14.44 kb)


.NET | c# | SQL Server
