Updates to the asynchronous memcached client

New updates are available for my memcached client.

 

  • Server monitoring is in place, i.e. the client now detects when a server node goes down or when several requests fail for a given node.
  • A logging framework has been added, so useful log statements can be emitted throughout the client.

Upcoming updates are:

  • Actually using the information gathered by the server monitor: removing a node when it is marked as dead and reintroducing it if and when it is marked as alive again.
  • Implement Set - I don't know how I could forget this in the first version, but it is very simple to add with the current implementation.
  • Implement MultiGet - so you can save a few precious roundtrips if you are lucky enough that all your keys end up on the same server node.
  • Implement the stats operation - so you can get some useful statistics back from the server.

 

Anyway, check it out at: http://asyncmemcached.codeplex.com

If anyone out there is actually using the client or considering it, please let me know - I would really like some feedback.

Reading structured files into SQL Server Part 2

My last post presented how you can read a file in a structured format into memory for further processing.

This post will focus on how you can easily move the contents you just imported into SQL Server.

If you want to insert data in bulk into SQL Server, the most efficient way to do that is to use the System.Data.SqlClient.SqlBulkCopy class.

There are two ways to use SqlBulkCopy: either you give it a DataTable instance with the data represented in the same format and order as the table in the database, or you give it an IDataReader instance that provides access to the data in the same format as the DataTable would.

Both methods work just fine, but if you want high performance and efficiency you should not use a DataTable, since it requires you to build up a DataTable object and transform your data into its row format, which is inefficient. The most efficient way is to implement an IDataReader on top of the data you want to import. Admittedly, if you had to implement the IDataReader yourself, the DataTable approach would probably be faster to get working, since it is very easy to understand and most people have used a DataTable before. But let's say you want to insert 1 billion rows: a DataTable simply cannot hold 1 billion rows, so you would have to create several DataTable instances with chunks of data, which would use a lot of memory anyway and, furthermore, create a lot of objects that would have to be collected by the garbage collector.
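
For contrast, here is roughly what the DataTable route looks like (a sketch only - the column, table name and connection string are placeholders, and the types come from System.Data and System.Data.SqlClient). Notice that every row has to be materialized as a DataRow before the copy even starts:

DataTable table = new DataTable();
table.Columns.Add("First", typeof(int));

// Every record is materialized as a DataRow up front.
for (int x = 0; x < 1000; x++)
{
    table.Rows.Add(x * 10);
}

using (SqlConnection connection = new SqlConnection("your connection string"))
{
    connection.Open();
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.DestinationTable";
        bulkCopy.WriteToServer(table);
    }
}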

By using an IDataReader you only have to provide one row at a time to the SqlBulkCopy class, and you can easily re-use your internal row representation for each row - this makes it very efficient, both because you create fewer objects and because you hold less data in memory at the same time. Creating fewer objects also means less garbage collection, which is good, since the entire application grinds to a halt each time the garbage collector kicks in.

Now, fewer words and more code. I have created a few classes that make up my IDataReader implementation:

 

  • FileDataColumn - A class that describes the format of the record you are trying to load through the IDataReader.
  • FileDataRecord - An IDataRecord implementation that also lets you set the values of the record, not only read them.
  • FileDataReader - An IDataReader implementation that uses the FileRecordReader from my last post to provide forward-only access to each record as an IDataRecord.


The FileDataColumn class only contains two properties, ColumnName and ColumnType. What they are used for is fairly obvious, so I will not go into any detail on that class.
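
For reference, its shape is roughly the following (just a sketch of the two properties - the actual definition is in the attached source):

public class FileDataColumn
{
    /// <summary>The name of the column/field in the record.</summary>
    public string ColumnName { get; set; }

    /// <summary>The CLR type of the field's value.</summary>
    public Type ColumnType { get; set; }
}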

The FileDataReader takes a few arguments in its constructor that will enable it to read the data and provide a nice interface to it.

 

/// <summary>
/// Initializes a new instance of the <see cref="FileDataReader"/> class.
/// </summary>
/// <param name="fileStream">The file stream.</param>
/// <param name="columns">The columns describing the format of the stream for a single record.</param>
/// <param name="recordSeparator">The record separator.</param>
/// <param name="fieldSeparator">The field separator.</param>
/// <param name="fileEncoding">The file encoding.</param>
/// <param name="recordManipulator">The record manipulator.</param>
public FileDataReader(Stream fileStream,
    FileDataColumn[] columns,
    char recordSeparator,
    char fieldSeparator,
    Encoding fileEncoding,
    Action<FileDataRecord> recordManipulator)

 

First argument is the stream where the data is located. In real-world scenarios this would be a FileStream variant pointing to the file you want to read - this stream is passed on to the FileRecordReader instance that the constructor creates.

Second argument is an array of FileDataColumn objects that describes the record format of the file. They must be in the same order as the fields in the file.

Third argument is the record separator character, i.e. the character that separates the records from each other in the file.

Fourth argument is the field separator character, i.e. the character that separates the fields in the file.

Fifth argument is the encoding of the file, which is important in particular if you want to read text.

Last argument is an action that will be called before each call to Read returns, which gives you an opportunity to modify the data before it is passed on to whatever reads from the reader.

You use the FileDataReader as you would use any other IDataReader, by invoking the Read() method, which returns a bool indicating whether the reader was positioned at the next record.

i.e. 

 

IDataReader dataReader = new FileDataReader(s, cols, '\n', ',', Encoding.Unicode, record => { }); // no-op manipulator - no extra processing needed

while (dataReader.Read())
{
    string fieldValue = (string)dataReader["field"];
    int fieldValue2 = (int)dataReader[2];
}

And so forth. The beauty of it is that if you do not want to do any processing, you can just hand the FileDataReader instance to SqlBulkCopy and you don't have to do any more work whatsoever.

If you need to manipulate each record, you simply provide an Action to the FileDataReader i.e.

Stream s = new MemoryStream(1000);
for (int x = 0; x < 10; x++)
{
    AddRecordToStream(s, string.Format("{0}\n", (x * 10)));
}
s.Position = 0;
FileDataColumn[] cols = new[] 
{ 
    new FileDataColumn { ColumnName = "First", ColumnType = typeof(int) } 
};

IDataReader dataReader = new FileDataReader(
    s,
    cols,
    '\n',
    ';',
    Encoding.Unicode,
    record =>
    {
        int currentValue = record.GetInt32(0);
        record.SetValue(0, currentValue * 2);
    });

for (int x = 0; x < 10; x++)
{
    dataReader.Read();
    Assert.That(dataReader[0], Is.EqualTo(x * 10 * 2), x.ToString());
}

Nice and easy if you ask me - naturally you could extend and improve my FileDataReader implementation, but this should give you a hint of how you can efficiently read a file into SQL Server when you need to.

To use this reader together with SqlBulkCopy you simply create an instance of the FileDataReader and use it like below:

 

// 'reader' is the FileDataReader instance created above and
// 'destinationConnection' is an open SqlConnection to the target database.
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(destinationConnection))
{
    bulkCopy.DestinationTableName = "dbo.DestinationTable";

    try
    {
        bulkCopy.WriteToServer(reader);
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
    finally
    {
        reader.Close();
    }
}

 

I have attached the entire source code project for both this post and the previous one, including integration tests that will show how to use the code.

I hope you enjoy using it; I certainly enjoyed writing the code.

If you have any questions, post a comment or leave feedback.

FileDataReader.zip (14.44 kb)


Reading structured files into SQL Server Part 1

From time to time we have all probably been tasked with getting a structured file into SQL Server.

It could be a comma-separated file, or it could use some other delimiter - it does not really matter. What matters is that there are several ways of getting that data into SQL Server; some are fast and efficient, others are slow and sometimes even impossible if you do it wrong.

The obvious way of importing a structured file into SQL Server is to use either BCP or SQL Server's built-in BULK INSERT.

i.e. 

BULK INSERT YourTableName
FROM 'c:\commaseparatedfilename.csv'
WITH
(
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
)
GO

 

BCP and the built-in method are fine when you want to do a one-time import, and if you like to stretch it a bit you could even do periodic imports with a maintenance task that imports a specific file from a location into SQL Server every day at a certain time.

But what if you need to do some processing of the file? Then you are pretty much stuck with writing a program that reads the file and writes the modified records into SQL Server.

That might be a cumbersome task, and what if your file is several gigabytes in size? Then you cannot simply read in the entire file, since your program might run out of memory. So what you should do, if possible, is read one record from the file at a time, process it, and pass it on to SQL Server.
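
In its simplest form, for a plain newline-separated file, that pattern looks something like the sketch below. ProcessRecord and WriteToSqlServer are placeholders for your own processing and database code:

// Record-at-a-time processing of a newline-separated file.
// Only one record is held in memory at any given time.
using (StreamReader streamReader = new StreamReader(@"c:\bigfile.csv", Encoding.UTF8))
{
    string record;
    while ((record = streamReader.ReadLine()) != null)
    {
        string processed = ProcessRecord(record);  // placeholder for your processing
        WriteToSqlServer(processed);               // placeholder for the database write
    }
}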

I have created a few classes to help with that, which I will present in this blog post and the ones to come.

The tasks you need to perform to get that file into SQL Server are probably something like:

 

  1. Read the records from the file, one at a time, as efficiently as possible and using as little memory as possible.
  2. Parse each record into its different columns, resulting in a strongly typed object that can be passed on to SQL Server easily.
  3. Optionally process each record and its values before they are passed on to SQL Server for storage.

 

I will present a nice solution to task #1 in this first blog post, and will present a solution to #2 in the next blog post.

For task #1 I have created a nice little class that I call FileRecordReader, which basically has a single method called ReadNextRecord.

 

The ReadNextRecord method reads the next record, returns it as a string, and advances its internal position to the location of the next record in the file.

 

/// <summary>
/// Reads the next record from the stream.
/// </summary>
/// <returns>The next record from the stream or null if no more records exist.</returns>
public string ReadNextRecord()
{

 

The FileRecordReader class takes a few arguments in its constructor that will help it read the file and understand where a record starts and stops.

 

/// <summary>
/// Initializes a new instance of the <see cref="FileRecordReader"/> class.
/// </summary>
/// <param name="fileStream">The file stream.</param>
/// <param name="recordSeparator">The record separator.</param>
/// <param name="fileEncoding">The file encoding.</param>
public FileRecordReader(Stream fileStream, char recordSeparator, Encoding fileEncoding)
{

 

First argument is the stream the reader should read its data from, which in real-life usage would be a FileStream instance.

Second argument is a char that will be used to separate the records from each other - normally this would be the newline character \n, but the class supports any arbitrary character you would like to use, in case your records contain line breaks that you want to retain in the imported data.

Last argument is the encoding of the file. This is also very important, since a UTF-8 or UTF-16 encoded file that gets read using the standard Windows encoding will not look pretty - the characters will be decoded incorrectly.
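
Putting the three arguments together, usage could look something like this (a small sketch based on the constructor and the ReadNextRecord method shown above, here with '|' as the record separator):

// Three records separated by '|' in a UTF-8 encoded stream.
byte[] data = Encoding.UTF8.GetBytes("first record|second record|third record");
using (Stream s = new MemoryStream(data))
{
    FileRecordReader reader = new FileRecordReader(s, '|', Encoding.UTF8);

    string record;
    while ((record = reader.ReadNextRecord()) != null)
    {
        Console.WriteLine(record); // process each record as it is read
    }
}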

If you look at the attached source code you will see that it works in much the same way as the built-in StreamReader class and its ReadLine method - but if you have a different record separator you cannot use StreamReader and have to parse the file yourself.

I have attached the source code to the FileRecordReader class and also a simple test that tests that the class is working.

FileRecordReader.cs (5.33 kb)

FileRecordReaderTests.cs (2.86 kb)

 

Stay tuned for the next post where I will describe how you can use the above FileRecordReader to present a nice interface that makes it easy to get those records into SQL Server.
