/egilh

Learning by doing

.NET CF DataSet performance using XML, text and CSV

Posted on Sunday, November 28, 2004 9:34 PM

I want fast storage for my secrets that is easy to synchronize with a PC. The obvious choice would be a DataSet serialized to an XML file. It's fast on a PC, but my SMS Manager slows down on the Pocket PC when the DB grows. The password manager will be used for reading in 99% of cases, so I set up a simple test suite on the PPC to test the performance of the different file formats I was considering:

  • Text array: reads each line and does a .Split() using tab as the separator. Creates a two-dimensional array (rows/fields in row)
  • CSV array: parses a CSV file and creates a two-dimensional array (rows/fields in row)
  • XML dataset: uses the ReadXml() method of the DataSet object
  • CSV dataset: parses a CSV file and builds a DataSet in memory
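To make the "CSV dataset" option concrete, here is a minimal sketch of what such a loader could look like. It is not the code used in the tests: the class name CsvSketch, the assumption that the first line holds the column names, and the choice to store every column as a string are all mine. It is written against the desktop System.Data API and handles quoted fields, but not line breaks inside a quoted field.

```csharp
using System;
using System.Collections;
using System.Data;
using System.IO;
using System.Text;

// Hypothetical helper, not the blog's actual implementation.
public static class CsvSketch
{
    // Splits one CSV line into fields. Handles quoted fields and
    // doubled quotes (""), but not line breaks inside a quoted field.
    public static string[] ParseLine(string line)
    {
        ArrayList fields = new ArrayList();
        StringBuilder field = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    field.Append('"');   // escaped quote inside a quoted field
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote
                }
                else
                {
                    field.Append(c);
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == ',')
            {
                fields.Add(field.ToString());
                field.Length = 0;        // start the next field
            }
            else
            {
                field.Append(c);
            }
        }
        fields.Add(field.ToString());    // last field on the line
        return (string[])fields.ToArray(typeof(string));
    }

    // Reads CSV text whose first line holds the column names (an assumption)
    // and returns a DataSet with a single table of string columns.
    public static DataSet ReadCsv(TextReader reader, string tableName)
    {
        DataSet ds = new DataSet();
        DataTable table = ds.Tables.Add(tableName);

        string line = reader.ReadLine();
        if (line == null) return ds;     // empty input: table without columns

        foreach (string name in ParseLine(line))
        {
            table.Columns.Add(name, typeof(string));
        }

        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue;   // skip blank lines
            table.Rows.Add(ParseLine(line));  // throws if a row has more fields than columns
        }
        return ds;
    }
}
```

Calling CsvSketch.ReadCsv(File.OpenText(fileName), "secrets") would give a DataSet comparable to the one ReadXml() produces. The "CSV array" variant is the same loop with the string[] results collected in a list instead of a DataTable.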


Example test routine (XML DataSet):

openFileDialog.Filter = "XML Test|*.xml";
if (DialogResult.OK == openFileDialog.ShowDialog())
{
    string fileName = openFileDialog.FileName;
    int startTick = Environment.TickCount;        
 
    System.Data.DataSet ds = new System.Data.DataSet();
    ds.ReadXml(fileName);            
    int ticks = Environment.TickCount - startTick;
    System.Windows.Forms.MessageBox.Show("Time taken: " + 
            ticks + " ms");
}

I know that DataSets serialized to XML are slow on the .NET Compact Framework, but I had no idea they were this slow:

The tests were run with 1,000 records on an H3870. I repeated the tests with 100, 1,000 and 10,000 records with similar results.

I find it strange that my CSV version is almost 3 times faster than the text version that does a simple Split(). This is the Text reader core:

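// rows collects one string[] per line; fieldSeparators is a char[] holding the tab character
using (StreamReader sr = File.OpenText(fileName))
{
    string input;
    while ((input = sr.ReadLine()) != null)
    {
        rows.Add(input.Split(fieldSeparators));
    }
}   // the reader is closed here, even if an exception is thrown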

The text version is slightly faster than the CSV version the first time it is run (not shown in my graphs). I guess this is because the String class is pre-jitted.

I have decided to use the CSV DataSet for several reasons:

  • It gives me all the features of DataSets I would otherwise have to implement myself for the array versions: sort, filter, search
  • It has less startup overhead for the first call (728 ms vs 2,218 ms for XML)
  • It has acceptable performance up to 10,000 records (6,303 ms vs 36,513 ms for XML)
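As an illustration of the first point, this is roughly what sort, filter and search look like once the data is in a DataTable. It is a sketch against the desktop System.Data API; the table layout and the column names "site" and "user" are made up for the example and are not the format used by the password manager.

```csharp
using System;
using System.Data;

// Hypothetical example, column names are invented for illustration.
public static class DataSetFeaturesSketch
{
    // Builds a small in-memory table to query.
    public static DataTable BuildTable()
    {
        DataTable table = new DataTable("secrets");
        table.Columns.Add("site", typeof(string));
        table.Columns.Add("user", typeof(string));
        table.Rows.Add(new object[] { "mail", "egil" });
        table.Rows.Add(new object[] { "bank", "egil" });
        table.Rows.Add(new object[] { "forum", "guest" });
        return table;
    }

    // Filter + sort: the rows for one user, ordered by site.
    public static DataView ByUser(DataTable table, string user)
    {
        DataView view = new DataView(table);
        // Double any single quotes so the value is a valid filter literal.
        view.RowFilter = "user = '" + user.Replace("'", "''") + "'";
        view.Sort = "site ASC";
        return view;
    }

    // Search: index within the view of the row whose sort key equals
    // 'site', or -1 if there is none. Requires view.Sort to be set.
    public static int FindSite(DataView view, string site)
    {
        return view.Find(site);
    }
}
```

With the array versions, each of these would be a hand-written loop or a custom comparer, which is the work the DataSet saves.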


I will play with encryption support next.





Feedback

# re: .NET CF DataSet performance using XML, text and CSV

12/14/2004 1:07 PM by Justin

You should try binary serialization, it will take up less storage space on the device and is a lot faster, unless you have some reason to make the dataset file human readable. But you say you want to encrypt it, so I guess not.


# re: .NET CF DataSet performance using XML, text and CSV

12/14/2004 9:36 PM by egilh

Good point.

I was looking for a Save() function that could save binary, but I didn't consider serializing the entire object to binary. My data will be synchronized back and forth with ActiveSync between the .NET CF and .NET 1.1 on my PC and I was concerned that it would not work with a complex object like a DataSet. Have you tried it?

I have had a look on the net and found some articles on the subject of binary serialization of DataSets:
1 - Dino Esposito explains that the BinaryFormatter serializes a DataSet as a long byte stream of XML, and how to do better binary serialization of DataSets [http://msdn.microsoft.com/msdnmag/issues/02/12/CuttingEdge/]
2 - Peter A. Bromberg explains how to do True Binary Serialization and Compression of DataSets [http://www.eggheadcafe.com/articles/20031219.asp] and also touches on .NET CF issues

Binary serialization using a custom format should work, and it would be interesting to see the performance. I'm a bit caught up at work but I should be able to run some tests after Christmas.
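To give an idea of what a custom binary format could look like, here is an untested-on-device sketch using BinaryWriter and BinaryReader from System.IO. The layout (table name, column names, row count, then every cell as a length-prefixed string) and the class name BinaryTableSketch are my own invention; it only handles string data and says nothing yet about how it would perform on the Pocket PC.

```csharp
using System;
using System.Data;
using System.IO;

// Hypothetical format, not taken from either of the articles above.
public static class BinaryTableSketch
{
    // Layout: table name, column count, column names, row count,
    // then every cell as a length-prefixed string.
    public static void Write(DataTable table, Stream output)
    {
        BinaryWriter w = new BinaryWriter(output);
        w.Write(table.TableName);
        w.Write(table.Columns.Count);
        foreach (DataColumn col in table.Columns)
        {
            w.Write(col.ColumnName);
        }
        w.Write(table.Rows.Count);
        foreach (DataRow row in table.Rows)
        {
            for (int i = 0; i < table.Columns.Count; i++)
            {
                // DBNull is stored as an empty string in this sketch.
                w.Write(row.IsNull(i) ? "" : Convert.ToString(row[i]));
            }
        }
        w.Flush();   // the stream is left open for the caller
    }

    // Reads back a table written by Write(); all columns come back as strings.
    public static DataTable Read(Stream input)
    {
        BinaryReader r = new BinaryReader(input);
        DataTable table = new DataTable(r.ReadString());
        int columns = r.ReadInt32();
        for (int i = 0; i < columns; i++)
        {
            table.Columns.Add(r.ReadString(), typeof(string));
        }
        int rows = r.ReadInt32();
        for (int n = 0; n < rows; n++)
        {
            object[] values = new object[columns];
            for (int i = 0; i < columns; i++)
            {
                values[i] = r.ReadString();
            }
            table.Rows.Add(values);
        }
        return table;
    }
}
```

Because the format is just a sequence of primitive writes, the same code should in principle read the file on both the PC and the device, which avoids relying on the framework's own DataSet serialization.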


# re: .NET CF DataSet performance using XML, text and CSV

2/7/2006 4:51 PM by Jason

Hi,

I realize this thread was posted over a year ago. But I do have a question regarding your CSV dataset test.

You chose to go with the CSV dataset. You mention your times for 10,000 records for CSV vs XML. What do these times indicate? How long it took to load the 10,000 records into the dataset?

I'm very curious because I'm taking a huge performance hit when trying to load just 4,000 records via DataSet.ReadXml. This is not acceptable and I'm searching for better ways to get this data into the app.

I would appreciate ANY info you could give me regarding your performance testing.

Thanks,

Jason



# re: .NET CF DataSet performance using XML, text and CSV

2/7/2006 8:52 PM by Egil Hogholt

The performance of XML based DataSets in .NET CF 1.x is horrible. It is faster in .NET 2.0 but not enough to make a major difference.

The times in the post were recorded on an HP iPAQ H3870 and they are in milliseconds. I used a simple recordset with 5 columns of text data items that were 6 characters long. The times taken to load 1 000 records were:
~ 12 seconds with XML dataset
~ 2 seconds with CSV based dataset
~ 1 second with Text Array
< 0.5 seconds with CSV array

I don't have the test results for 10 000 records handy but I can run them again on a faster device if you are interested. The results back then were:
~37 seconds for XML DataSet
~6 seconds for CSV based DataSet
CSV array took a second or two

I also had interoperability problems between the Pocket PC and PC for DataSets in .NET CF 1.0 so I went for my own custom CSV DataSet. I added encryption and I have used it for a long time in poSecrets http://www.egilh.com/blog/articles/poSecrets.aspx without problems.

Let me know if you are interested in the source and I will try to put together an encrypted CSV DataSet package. CSV Array is the way to go if you are interested in raw performance using a human-readable file format, though.


# re: .NET CF DataSet performance using XML, text and CSV

2/8/2006 4:42 AM by Jason

Thanks for the update Egil. Very much appreciated.

I dug around my code and was able to add a few tweaks. Somehow, the schema was omitted in my XML file. I re-added it and that alone shaved about 25 secs. off the load. And through some further testing, I have now determined the bottleneck is on the saving of the dataset data into the SQL CE database.

Currently, I'm using ds.ReadXml and it's loading the 4600+ records into memory in less than 3 secs. So I'm not going to complain about that performance. The big hit comes after the XML data is loaded into memory, when it has to be saved into the SQL CE database with the rest of the data. After the XML load, it takes 2-2.5 minutes to load the 4600+ records into the database.

My XML file has a total of 6 fields. 1 integer and 5 x 60 character (varchar) values. I'm now using CF 1.0 SP3 running on a PPC 2003 device.

I will definitely have a look at your poSecrets code because this one XML file is going to keep growing over time. So I would like to get the best performing load possible now. The encryption idea is appealing too, to keep the corporate data somewhat safe from prying eyes.

Thanks again for your benchmarks. Now back to further testing. :)




# re: .NET CF DataSet performance using XML, text and CSV

9/24/2007 6:23 AM by CF User

Try http://gotcf.net - BinaryFormatter for the .net compact framework compatible with the full .NET framework; includes binary serialization for the DataSets and DataTables too.

