How to I serialize a large graph of .NET object into a SQL Server BLOB without creating a large bu
- by Ian Ringrose
We have code like:
ms = New IO.MemoryStream
bin = New System.Runtime.Serialization.Formatters.Binary.BinaryFormatter
bin.Serialize(ms, largeGraphOfObjects)
dataToSaveToDatabase = ms.ToArray()
// put dataToSaveToDatabase in a Sql server BLOB
But the memory steam allocates a large buffer from the large memory heap that is giving us problems. So how can we stream the data without needing enough free memory to hold the serialized objects.
I am looking for a way to get a Stream from SQL server that can then be passed to bin.Serialize() so avoiding keeping all the data in my processes memory.
Likewise for reading the data back...
Some more background.
This is part of a complex numerical processing system that processes data in near real time looking for equipment problems etc, the serialization is done to allow a restart when there is a problem with data quality from a data feed etc. (We store the data feeds and can rerun them after the operator has edited out bad values.)
Therefore we serialize the object a lot more often then we de-serialize them.
The objects we are serializing include very large arrays mostly of doubles as well as a lot of small “more normal” objects. We are pushing the memory limit on a 32 bit system and make the garage collector work very hard. (Effects are being made elsewhere in the system to improve this, e.g. reusing large arrays rather then create new arrays.)
Often the serialization of the state is the last straw that courses an out of memory exception; our peak memory usage is while this serialization is being done.
I think we get large memory pool fragmentation when we de-serialize the object, I expect there are also other problem with large memory pool fragmentation given the size of the arrays. (This has not yet been investigated, as the person that first looked at this is a numerical processing expert, not a memory management expert.)
Are customers use a mix of Sql Server 2000, 2005 and 2008 and we would rather not have different code paths for each version of Sql Server if possible.
We can have many active models at a time (in different process, across many machines), each model can have many saved states. Hence the saved state is stored in a database blob rather then a file.
As the spread of saving the state is important, I would rather not serialize the object to a file, and then put the file in a BLOB one block at a time.
Other related questions I have asked
How to Stream data from/to SQL Server BLOB fields?
Is there a SqlFileStream like class that works with Sql Server 2005?