Combining FileStream and MemoryStream to avoid disk accesses/paging while receiving gigabytes of data?
Posted by w128 on Stack Overflow. Published on 2013-10-30T15:38:22Z.
I'm receiving a file as a stream of byte[] data packets (total size isn't known in advance) that I need to store somewhere before processing it immediately after it's been received (I can't do the processing on the fly). Total received file size can vary from as small as 10 KB to over 4 GB.
- One option for storing the received data is a MemoryStream, i.e. a sequence of MemoryStream.Write(bufferReceived, 0, count) calls to store the received packets. This is very simple, but will obviously result in an OutOfMemoryException for large files.
- An alternative is a FileStream, i.e. FileStream.Write(bufferReceived, 0, count). This way no out-of-memory exceptions occur, but what I'm unsure about is bad performance due to disk writes (which I don't want to occur as long as plenty of memory is still available). I'd like to avoid disk access as much as possible, but I don't know of a way to control this.
I did some testing, and most of the time there seems to be little performance difference between, say, 10,000 consecutive calls of MemoryStream.Write() vs. FileStream.Write(), but a lot seems to depend on the buffer size and the total amount of data in question (i.e. the number of writes). Obviously, MemoryStream size reallocation is also a factor.
Does it make sense to use a combination of MemoryStream and FileStream, i.e. write to a MemoryStream by default, but once the total amount of data received exceeds e.g. 500 MB, switch to a FileStream; then read in chunks from both streams when processing the received data (first process the 500 MB from the MemoryStream, dispose it, then read from the FileStream)?

Another solution is to use a custom memory stream implementation that doesn't require a contiguous address space for its internal array allocation (i.e. a linked list of smaller buffers); this way, at least on 64-bit environments, out-of-memory exceptions should no longer be an issue. Con: extra work, more room for mistakes.
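A minimal sketch of the spill-over idea (the class name, threshold, and one-time migration strategy are my own assumptions, not a vetted implementation) could hide both streams behind a single Write method:

```csharp
using System;
using System.IO;

// Hypothetical spill-over buffer: writes go to a MemoryStream until
// 'threshold' bytes have accumulated, then everything (old and new)
// is migrated to a FileStream on disk.
class SpillOverStream : IDisposable
{
    private readonly long threshold;
    private readonly string spillPath;
    private MemoryStream memory = new MemoryStream();
    private FileStream file; // null until we spill to disk

    public SpillOverStream(long threshold, string spillPath)
    {
        this.threshold = threshold;
        this.spillPath = spillPath;
    }

    public bool SpilledToDisk => file != null;

    public void Write(byte[] buffer, int offset, int count)
    {
        if (file == null && memory.Length + count > threshold)
        {
            // Crossing the threshold: move buffered bytes to disk once.
            file = new FileStream(spillPath, FileMode.Create);
            memory.Position = 0;
            memory.CopyTo(file);
            memory.Dispose();
            memory = null;
        }

        if (file != null)
            file.Write(buffer, offset, count);
        else
            memory.Write(buffer, offset, count);
    }

    public void Dispose()
    {
        memory?.Dispose();
        file?.Dispose();
    }
}

class Demo
{
    static void Main()
    {
        var path = Path.GetTempFileName();
        using (var s = new SpillOverStream(threshold: 16, spillPath: path))
        {
            s.Write(new byte[10], 0, 10);       // 10 bytes: still in memory
            Console.WriteLine(s.SpilledToDisk); // False
            s.Write(new byte[10], 0, 10);       // 20 > 16: spills to disk
            Console.WriteLine(s.SpilledToDisk); // True
        }
        File.Delete(path);
    }
}
```

A real version would also expose reading back the buffered data (rewind the MemoryStream, or reopen the spill file), which is where the "process 500 MB from memory, then the rest from disk" step happens.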
So how do FileStream vs. MemoryStream reads/writes behave in terms of disk access and memory caching, i.e. the data size/performance balance? I would expect that as long as enough RAM is available, FileStream would internally read/write from memory (the OS file cache) anyway, and virtual memory would take care of the rest. But I don't know how often FileStream will explicitly access the disk when being written to.
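On that last point: FileStream keeps its own small in-process buffer (4 KB by default, adjustable via a constructor overload), and bytes flushed from it land in the OS file-system cache, which the OS writes back to disk lazily. A hedged sketch of the relevant constructor knobs (the buffer size and packet counts here are arbitrary choices for illustration):

```csharp
using System;
using System.IO;

class FileStreamBuffering
{
    static void Main()
    {
        var path = Path.GetTempFileName();

        // A 1 MB internal buffer means FileStream only hands data to the
        // OS once per megabyte; the OS cache then defers the physical write.
        using (var fs = new FileStream(
            path, FileMode.Create, FileAccess.Write, FileShare.None,
            bufferSize: 1 << 20,
            options: FileOptions.SequentialScan)) // caching hint to the OS
        {
            var packet = new byte[8192];
            for (int i = 0; i < 128; i++)
                fs.Write(packet, 0, packet.Length);

            // Flush(true) forces the data out to the physical device;
            // a plain Flush() only empties FileStream's own buffer
            // into the OS cache.
            fs.Flush(true);
        }

        Console.WriteLine(new FileInfo(path).Length); // 128 * 8192 = 1048576
        File.Delete(path);
    }
}
```

Conversely, FileOptions.WriteThrough asks the OS to bypass its cache entirely, which is the opposite of what you want here.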
Any help would be appreciated.