parse content away from structure in a binary file

Posted by Jeff Godfrey on Stack Overflow
Published on 2010-05-26T18:36:15Z Indexed on 2010/05/26 21:41 UTC

Using C#, I need to read a packed binary file created using FORTRAN. The file is stored in an "Unformatted Sequential" format as described here (about halfway down the page, in the "Unformatted Sequential Files" section):

http://www.tacc.utexas.edu/services/userguides/intel8/fc/f_ug1/pggfmsp.htm

As described on that page, the file is organized into "chunks" of 130 bytes or less. Each chunk holds up to 128 bytes of data plus 2 length bytes inserted by the FORTRAN compiler, one before the data and one after it.
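To make that layout concrete, here is a sketch of a writer that produces it. `FortranSeqWriter` is a hypothetical helper, not part of any library. The marker values are assumptions based on a reading of the linked Intel page: 75 as the file's first byte, 130 as its last, and 129 as the length byte on a full 128-byte block whose logical record continues into the next block. Check them against a real file in a hex editor before relying on them. The writer is mainly useful for generating small test inputs.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper illustrating the assumed "Unformatted Sequential" layout.
static class FortranSeqWriter
{
    public static byte[] Encode(IEnumerable<byte[]> records)
    {
        var output = new MemoryStream();
        output.WriteByte(75);                      // assumed file-start marker
        foreach (byte[] record in records)
        {
            int offset = 0;
            // A full 128-byte block that is followed by more data of the same
            // record is framed by 129 on both sides (assumed continuation marker).
            while (record.Length - offset > 128)
            {
                output.WriteByte(129);
                output.Write(record, offset, 128);
                output.WriteByte(129);
                offset += 128;
            }
            // The final block of a record carries its real length (0..128) on both sides.
            byte len = (byte)(record.Length - offset);
            output.WriteByte(len);
            output.Write(record, offset, len);
            output.WriteByte(len);
        }
        output.WriteByte(130);                     // assumed file-end marker
        return output.ToArray();
    }
}
```

For example, encoding a single 140-byte record gives 146 bytes: `75`, then `129` + 128 data bytes + `129`, then `12` + 12 data bytes + `12`, then `130`.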

So I need an efficient way to separate the actual file payload from the compiler-inserted length bytes.

Once I've extracted the actual payload from the file, I'll need to parse it into its various data types. That'll be the next exercise.
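For that next step, wrapping the payload bytes in a `BinaryReader` is one common C# approach. `BinaryReader` always reads multi-byte values as little-endian, which matches Fortran output written on x86. The sketch below assumes a purely hypothetical record layout: a 4-byte INTEGER count, that many 8-byte REAL values, and then an 8-character blank-padded CHARACTER field. `PayloadFields` is an illustrative name only.

```csharp
using System.IO;
using System.Text;

// Hypothetical reader for one assumed record layout:
// INTEGER*4 n, then n REAL*8 values, then CHARACTER*8 label.
static class PayloadFields
{
    public static (double[] Values, string Label) Read(byte[] payload)
    {
        using var reader = new BinaryReader(new MemoryStream(payload));
        int n = reader.ReadInt32();                 // little-endian 4-byte integer
        var values = new double[n];
        for (int i = 0; i < n; i++)
            values[i] = reader.ReadDouble();        // little-endian 8-byte IEEE double
        // Fortran CHARACTER data is fixed-width and blank-padded rather than
        // length-prefixed, so read raw bytes instead of using ReadString.
        string label = Encoding.ASCII.GetString(reader.ReadBytes(8)).TrimEnd(' ');
        return (values, label);
    }
}
```

The real field order and sizes would have to come from the FORTRAN `WRITE` statements that produced the file.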

My first thought is to slurp the entire file into a byte array using File.ReadAllBytes, then iterate through the bytes, skipping the formatting and copying the actual data into a second byte array.

In the end, that second byte array should contain the actual file contents minus all the formatting, which I'd then walk through again to extract the values I need.
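The ReadAllBytes-then-copy idea above can be sketched roughly as follows. It makes the same assumptions about marker values as a reading of the Intel page suggests (75 at the start, 130 at the end, 129 for a full block whose record continues). `FortranSeqStripper.StripFraming` is a hypothetical name. It also checks that each block's leading and trailing length bytes agree, which catches misalignment early instead of silently producing garbage.

```csharp
using System.IO;

// Hypothetical helper; the marker values are assumptions taken from the
// Intel "Unformatted Sequential Files" description, not verified on real files.
static class FortranSeqStripper
{
    const byte FileStart = 75;   // assumed first byte of every file
    const byte FileEnd = 130;    // assumed last byte of every file
    const byte Continued = 129;  // assumed length byte of a full block whose record continues

    // Copies the data bytes of every block into one array, dropping the
    // file markers and the length bytes that frame each block.
    public static byte[] StripFraming(byte[] file)
    {
        if (file.Length < 2 || file[0] != FileStart || file[file.Length - 1] != FileEnd)
            throw new InvalidDataException("Missing start (75) or end (130) marker.");

        var payload = new MemoryStream(file.Length);
        int pos = 1;                 // index of the first leading length byte
        int end = file.Length - 1;   // index of the end marker

        while (pos < end)
        {
            byte lead = file[pos];
            int count = lead == Continued ? 128 : lead;
            int trail = pos + 1 + count;          // index of the trailing length byte
            if (count > 128 || trail >= end)
                throw new InvalidDataException($"Bad or truncated block at offset {pos}.");
            if (file[trail] != lead)
                throw new InvalidDataException($"Length bytes disagree at offset {pos}.");

            payload.Write(file, pos + 1, count);  // keep only the data bytes
            pos = trail + 1;                      // move to the next block's leading byte
        }
        return payload.ToArray();
    }
}
```

Usage would be along the lines of `byte[] payload = FortranSeqStripper.StripFraming(File.ReadAllBytes(path));`, where `path` is your file. Concatenating everything discards record boundaries. Because each unformatted FORTRAN `WRITE` produces one logical record, it may be easier to collect a `List<byte[]>` with one entry per record instead. A record ends at the first block whose length byte is not 129. Each entry can then be parsed on its own.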

As I'm fairly new to C#, I thought there might be a better, more accepted way of tackling this.

Also, in case it's helpful, these files could be fairly large (say 30 MB), though most will be much smaller.
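At 30 MB, File.ReadAllBytes plus a second array means holding roughly 60 MB at once, which is usually acceptable. If memory does become a concern, a streaming variant is one alternative. It reads one block at a time from any `Stream` and yields each logical record as soon as it is complete. This is a sketch under the same assumed markers (75, 129, 130), and `FortranSeqStream.ReadRecords` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical streaming reader; assumes the 75 / 129 / 130 marker layout.
static class FortranSeqStream
{
    public static IEnumerable<byte[]> ReadRecords(Stream input)
    {
        if (input.ReadByte() != 75)
            throw new InvalidDataException("Missing start marker (75).");

        var record = new MemoryStream();   // accumulates blocks of the current record
        var block = new byte[128];
        while (true)
        {
            int lead = input.ReadByte();
            if (lead == 130)               // assumed end-of-file marker
            {
                if (record.Length != 0)
                    throw new InvalidDataException("File ended inside a continued record.");
                yield break;
            }
            if (lead < 0)
                throw new EndOfStreamException("Missing end marker (130).");
            if (lead > 129)
                throw new InvalidDataException($"Bad length byte {lead}.");

            int count = lead == 129 ? 128 : lead;
            FillBuffer(input, block, count);
            if (input.ReadByte() != lead)
                throw new InvalidDataException("Leading and trailing length bytes differ.");

            record.Write(block, 0, count);
            if (lead != 129)               // not a continuation: the record is complete
            {
                yield return record.ToArray();
                record.SetLength(0);       // also resets Position to 0
            }
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until done.
    static void FillBuffer(Stream s, byte[] buffer, int count)
    {
        int done = 0;
        while (done < count)
        {
            int n = s.Read(buffer, done, count - done);
            if (n == 0) throw new EndOfStreamException("Truncated block.");
            done += n;
        }
    }
}
```

Typical use would open the file with `File.OpenRead(path)` in a `using` block and `foreach` over `FortranSeqStream.ReadRecords(stream)`. `FileStream` buffers internally, so the single-byte `ReadByte` calls for the length markers do not each go to the disk.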

© Stack Overflow or respective owner