Why do System.IO.Log SequenceNumbers have variable length?
- by Doug McClean
I'm trying to use the System.IO.Log features to build a recoverable transaction system. I understand it to be implemented on top of the Common Log File System (CLFS).
The usual ARIES approach to write-ahead logging involves persisting log record sequence numbers in places other than the log (for example, in the header of the database page modified by the logged action).
Interestingly, the documentation for CLFS says that such sequence numbers are always 64-bit integers.
Confusingly, however, the .Net wrapper around those sequence numbers, System.IO.Log.SequenceNumber, can be constructed from a byte[] but not from a UInt64. Its value can likewise be read back as a byte[], but not as a UInt64. Inspecting the implementation of SequenceNumber.GetBytes() reveals that it can in fact return arrays of either 8 or 16 bytes.
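Roughly what I mean, as a minimal untested sketch (it only constructs a SequenceNumber from raw bytes and inspects what GetBytes() hands back; all the record/stream setup is elided):

```csharp
using System;
using System.IO.Log;

class SequenceNumberSizeDemo
{
    static void Main()
    {
        // A SequenceNumber can be built from an 8-byte buffer...
        byte[] eightBytes = BitConverter.GetBytes(42UL);
        SequenceNumber sn = new SequenceNumber(eightBytes);

        // ...but there is no constructor or property exposing it as a UInt64,
        // and GetBytes() may return either 8 or 16 bytes.
        byte[] raw = sn.GetBytes();
        Console.WriteLine(raw.Length);
    }
}
```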
This raises a few questions:
Why do the .Net sequence numbers differ in size from the CLFS sequence numbers?
Why are the .Net sequence numbers variable in length?
Why would you need 128 bits to represent such a sequence number? It seems like you would truncate the log well before exhausting a 64-bit address space (16 exbibytes, or roughly 10^19 bytes; more if the addresses refer to units larger than a byte).
If log sequence numbers are going to be represented as 128-bit integers, why not provide a way to serialize/deserialize them as pairs of UInt64s, instead of incurring a heap allocation for a short-lived byte[] every time one needs to be written or read (roughly the pattern sketched below)? Alternatively, why bother making SequenceNumber a value type at all?
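To be concrete, here is my understanding of what persisting an LSN in a page header looks like today. This is a hedged sketch: WriteLsn/ReadLsn are hypothetical helper names and the page layout is made up, but the byte[] round-trip is the part I'm asking about.

```csharp
using System;
using System.IO.Log;

static class LsnPageHeader
{
    // Hypothetical helper: copy an LSN into a page header buffer.
    // Every call pays for a fresh short-lived byte[] from GetBytes().
    public static int WriteLsn(SequenceNumber lsn, byte[] page, int offset)
    {
        byte[] raw = lsn.GetBytes();                 // 8 or 16 bytes, heap-allocated
        Buffer.BlockCopy(raw, 0, page, offset, raw.Length);
        return raw.Length;                           // caller must record how wide it was
    }

    // Hypothetical helper: reading it back needs another allocation
    // just to feed the byte[] constructor.
    public static SequenceNumber ReadLsn(byte[] page, int offset, int length)
    {
        byte[] raw = new byte[length];
        Buffer.BlockCopy(page, offset, raw, 0, length);
        return new SequenceNumber(raw);
    }
}
```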
It seems an odd tradeoff to double the storage overhead of log sequence numbers just so you can have an untruncated log longer than a million terabytes, so I feel like I'm missing something here, or maybe several things. I'd much appreciate it if someone in the know could set me straight.