Designing a database file format

Posted by RoliSoft on Stack Overflow See other posts from Stack Overflow or by RoliSoft
Published on 2012-12-18T22:53:50Z Indexed on 2012/12/18 23:03 UTC
Read the original article Hit count: 300

Filed under:
|
|
|
|

I would like to design my own database engine for educational purposes, for the time being. Designing a binary file format is not hard nor the question, I've done it in the past, but while designing a database file format, I have come across a very important question:

How to handle the deletion of an item?

So far, I've thought of the following two options:

  • Each item will have a "deleted" bit which is set to 1 upon deletion.
    • Pro: relatively fast.
    • Con: potentially sensitive data will remain in the file.
  • 0x00 out the whole item upon deletion.
    • Pro: potentially sensitive data will be removed from the file.
    • Con: relatively slow.
  • Recreating the whole database.
    • Pro: no empty blocks which makes the follow-up question void.
    • Con: it's a really good idea to overwrite the whole 4 GB database file because a user corrected a typo. I will sell this method to Twitter ASAP!

Now let's say you already have a few empty blocks in your database (deleted items). The follow-up question is how to handle the insertion of a new item?

  • Append the item to the end of the file.
    • Pro: fastest possible.
    • Con: file will get huge because of all the empty blocks that remain because deleted items aren't actually deleted.
  • Search for an empty block exactly the size of the one you're inserting.
    • Pro: may get rid of some blocks.
    • Con: you may end up scanning the whole file at each insert only to find out it's very unlikely to come across a perfectly fitting empty block.
  • Find the first empty block which is equal or larger than the item you're inserting.
    • Pro: you probably won't end up scanning the whole file, as you will find an empty block somewhere mid-way; this will keep the file size relatively low.
    • Con: there will still be lots of leftover 0x00 bytes at the end of items which were inserted into bigger empty blocks than they are.

Rigth now, I think the first deletion method and the last insertion method are probably the "best" mix, but they would still have their own small issues. Alternatively, the first insertion method and scheduled full database recreation. (Probably not a good idea when working with really large databases. Also, each small update in that method will clone the whole item to the end of the file, thus accelerating file growth at a potentially insane rate.)

Unless there is a way of deleting/inserting blocks from/to the middle of the file in a file-system approved way, what's the best way to do this? More importantly, how do databases currently used in production usually handle this?

© Stack Overflow or respective owner

Related posts about c#

Related posts about database