Maintaining a large list in Python
- by Oren
I need to maintain a large list of pickleable Python objects that supports the following operations quickly:
The list will have 4 pointers and must be able to:
- Read/write the node at each of the 4 pointers
- Insert new nodes before the pointed nodes
- Increase a pointer by 1
- Pop nodes from the beginning of the list (as long as no pointer still references them)
- Add nodes at the end of the list
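To make the intended semantics concrete, here is a naive in-memory sketch of that interface (the class and method names are my own, purely illustrative; a real implementation would page the data to disk rather than use a plain list):

```python
class CursorList:
    """Naive in-memory model: a list plus four movable pointers."""

    def __init__(self, items=None, n_pointers=4):
        self._items = list(items or [])
        self._offset = 0               # number of nodes popped from the front
        self._ptrs = [0] * n_pointers  # absolute (never-decreasing) positions

    def read(self, p):
        return self._items[self._ptrs[p] - self._offset]

    def write(self, p, value):
        self._items[self._ptrs[p] - self._offset] = value

    def insert_before(self, p, value):
        # Insert a node before the one pointer p references; every
        # pointer at or past that position shifts right by one.
        abs_pos = self._ptrs[p]
        self._items.insert(abs_pos - self._offset, value)
        for i in range(len(self._ptrs)):
            if self._ptrs[i] >= abs_pos:
                self._ptrs[i] += 1

    def advance(self, p):
        self._ptrs[p] += 1

    def pop_front(self):
        # Refuse to pop a node that some pointer still references.
        if min(self._ptrs) <= self._offset:
            raise IndexError("a pointer still references the head node")
        self._offset += 1
        return self._items.pop(0)

    def append(self, value):
        self._items.append(value)
```

This version obviously keeps everything in RAM, which is exactly what I want to avoid; it only pins down the operations and pointer-shifting rules.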
The list can be very large (300-400 MB), so it shouldn't be held entirely in RAM.
The nodes are small (100-200 bytes) but can contain various types of data.
The most efficient approach, in my opinion, is to implement some paging mechanism: keep a linked list of pages in a database on the hard drive, with up to 5 pages loaded in memory at any moment. On insertion, if a page exceeds some maximum size, it is split into 2 smaller pages; on pointer increment, if a pointer moves past the end of its loaded page, the following page is loaded instead.
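A stripped-down sketch of just the page-splitting step (the names and the tiny page size are mine; a real version would pickle each page to its own slot on disk and keep an LRU set of ~5 loaded pages):

```python
import pickle

MAX_PAGE_SIZE = 4  # tiny for illustration; real pages would hold many nodes


def insert_with_split(pages, page_idx, pos, value):
    """Insert value at position pos of pages[page_idx]; if the page
    then exceeds MAX_PAGE_SIZE, split it into two smaller pages."""
    page = pages[page_idx]
    page.insert(pos, value)
    if len(page) > MAX_PAGE_SIZE:
        mid = len(page) // 2
        pages[page_idx:page_idx + 1] = [page[:mid], page[mid:]]


def save_page(page, path):
    """Persist one page; the nodes only need to be pickleable."""
    with open(path, "wb") as f:
        pickle.dump(page, f)


def load_page(path):
    with open(path, "rb") as f:
        return pickle.load(f)
```

Splitting at the midpoint keeps both halves about half-full, so repeated insertions at the same spot don't immediately trigger another split.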
This seems like a good solution, but I don't have the time to write it, especially if I want to make it generic and implement all the Python list features.
Using SQL tables isn't good either, since I'd need to implement a complex index-key mechanism.
I tried ZODB and zc.blist, which implement a BTree-based list that can be stored in a ZODB database file, but I don't know how to configure them so the operations above run in reasonable time.
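For reference, the setup I have in mind looks roughly like this (a sketch, not a tuned configuration; `cache_size` is ZODB's per-connection object-count cache, and the value below is an untuned guess):

```python
import transaction
from ZODB import FileStorage, DB
from zc.blist import BList

# ZODB's per-connection cache is what keeps hot BTree buckets in RAM.
# cache_size counts objects, not bytes; 10000 here is an untuned guess.
storage = FileStorage.FileStorage('nodes.fs')
db = DB(storage, cache_size=10000)
conn = db.open()
root = conn.root()

if 'nodes' not in root:
    root['nodes'] = BList()

root['nodes'].append({'payload': b'x' * 150})  # a small pickleable node
transaction.commit()
```
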
I don't need any of the multi-threading/transaction features. No one else will touch the database file except my single-threaded program.
Can anyone explain to me how to configure ZODB/zc.blist so the operations above run fast, or point me to a different large-list implementation?