Maintaining a large list in Python
Posted by Oren on Stack Overflow
Published on 2010-03-24T17:59:34Z
I need to maintain a large list of pickleable Python objects. The list keeps 4 pointers into itself and must support the following operations quickly:
- Read/write each of the 4 pointed nodes
- Insert new nodes before the pointed nodes
- Increase a pointer by 1
- Pop nodes from the beginning of the list (without popping past the pointers)
- Add nodes at the end of the list
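For concreteness, here is a minimal in-memory sketch of that interface. The class and method names are mine, purely illustrative; it keeps everything in RAM and so ignores the disk requirement entirely, but it pins down the intended semantics of the four pointers:

```python
# Hypothetical in-memory model of the required operations.
class PointeredList:
    def __init__(self, items=None):
        self.items = list(items or [])
        self.pointers = [0, 0, 0, 0]   # four cursors into the list

    def read(self, p):
        """Read the node under pointer p."""
        return self.items[self.pointers[p]]

    def write(self, p, value):
        """Overwrite the node under pointer p."""
        self.items[self.pointers[p]] = value

    def insert_before(self, p, value):
        """Insert a new node just before the node pointer p refers to."""
        pos = self.pointers[p]
        self.items.insert(pos, value)
        # every pointer at or past the insertion point shifts right with its node
        self.pointers = [q + 1 if q >= pos else q for q in self.pointers]

    def advance(self, p):
        """Increase pointer p by 1."""
        self.pointers[p] += 1

    def append(self, value):
        """Add a node at the end of the list."""
        self.items.append(value)

    def pop_front(self):
        """Pop a node from the beginning, refusing if a pointer still needs it."""
        if min(self.pointers) == 0:
            raise IndexError("a pointer still refers to the first node")
        node = self.items.pop(0)
        self.pointers = [q - 1 for q in self.pointers]
        return node
```

Note that `insert_before` also shifts the pointer it was called with, so the pointer keeps referring to the same node after the insertion.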
The list can be very large (300-400 MB), so it shouldn't be held entirely in RAM. The nodes are small (100-200 bytes) but can contain various types of data.
The most efficient approach, in my opinion, is to implement a paging mechanism: on the hard drive there would be a database holding a linked list of pages, of which at most 5 are loaded in memory at any moment. On insertion, if a page exceeds some maximum page size, it is split into two smaller pages; on pointer increment, if a pointer moves beyond its loaded page, the following page is loaded instead.
This is a good solution, but I don't have the time to write it, especially if I want to make it generic and implement all the Python list features.
Using SQL tables isn't good either, since I'd need to implement a complex index-key mechanism.
I tried ZODB and zc.blist, which implement a BTree-based list that can be stored in a ZODB database file, but I don't know how to configure them so that the operations above run in reasonable time. I don't need any of the multi-threading/transaction features; no one but my single-threaded program will touch the database file.
Can anyone explain how to configure ZODB/zc.blist so the operations above run fast, or point me to a different large-list implementation?
© Stack Overflow or respective owner