How to optimize a postgreSQL server for a "write once, read many"-type infrastructure ?
- by mhu
Greetings,
I am working on a piece of software that logs entries (and related tagging) in a PostgreSQL database for storage and retrieval. We never update any data once it has been inserted; we might remove it when the entry gets too old, but this is done at most once a day. Stored entries can be retrieved by users.
The insertion of new entries can happen rather fast and regularly, thus the database will commonly hold several millions elements.
The tables used are pretty simple : one table for ids, raw content and insertion date; and one table storing tags and their values associated to an id. User search mostly concern tags values, so SELECTs usually consist of JOIN queries on ids on the two tables.
To sum it up :
2 tables
Lots of INSERT
no UPDATE
some DELETE, once a day at most
some user-generated SELECT with JOIN
huge data set
What would an optimal server configuration (software and hardware, I assume for example that RAID10 could help) be for my PostgreSQL server, given these requirements ? By optimal, I mean one that allows SELECT queries taking a reasonably little amount of time.
I can provide more information about the current setup (like tables, indexes ...) if needed.