I'm soon to be working on a project that poses a problem for me.
It's going to require, at regular intervals throughout the day, processing tens of thousands of records, potentially over a million. Processing is going to involve several (potentially complicated) formulas and the generation of several random factors, writing some new data to a separate table, and updating the original records with some results. This needs to occur for all records, ideally, every three hours. Each new user to the site will be adding between 50 and 500 records that need to be processed in such a fashion, so the number will not be steady.
The code hasn't been written, yet, as I'm still in the design process, mostly because of this issue. I know I'm going to need to use cron jobs, but I'm concerned that processing records of this size may cause the site to freeze up, perform slowly, or just piss off my hosting company every three hours.
I'd like to know if anyone has any experience or tips on similar subjects? I've never worked at this magnitude before, and for all I know, this will be trivial to the server and not pose much of an issue. As long as ALL records are processed before the next three hour period occurs, I don't care if they aren't processed simultaneously (though, ideally, all records belonging to a specific user should be processed in the same batch), so I've been wondering if I should process in batches every 5 minutes, 15 minutes, hour, whatever works, and how best to approach this (and make it scalable in a way that is fair to all users)?