Efficient data importing?
- by Kevin
We work with a lot of real estate, and while rearchitecting how the data is imported, I came across an interesting issue.
Firstly, the way our system works (loosely speaking) is we run a Coldfusion process once a day that retrieves data provided from an IDX vendor via FTP. They push the data to us. Whatever they send us is what we get.
Over the years, this has proven to be rather unstable.
I am rearchitecting it with PHP on the RETS standard, which uses SOAP methods of retrieving data, which is already proven to be much better than what we had.
When it comes to 'updating' existing data, my initial thought was to query only for data that was updated. There is a field for 'Modified' that tells you when a listing was last updated, and the code I have will grab any listing updated within the last 6 hours (give myself a window in case something goes wrong).
However, I see a lot of real estate developers suggest creating 'batch' processes that run through all listings regardless of updated status that is constantly running.
Is this the better way to do it? Or am I fine with just grabbing the data I know I need? It doesn't make a lot of sense to me to do more processing than necessary. Thoughts?