Is it conceivable to have millions of lists of data in memory in Python?
- by Codemonkey
I have over the last 30 days been developing a Python application that utilizes a MySQL database of information (specifically about Norwegian addresses) to perform address validation and correction. The database contains approximately 2.1 million rows (43 columns) of data and occupies 640MB of disk space.
I'm thinking about speed optimizations, and I've got to assume that when validating 10,000+ addresses, each validation running up to 20 queries to the database, networking is a speed bottleneck.
I haven't done any measuring or timing yet, and I'm sure there are simpler ways of speed optimizing the application at the moment, but I just want to get the experts' opinions on how realistic it is to load this amount of data into a row-of-rows structure in Python. Also, would it even be any faster? Surely MySQL is optimized for looking up records among vast amounts of data, so how much help would it even be to remove the networking step? Can you imagine any other viable methods of removing the networking step?
The location of the MySQL server will vary, as the application might well be run from a laptop at home or at the office, where the server would be local.