How to best design a date/geographic proximity query on GAE?
Posted
by Dane
on Stack Overflow
See other posts from Stack Overflow
or by Dane
Published on 2010-03-26T18:28:42Z
Indexed on
2010/03/26
18:33 UTC
Read the original article
Hit count: 596
Hi all,
I'm building a directory for finding athletic tournaments on GAE with web2py and a Flex front end. The user selects a location, a radius, and a maximum date from a set of choices. I have a basic version of this query implemented, but it's inefficient and slow. One way I know I can improve it is by condensing the many individual queries I'm using to assemble the objects into bulk queries. I just learned that was possible. But I'm also thinking about a more extensive redesign that utilizes memcache.
The main problem is that I can't query the datastore by location because GAE won't allow multiple numerical comparison statements (<,<=,>=,>) in one query. I'm already using one for date, and I'd need TWO to check both latitude and longitude, so it's a no go. Currently, my algorithm looks like this:
1.) Query by date and select
2.) Use destination function from geopy's distance module to find the max and min latitude and longitudes for supplied distance
3.) Loop through results and remove all with lat/lng outside max/min
4.) Loop through again and use distance function to check exact distance, because step 2 will include some areas outside the radius. Remove results outside supplied distance (is this 2/3/4 combination inefficent?)
5.) Assemble many-to-many lists and attach to objects (this is where I need to switch to bulk operations)
6.) Return to client
Here's my plan for using memcache.. let me know if I'm way out in left field on this as I have no prior experience with memcache or server caching in general.
-Keep a list in the cache filled with "geo objects" that represent all my data. These have five properties: latitude, longitude, event_id, event_type (in anticipation of expanding beyond tournaments), and start_date. This list will be sorted by date.
-Also keep a dict of pointers in the cache which represent the start and end indices in the cache for all the date ranges my app uses (next week, 2 weeks, month, 3 months, 6 months, year, 2 years).
-Have a scheduled task that updates the pointers daily at 12am.
-Add new inserts to the cache as well as the datastore; update pointers.
Using this design, the algorithm would now look like:
1.) Use pointers to slice off appropriate chunk of list based on supplied date.
2-4.) Same as above algorithm, except with geo objects
5.) Use bulk operation to select full tournaments using remaining geo objects' event_ids
6.) Assemble many-to-manys
7.) Return to client
Thoughts on this approach? Many thanks for reading and any advice you can give.
-Dane
© Stack Overflow or respective owner