Caching sitemaps in Django

Posted by michuk on Stack Overflow
Published on 2010-01-17T02:46:37Z Indexed on 2010/03/26 23:53 UTC

I implemented a simple sitemap class using Django's default sitemap app. Because it was taking a long time to execute, I added manual caching:

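from django.contrib.sitemaps import Sitemap

# get_cache, set_cache, CACHE_SITEMAP_SHORT_REVIEWS and ShortReview come from
# the project's own modules (imports not shown).


class ShortReviewsSitemap(Sitemap):
    changefreq = "hourly"
    priority = 0.7

    def items(self):
        # try to retrieve from cache
        result = get_cache(CACHE_SITEMAP_SHORT_REVIEWS, "sitemap_short_reviews")
        if result is not None:
            return result

        result = ShortReview.objects.all().order_by("-created_at")

        # store in cache (the whole result set goes in as a single value)
        set_cache(CACHE_SITEMAP_SHORT_REVIEWS, "sitemap_short_reviews", result)

        return result

    def lastmod(self, obj):
        return obj.updated_at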

The problem is that memcached accepts objects of at most 1 MB by default. This one was bigger than 1 MB, so storing it in the cache failed:

>7 SERVER_ERROR object too large for cache
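
To see in advance whether a value would fit, its serialized size can be compared against the limit. The following is only a rough sketch: it assumes the cache client pickles values before sending them and that the server uses the default 1 MB item limit. `MEMCACHED_ITEM_LIMIT`, `pickled_size` and `fits_in_memcached` are hypothetical names, not part of Django or memcached, and plain URL strings stand in for the real `ShortReview` objects.

```python
import pickle

# Assumed default memcached item limit (1 MB); a server may be configured differently.
MEMCACHED_ITEM_LIMIT = 1024 * 1024


def pickled_size(value):
    """Return the number of bytes the value occupies once pickled."""
    return len(pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL))


def fits_in_memcached(value, limit=MEMCACHED_ITEM_LIMIT):
    """Rough check: True if the pickled value is smaller than the item limit."""
    return pickled_size(value) < limit


# Fake data standing in for sitemap entries.
small = ["/reviews/%d/" % i for i in range(100)]
large = ["/reviews/%d/" % i for i in range(200000)]

print(fits_in_memcached(small))
print(fits_in_memcached(large))
```

Full model instances pickle to far more bytes than short strings do, so a real queryset would cross the limit with far fewer rows than this toy list.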

A further complication is that Django decides automatically when to split a sitemap into smaller files. According to the docs (http://docs.djangoproject.com/en/dev/ref/contrib/sitemaps/):

You should create an index file if one of your sitemaps has more than 50,000 URLs. In this case, Django will automatically paginate the sitemap, and the index will reflect that.
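
The arithmetic behind that pagination can be sketched as follows. This is an illustration of the page-count calculation only, not Django's own code; `DEFAULT_LIMIT`, `page_count` and `page_bounds` are hypothetical names, and treating an empty sitemap as a single page is an assumption.

```python
# Per-sitemap cap taken from the documentation quote above.
DEFAULT_LIMIT = 50000


def page_count(total_urls, limit=DEFAULT_LIMIT):
    """Number of sitemap pages needed for total_urls at the given page size."""
    # Integer ceiling division; an empty sitemap is assumed to yield one page.
    return max(1, (total_urls + limit - 1) // limit)


def page_bounds(page, limit=DEFAULT_LIMIT):
    """Slice indexes (0-based, end exclusive) for a 1-based page number."""
    start = (page - 1) * limit
    return start, start + limit


print(page_count(120000))         # pages at the default limit
print(page_count(120000, 10000))  # pages with a smaller limit
print(page_bounds(2, 10000))      # slice covered by the second page
```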

What do you think would be the best way to enable caching sitemaps?

- Hacking into the Django sitemaps framework to restrict a single sitemap to, say, 10,000 records seems like the best idea. Why was 50,000 chosen in the first place? Google's advice? A random number?
- Or maybe there is a way to let memcached store bigger objects?
- Or perhaps, once saved, the sitemaps should be made available as static files? This would mean that instead of caching with memcached I'd have to manually store the results in the filesystem and retrieve them from there the next time the sitemap is requested (perhaps cleaning the directory daily in a cron job).
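
The first idea, smaller pages, could be combined with caching by storing each page under its own key and keeping only lightweight `(url, lastmod)` tuples rather than full model instances. The sketch below is a minimal illustration under stated assumptions, not a definitive implementation: `DictCache` is an in-memory stand-in for memcached, and `cached_page` and `load_rows` are hypothetical names. As far as I know, Django's `Sitemap` class also has a `limit` attribute (50,000 by default) that a subclass can lower, which may make patching the framework unnecessary.

```python
class DictCache:
    """In-memory stand-in for a real cache client (hypothetical, for illustration)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def cached_page(cache, page, page_size, load_rows):
    """Return one sitemap page, loading and caching it on a miss.

    load_rows(start, end) is an assumed callable returning (url, lastmod)
    tuples for that slice, e.g. backed by a database query.
    """
    key = "sitemap_short_reviews:%d:%d" % (page_size, page)
    rows = cache.get(key)
    if rows is None:
        start = (page - 1) * page_size
        rows = list(load_rows(start, start + page_size))
        cache.set(key, rows)
    return rows


# Usage with fake data standing in for ShortReview rows.
all_rows = [("/reviews/%d/" % i, "2010-01-17") for i in range(25000)]
calls = []


def load_rows(start, end):
    calls.append((start, end))  # record each trip to the "database"
    return all_rows[start:end]


cache = DictCache()
first = cached_page(cache, 1, 10000, load_rows)
again = cached_page(cache, 1, 10000, load_rows)  # served from the cache
last = cached_page(cache, 3, 10000, load_rows)   # partial final page
```

Whether a 10,000-entry page stays under the item limit depends on URL length and on what is stored per entry, so the page size would need to be checked against real data.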

All of these seem very low-level, and I'm wondering whether an obvious solution exists...

© Stack Overflow or respective owner
