Optimising RSS parsing on App Engine to avoid high CPU warnings

Posted by Danny Tuppeny on Stack Overflow
Published on 2010-04-01T20:48:16Z Indexed on 2010/04/01 20:53 UTC

I'm pulling some RSS feeds into a datastore in App Engine to serve up to an iPhone app. I use cron to schedule updating the RSS every x minutes. Each task only parses one RSS feed (which has 15-20 items). I frequently get warnings about high CPU usage in the App Engine dashboard, so I'm looking for ways to optimise my code.

Currently, I use minidom (since it's already there on App Engine), but I suspect it's not very efficient!

Here's the code:

    from datetime import datetime
    from xml.dom import minidom

    from google.appengine.api import urlfetch
    from google.appengine.ext import db

    # Body of the handler method that cron triggers; url is the feed address.
    dom = minidom.parseString(urlfetch.fetch(url).content)
    if dom:
        items = []
        for node in dom.getElementsByTagName('item'):
            item = RssItem(
                key_name=self.getText(node.getElementsByTagName('guid')[0].childNodes),
                title=self.getText(node.getElementsByTagName('title')[0].childNodes),
                description=self.getText(node.getElementsByTagName('description')[0].childNodes),
                modified=datetime.now(),
                link=self.getText(node.getElementsByTagName('link')[0].childNodes),
                categories=[self.getText(category.childNodes)
                            for category in node.getElementsByTagName('category')],
            )
            items.append(item)
        # Store the whole feed with a single batch put.
        db.put(items)

    # Helper method on the same handler class.
    def getText(self, nodelist):
        # Concatenate the data of the direct text-node children.
        rc = ''
        for node in nodelist:
            if node.nodeType == node.TEXT_NODE:
                rc = rc + node.data
        return rc

There isn't much going on, but the scripts often take 2-6 seconds of CPU time, which seems a bit excessive for looping through 20 or so items and reading a few values from each.
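Before swapping parsers, it may be worth checking how much of that CPU time the parse step itself accounts for, as opposed to the fetch or the datastore write. The snippet below is only a rough sketch of one way to measure it: `cpu_seconds` and `SAMPLE_FEED` are hypothetical names made up for illustration, the sample feed stands in for a real `urlfetch` response, and `time.process_time` assumes Python 3.3 or later (on the Python 2 runtime something like `time.clock()` would have to take its place).

```python
import time
from xml.dom import minidom

# Hypothetical stand-in for the bytes returned by urlfetch.fetch(url).content.
SAMPLE_FEED = b"""<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example</title>
<item><guid>id-1</guid><title>First</title></item>
<item><guid>id-2</guid><title>Second</title></item>
</channel></rss>"""


def cpu_seconds(func, *args):
    """Run func(*args) once and return (result, CPU seconds consumed)."""
    start = time.process_time()
    result = func(*args)
    return result, time.process_time() - start


# Time only the minidom parse, separately from fetching and storing.
dom, parse_cost = cpu_seconds(minidom.parseString, SAMPLE_FEED)
item_count = len(dom.getElementsByTagName('item'))
print('parsed %d items in %.4f CPU seconds' % (item_count, parse_cost))
```

Wrapping the fetch and the `db.put` call in the same helper would give a per-step breakdown to compare against the dashboard figures.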

What can I do to make this faster? Is there anything particularly bad in the above code, or should I change to another way of parsing? Are there any libraries (that work on App Engine) that would be better, or would I be better off parsing the RSS myself?
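As one possible alternative to compare against, here is a minimal sketch of the same extraction using the standard library's `xml.etree.ElementTree` instead of minidom. It is not a measured fix: whether it actually lowers CPU usage on App Engine is an assumption that would need testing. `parse_items` and `SAMPLE_FEED` are hypothetical names, plain dicts stand in for `RssItem` so the snippet runs outside App Engine, and `Element.iter` assumes Python 2.7 or later (older versions used `getiterator`).

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the bytes returned by urlfetch.fetch(url).content.
SAMPLE_FEED = b"""<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example</title>
<item><guid>id-1</guid><title>First</title><description>Hello</description>
<link>http://example.com/1</link><category>a</category><category>b</category></item>
<item><guid>id-2</guid><title>Second</title><description>World</description>
<link>http://example.com/2</link></item>
</channel></rss>"""


def parse_items(content):
    """Return one plain dict per <item>, mirroring the RssItem fields."""
    root = ET.fromstring(content)
    items = []
    for node in root.iter('item'):
        items.append({
            # findtext returns the default when the child element is missing.
            'key_name': node.findtext('guid', default=''),
            'title': node.findtext('title', default=''),
            'description': node.findtext('description', default=''),
            'link': node.findtext('link', default=''),
            'categories': [c.text or '' for c in node.findall('category')],
        })
    return items


parsed = parse_items(SAMPLE_FEED)
```

Each dict could then be passed to the model constructor, for example `RssItem(modified=datetime.now(), **fields)`, and the resulting list saved with the same single `db.put(items)` call.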

© Stack Overflow or respective owner