Loading datasets from the datastore and merging them into a single dictionary: resource problem

Posted by fredrik on Stack Overflow
Published on 2010-05-10T22:00:23Z Indexed on 2010/05/10 22:04 UTC

Hi,

I have a product database that contains products, parts, and labels for each part based on language codes.

The problem I'm having, and haven't been able to get around, is the huge amount of resources used to fetch the different datasets and merge them into a dict that suits my needs.

Each product in the database is made up of a number of parts, each of a certain type (e.g. color, size), and each part has a label for each language. I created 4 different models for this: Products, ProductParts, ProductPartTypes and ProductPartLabels.

I've narrowed it down to about 10 lines of code that seem to cause the problem. Currently I have 3 products, 3 types, 3 parts for each type, and 2 languages, and the request takes a whopping 5500 ms to generate.

for product in productData:
    productDict = {}
    typeDict = {}
    productDict['productName'] = product.name

    cache_key = 'productparts_%s' % (slugify(product.key()))
    partData = memcache.get(cache_key)

    if not partData:
        for type in typeData:
            typeDict[type.typeId] = { 'default' : '', 'optional' : [] }
        ## Start of problem lines ##
        for defaultPart in product.defaultPartsData:
            for label in labelsForLangCode:
                if label.key() in defaultPart.partLabelList:
                    typeDict[defaultPart.type.typeId]['default'] = label.partLangLabel

        for optionalPart in product.optionalPartsData:
            for label in labelsForLangCode:
                if label.key() in optionalPart.partLabelList:
                    typeDict[optionalPart.type.typeId]['optional'].append(label.partLangLabel)
        ## end problem lines ##
        memcache.add(cache_key, typeDict, 500)
        partData = memcache.get(cache_key)

    productDict['parts'] = partData
    productList.append(productDict)

I guess the problem is that there are too many for loops, and they iterate over the same data over and over again. labelsForLangCode gets all labels from ProductPartLabels that match the current langCode.
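
One way to cut the repeated iteration is to index the labels by key once and then look up each part's label keys in that index. The following is only a minimal sketch under stated assumptions: `Label`, `Part` and `build_type_dict` are hypothetical names, not part of the application, and `key` and `typeId` are plain attributes here, whereas the real entities expose them as `label.key()` and `part.type.typeId`.

```python
class Label(object):
    """Hypothetical stand-in for a ProductPartLabels entity."""
    def __init__(self, key, partLangLabel):
        self.key = key
        self.partLangLabel = partLangLabel


class Part(object):
    """Hypothetical stand-in for a ProductParts entity."""
    def __init__(self, typeId, partLabelList):
        self.typeId = typeId
        self.partLabelList = partLabelList


def build_type_dict(type_ids, default_parts, optional_parts, labels_for_lang):
    """Build {typeId: {'default': str, 'optional': [str, ...]}}."""
    # Index the labels once, so each lookup below is a dict access
    # instead of a scan over every label.
    label_by_key = dict((label.key, label.partLangLabel)
                        for label in labels_for_lang)

    type_dict = dict((type_id, {'default': '', 'optional': []})
                     for type_id in type_ids)

    for part in default_parts:
        for label_key in part.partLabelList:
            if label_key in label_by_key:
                type_dict[part.typeId]['default'] = label_by_key[label_key]

    for part in optional_parts:
        for label_key in part.partLabelList:
            if label_key in label_by_key:
                type_dict[part.typeId]['optional'].append(
                    label_by_key[label_key])

    return type_dict
```

This walks each part's own label keys rather than every label for the language, so the optional labels come out in part order. That gives the same result as the original loops as long as each part has at most one label per language, which is an assumption about the data.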

All parts for a product are stored in a db.ListProperty(db.Key). The same goes for all labels for a part.

The reason I need the somewhat complex dict is that I want to display all data for a product with its default parts and show a selector for the optional ones.

defaultPartsData and optionalPartsData are properties in the product model that look like this:

@property
def defaultPartsData(self):
    return ProductParts.gql('WHERE __key__ IN :key', key = self.defaultParts)

@property
def optionalPartsData(self):
    return ProductParts.gql('WHERE __key__ IN :key', key = self.optionalParts)
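
Since both properties only resolve keys that are already stored on the product, a direct lookup by key could replace the queries. A minimal sketch, assuming the keys of all products are collected first and fetched in one batch: `fetch_parts_for_products` and its `batch_get` parameter are hypothetical names. On App Engine, `db.get` accepts a list of keys and returns the entities in the same order, with `None` for keys that do not exist, so it could be passed in as `batch_get`.

```python
def fetch_parts_for_products(products, batch_get):
    """Load the parts of all products with a single batched lookup.

    `batch_get` takes a list of keys and returns the matching entities in
    the same order (None for a missing key). Assumption: on App Engine,
    `db.get` is passed in here.
    """
    # Collect every key once, keeping order and dropping duplicates.
    all_keys = []
    seen = set()
    for product in products:
        for key in list(product.defaultParts) + list(product.optionalParts):
            if key not in seen:
                seen.add(key)
                all_keys.append(key)

    entities = batch_get(all_keys)

    # Map each key to its entity so callers can resolve parts per product.
    return dict(zip(all_keys, entities))
```

With the returned mapping, the default parts of one product would be `[part_by_key[k] for k in product.defaultParts]`, without any further datastore calls.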

When the completed dict is in memcache it works smoothly, but isn't memcache reset if the application goes into hibernation? I would also like to show the page to a first-time user (memcache empty) without the enormous delay.
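
Memcache entries can be evicted at any time, so the read path has to treat every lookup as a possible miss. A minimal sketch of that read path, assuming a cache object with `get(key)` and `add(key, value, time)` as in the `memcache` module used above: `get_or_build` and `build` are hypothetical names. It checks for `None` rather than falsiness and uses the freshly built value directly instead of reading it back from the cache.

```python
def get_or_build(cache, cache_key, build, ttl_seconds=500):
    """Return the cached value, building and storing it on a miss."""
    value = cache.get(cache_key)
    if value is None:  # a miss; an empty dict is still a valid cached value
        value = build()
        # Store the value for later requests, but return the fresh copy
        # rather than making a second round trip to the cache.
        cache.add(cache_key, value, ttl_seconds)
    return value
```

In the loop above this could be called as `get_or_build(memcache, cache_key, lambda: typeDict)` once `typeDict` is filled, though where exactly the build step lives is a design choice.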

Also, as I said above, this is only a small number of parts and products. What will the result be when it's 30 products with 100 parts?

Is one solution to create a scheduled task that caches it in memcache every hour? Is this efficient?

I know this is a lot to take in, but I'm stuck. I've been at this for about 12 hours straight and can't figure out a solution.

..fredrik
