Optimizing tasks to reduce CPU in a trading application
- by Joel
Hello,
I have designed a trading application that handles customers stocks investment portfolio.
I am using two datastore kinds:
Stocks - Contains unique stock name and its daily percent change.
UserTransactions - Contains information regarding a specific purchase of a stock made by a user : the value of the purchase along with a reference to Stock for the current purchase.
db.Model python modules:
class Stocks (db.Model):
stockname = db.StringProperty(multiline=True)
dailyPercentChange=db.FloatProperty(default=1.0)
class UserTransactions (db.Model):
buyer = db.UserProperty()
value=db.FloatProperty()
stockref = db.ReferenceProperty(Stocks)
Once an hour I need to update the database: update the daily percent change in Stocks and then update the value of all entities in UserTransactions that refer to that stock.
The following python module iterates over all the stocks, update the dailyPercentChange property, and invoke a task to go over all UserTransactions entities which refer to the stock and update their value:
Stocks.py
# Iterate over all stocks in datastore
for stock in Stocks.all():
# update daily percent change in datastore
db.run_in_transaction(updateStockTxn, stock.key())
# create a task to update all user transactions entities referring to this stock
taskqueue.add(url='/task', params={'stock_key': str(stock.key(), 'value' : self.request.get ('some_val_for_stock') })
def updateStockTxn(stock_key):
#fetch the stock again - necessary to avoid concurrency updates
stock = db.get(stock_key)
stock.dailyPercentChange= data.get('some_val_for_stock') # I get this value from outside
... some more calculations here ...
stock.put()
Task.py (/task)
# Amount of transaction per task
amountPerCall=10
stock=db.get(self.request.get("stock_key"))
# Get all user transactions which point to current stock
user_transaction_query=stock.usertransactions_set
cursor=self.request.get("cursor")
if cursor:
user_transaction_query.with_cursor(cursor)
# Spawn another task if more than 10 transactions are in datastore
transactions = user_transaction_query.fetch(amountPerCall)
if len(transactions)==amountPerCall:
taskqueue.add(url='/task', params={'stock_key': str(stock.key(), 'value' : self.request.get ('some_val_for_stock'), 'cursor': user_transaction_query.cursor() })
# Iterate over all transaction pointing to stock and update their value
for transaction in transactions:
db.run_in_transaction(updateUserTransactionTxn, transaction.key())
def updateUserTransactionTxn(transaction_key):
#fetch the transaction again - necessary to avoid concurrency updates
transaction = db.get(transaction_key)
transaction.value= transaction.value* self.request.get ('some_val_for_stock')
db.put(transaction)
The problem:
Currently the system works great, but the problem is that it is not scaling well… I have around 100 Stocks with 300 User Transactions, and I run the update every hour. In the dashboard, I see that the task.py takes around 65% of the CPU (Stock.py takes around 20%-30%) and I am using almost all of the 6.5 free CPU hours given to me by app engine. I have no problem to enable billing and pay for additional CPU, but the problem is the scaling of the system… Using 6.5 CPU hours for 100 stocks is very poor.
I was wondering, given the requirements of the system as mentioned above, if there is a better and more efficient implementation (or just a small change that can help with the current implemntation) than the one presented here.
Thanks!!
Joel