Gradient boosting predictions in low-latency production environments?

Posted by lockedoff on Stack Overflow
Published on 2012-07-02T14:33:16Z Indexed on 2012/10/30 11:02 UTC

Can anyone recommend a strategy for making predictions with a gradient boosting model in under 10-15 ms (the faster the better)?

I have been using R's gbm package, but the first prediction takes ~50 ms (subsequent vectorized predictions average 1 ms, so there appears to be some startup overhead, perhaps in the call to the C++ library). As a guideline, there will be ~10-50 inputs and ~50-500 trees. The task is classification, and I need access to the predicted probabilities.
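For a sense of how little work a single prediction involves once the model is in memory, here is a minimal Python sketch of how a binary gradient boosting model turns its trees into a probability. It assumes the model is fit on the log-odds scale (as with a Bernoulli/logistic loss): each tree contributes one leaf value, those values are added to an initial score, and the logistic function maps the sum to a probability. The `Node` class, the "go left if `x < threshold`" convention, and the example trees are hypothetical, and the sketch ignores the separate missing-value branch that gbm trees carry.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    # Internal nodes use feature/threshold/left/right; leaves only use value.
    feature: int = -1
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: float = 0.0  # leaf contribution on the log-odds scale


def tree_score(node: Node, x: Sequence[float]) -> float:
    # Walk from the root to a leaf: one comparison per level of the tree.
    while node.left is not None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.value


def predict_proba(trees, init_score: float, x: Sequence[float]) -> float:
    # Raw score = initial log-odds + one leaf value from every tree.
    raw = init_score + sum(tree_score(t, x) for t in trees)
    # The logistic link maps log-odds to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-raw))


# Hypothetical two-stump model, for illustration only.
example_trees = [
    Node(feature=0, threshold=0.5, left=Node(value=-0.5), right=Node(value=0.5)),
    Node(feature=1, threshold=2.0, left=Node(value=0.25), right=Node(value=-0.25)),
]

print(predict_proba(example_trees, -0.75, [0.9, 1.0]))  # prints 0.5
```

With ~500 trees of depth d, one prediction is on the order of 500 × d comparisons plus a single `exp`, which suggests the ~50 ms first call is dominated by something other than the tree arithmetic itself.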

I know there are a lot of libraries out there, but I've had little luck finding even rough prediction times for them. Training will happen offline, so only predictions need to be fast. Also, predictions may come from code or a library that is completely separate from whatever does the training (as long as there is a common format for representing the trees).
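Since training and prediction can live in separate code, the trees only need a shared representation. The sketch below assumes a hypothetical JSON layout (not gbm's or any other library's actual format) in which each tree is stored as parallel arrays indexed by node id, with `feature == -1` marking a leaf. `export_model`, `FlatForest`, and the example tree are illustrative names only.

```python
import json
import math
from typing import Dict, List, Sequence

# Hypothetical flat layout: node i of a tree is described by
# feature[i], threshold[i], left[i], right[i], value[i];
# feature[i] == -1 marks a leaf whose output is value[i].


def export_model(trees: List[Dict[str, list]], init_score: float) -> str:
    # Training side: serialize the ensemble to a language-neutral document.
    return json.dumps({"init_score": init_score, "trees": trees})


class FlatForest:
    # Prediction side: needs only the JSON, not the training code.
    def __init__(self, payload: str):
        doc = json.loads(payload)
        self.init_score = float(doc["init_score"])
        self.trees = doc["trees"]

    def raw_score(self, x: Sequence[float]) -> float:
        total = self.init_score
        for t in self.trees:
            feature, threshold = t["feature"], t["threshold"]
            left, right, value = t["left"], t["right"], t["value"]
            i = 0  # start at the root node
            while feature[i] != -1:
                i = left[i] if x[feature[i]] < threshold[i] else right[i]
            total += value[i]
        return total

    def predict_proba(self, x: Sequence[float]) -> float:
        # Logistic link, assuming the raw score is on the log-odds scale.
        return 1.0 / (1.0 + math.exp(-self.raw_score(x)))


# Hypothetical single-stump tree: split on feature 0 at 0.5.
example_tree = {
    "feature": [0, -1, -1],
    "threshold": [0.5, 0.0, 0.0],
    "left": [1, -1, -1],
    "right": [2, -1, -1],
    "value": [0.0, -0.5, 0.5],
}

model = FlatForest(export_model([example_tree], 0.0))
print(model.raw_score([0.9]))  # prints 0.5
```

Parallel flat arrays like these map directly onto plain C or C++ arrays, which is one way to keep the prediction path free of interpreter or library-call overhead; whichever side exports the trees, both sides must agree on the split convention (here `x < threshold` goes left).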

© Stack Overflow or respective owner
