scikit-learn ExtraTreesClassifier hanging

Posted by denson on Stack Overflow
Published on 2014-05-27T09:22:32Z Indexed on 2014/05/27 9:24 UTC

I'm running scikit-learn on some rather large training datasets: ~1,600,000,000 rows with ~500 features. The platform is Ubuntu Server 14.04, and the hardware has 100 GB of RAM and 20 CPU cores.

The test datasets have about half as many rows.

I set n_jobs = 10 and am using forest_size = 3 * number_of_features, so about 1700 trees.
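
A minimal sketch of the configuration described above, scaled down so it runs quickly. It assumes the classifier is scikit-learn's ExtraTreesClassifier and that forest_size is passed as n_estimators. The synthetic data from make_classification, the small sizes, and n_jobs=2 are placeholders for illustration, not the actual datasets or code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Placeholder data: far smaller than the real training set in the post.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

number_of_features = X.shape[1]
forest_size = 3 * number_of_features  # same rule as in the post; 60 trees here

clf = ExtraTreesClassifier(
    n_estimators=forest_size,  # assumption: forest_size maps to n_estimators
    n_jobs=2,                  # the post uses n_jobs = 10
    random_state=0,
)
clf.fit(X, y)

# One fitted tree per requested estimator.
print(len(clf.estimators_))
```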

If I reduce the number of features to about 350, training works fine, but with the full feature set of 500+ it never completes the training phase. The process is still running and holding about 20 GB of RAM, but it is using 0% of CPU. I have also successfully trained on datasets with ~400,000 rows but twice as many features, and that completes after only about 1 hour.

I am being careful to delete any arrays/objects that are not in use.

Does anyone have any ideas for what I might try?

© Stack Overflow or respective owner
