There are various activation functions (sigmoid, tanh, etc.), and there are also a few weight initializers (Nguyen-Widrow, random, normalized, constant, zero, etc.). Do these choices have much effect on the outcome of a neural network specialising in face detection? Right now I'm using the tanh activation function and just randomising all the weights between -0.5 and 0.5. I have no idea whether this is the best approach, though, and with each training run taking 4 hours, I'd rather ask here than experiment!
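
To make the comparison concrete, here is a rough sketch of two of the initializers in question for a single hidden layer. It uses only the Python standard library. The function names and layer sizes are hypothetical, the random initializer assumes a uniform distribution over the range, and the Nguyen-Widrow version follows the commonly described recipe (rescale each hidden unit's weight vector to norm 0.7 * h^(1/n), for h hidden units and n inputs) rather than any particular library's implementation.

```python
import math
import random


def uniform_init(n_inputs, n_hidden, low=-0.5, high=0.5, rng=random):
    """Draw every weight independently from U(low, high).

    Returns a list of n_hidden rows, one weight vector per hidden unit.
    """
    return [[rng.uniform(low, high) for _ in range(n_inputs)]
            for _ in range(n_hidden)]


def nguyen_widrow_init(n_inputs, n_hidden, rng=random):
    """Nguyen-Widrow style initialisation (sketch, see assumptions above).

    Start from small uniform weights, then rescale each hidden unit's
    weight vector so its Euclidean norm equals beta.
    """
    beta = 0.7 * n_hidden ** (1.0 / n_inputs)
    weights = uniform_init(n_inputs, n_hidden, rng=rng)
    for row in weights:
        norm = math.sqrt(sum(w * w for w in row))
        if norm > 0.0:  # guard against an all-zero row
            for i in range(len(row)):
                row[i] = beta * row[i] / norm
    # Biases are spread uniformly over [-beta, beta].
    biases = [rng.uniform(-beta, beta) for _ in range(n_hidden)]
    return weights, biases


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    plain = uniform_init(4, 3, rng=rng)
    nw_weights, nw_biases = nguyen_widrow_init(4, 3, rng=rng)
    print(plain)
    print(nw_weights, nw_biases)
```

With tanh units, the idea behind the rescaling is that the hidden units start out with their active regions spread across the input range, instead of depending purely on where the random draw happens to land.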