Understanding the difficulty of training deep feedforward neural networks by Glorot and Bengio, 2010
Exact solutions to the nonlinear dynamics of learning in deep linear neural networks by Saxe et al., 2013
Random walk initialization for training very deep feedforward networks by Sussillo and Abbott, 2014
Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification by He et al., 2015
Data-dependent Initializations of Convolutional Neural Networks by Krähenbühl et al., 2015
All you need is a good init by Mishkin and Matas, 2015
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift by Ioffe and Szegedy, 2015

What is "hidden" in hidden Markov models?
What is "Markov" in HMMs?
Logistic regression: L2 regularization on identical weights
PCA: what do the lambda values indicate?