Feature Normalization and Initialization

In this guide, we walk through feature normalization and weight initialization schemes in PyTorch. In short, we normalize our inputs before gradient descent because features with large magnitudes would otherwise dominate the weight updates, making it harder to converge to a good local or global minimum. Separately, we use custom weight initialization schemes to speed up convergence during optimization and to keep activations in a range where certain activation functions behave well. Below, we apply batch normalization and Xavier initialization in PyTorch.
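
As a starting point, here is a minimal sketch of how the two techniques fit together in PyTorch. The network shape (20 inputs, 64 hidden units, 2 outputs) and the tanh activation are illustrative choices, not requirements; `nn.BatchNorm1d` normalizes the hidden activations per batch, and `nn.init.xavier_uniform_` applies Xavier (Glorot) initialization to the weight matrices.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    """Small feed-forward net combining batch norm and Xavier init
    (hypothetical layer sizes, for illustration only)."""

    def __init__(self, in_features=20, hidden=64, out_features=2):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden)
        self.bn1 = nn.BatchNorm1d(hidden)   # normalizes hidden activations per batch
        self.fc2 = nn.Linear(hidden, out_features)

        # Xavier (Glorot) uniform initialization for the weight matrices;
        # biases are commonly zero-initialized.
        nn.init.xavier_uniform_(self.fc1.weight)
        nn.init.xavier_uniform_(self.fc2.weight)
        nn.init.zeros_(self.fc1.bias)
        nn.init.zeros_(self.fc2.bias)

    def forward(self, x):
        # tanh is a natural pairing here: Xavier init was derived with
        # symmetric saturating activations like tanh in mind.
        x = torch.tanh(self.bn1(self.fc1(x)))
        return self.fc2(x)

# Quick smoke test on a random batch of 8 examples.
x = torch.randn(8, 20)
print(Net()(x).shape)  # torch.Size([8, 2])
```

We will unpack each of these pieces in the sections that follow.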