Averaged Stochastic Gradient Descent (ASGD) Weight-Dropped LSTM (AWD-LSTM)

What makes the AWD-LSTM great? | Yashu Seth

The AWD-LSTM has dominated state-of-the-art language modeling. All the top research papers on word-level models incorporate AWD-LSTMs, and it has shown great results on character-level models as well. AWD-LSTM stands for ASGD Weight-Dropped LSTM. It applies DropConnect to its hidden-to-hidden weights and trains with a variant of averaged SGD called non-monotonically triggered ASGD (NT-ASGD), along with several other well-known regularization strategies.
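The two named mechanisms are easier to see in code. Below is a minimal pure-Python sketch, assuming plain nested lists stand in for tensors. The function names (`dropconnect`, `nt_asgd_should_trigger`, `average_iterates`) are hypothetical and do not come from the AWD-LSTM codebase. DropConnect zeroes individual weights rather than activations, and in the AWD-LSTM one mask is sampled per forward pass and reused at every time step. NT-ASGD runs plain SGD until the validation metric stops improving "non-monotonically": the newest value is worse than the best one logged more than `n` checks earlier. From then on, the returned weights are the average of the iterates. The sketch compares validation losses instead of perplexities; because perplexity is `exp(loss)`, the ordering is the same.

```python
import random
from typing import List

Matrix = List[List[float]]


def dropconnect(weights: Matrix, p: float, rng: random.Random) -> Matrix:
    """Hypothetical helper: zero each individual weight with probability p.

    Survivors are rescaled by 1 / (1 - p) so the expected weight is unchanged.
    This inverted-dropout scaling is an assumption of this sketch.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [[w / keep if rng.random() < keep else 0.0 for w in row] for row in weights]


def nt_asgd_should_trigger(val_losses: List[float], n: int) -> bool:
    """Hypothetical helper: the non-monotone trigger, run after each logging interval.

    val_losses holds every validation loss logged so far, newest last.
    Averaging starts when the newest loss is worse than the best loss
    logged more than n checks earlier.
    """
    t = len(val_losses) - 1  # number of checks logged before the newest one
    if t <= n:
        return False
    return val_losses[t] > min(val_losses[: t - n])


def average_iterates(iterates: List[List[float]]) -> List[float]:
    """Hypothetical helper: element-wise mean of flattened weight vectors since the trigger."""
    if not iterates:
        raise ValueError("need at least one iterate")
    count = len(iterates)
    return [sum(column) / count for column in zip(*iterates)]


if __name__ == "__main__":
    rng = random.Random(0)
    hidden_to_hidden = [[0.5, -0.2], [0.1, 0.3]]
    print(dropconnect(hidden_to_hidden, p=0.5, rng=rng))

    losses = [5.0, 4.0, 3.0, 3.1, 3.2, 3.3]
    for step in range(1, len(losses) + 1):
        if nt_asgd_should_trigger(losses[:step], n=2):
            print("switch to averaging after check", step - 1)  # prints check 5
            break

    print(average_iterates([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```

Using a non-monotone trigger, rather than firing on the first worse value, tolerates small fluctuations in the validation curve before committing to averaging. In a real framework the averaging would be kept as a running mean of parameter tensors instead of a stored list of iterates.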