SMART - Multi-Task Deep Neural Networks (MT-DNN)
Revision as of 07:53, 6 July 2020
- Natural Language Processing (NLP)
- Bidirectional Encoder Representations from Transformers (BERT)
- Deep Distributed Q Network Partial Observability
- SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | H. Jiang, P. He, W. Chen, X. Liu, J. Gao, and T. Zhao
- A Hybrid Neural Network Model for Commonsense Reasoning | P. He, X. Liu, W. Chen and J. Gao - Profillic
- Multi-Task Deep Neural Networks for Natural Language Understanding - GitHub
With our recently developed SMART technology, we jointly trained the tasks with Multi-Task Deep Neural Networks (MT-DNN) and hybrid neural network (HNN) models. The models are initialized with the RoBERTa large model. Parameter description: all tasks share the same model structure, while parameters are not shared across tasks. SMART is a new computational framework for robust and efficient fine-tuning of pre-trained language models through principled regularized optimization. Specifically, the framework consists of two important ingredients:
- Smoothness-inducing Adversarial Regularization, which can effectively manage the capacity of the pre-trained model.
- Bregman Proximal Point Optimization, which is a class of trust-region optimization methods and can prevent knowledge forgetting.
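The two ingredients above combine into one training objective: the task loss, plus a smoothness-inducing adversarial term (the worst-case symmetric KL divergence between predictions on an input and a small perturbation of it), plus a Bregman proximal term that keeps each update close to the previous iterate. A minimal NumPy sketch, using a toy linear-softmax classifier in place of a fine-tuned Transformer; all names here (`smart_objective`, `smoothness_term`, the hyperparameters) are illustrative, and in the actual method the perturbation is applied in the embedding space with gradients from autograd:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def symmetric_kl(p, q, eps=1e-12):
    """Symmetrized KL divergence D(p||q) + D(q||p) between two distributions."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def smoothness_term(W, x, epsilon=1e-3, eta=1e-3, steps=1):
    """Smoothness-inducing adversarial regularizer: approximate
    max over ||d||_inf <= epsilon of D_sym(f(x + d), f(x))
    with a few projected gradient ascent steps on the perturbation d."""
    q = softmax(W @ x)  # reference prediction, held fixed
    d = np.random.uniform(-epsilon, epsilon, size=x.shape) * 1e-2
    for _ in range(steps):
        p = softmax(W @ (x + d))
        # Hand-derived gradient of D_sym w.r.t. logits z = W(x + d):
        #   d KL(p||q)/dz = p * (log(p/q) - KL(p||q));  d KL(q||p)/dz = p - q
        kl_pq = np.sum(p * np.log((p + 1e-12) / (q + 1e-12)))
        grad_z = p * (np.log((p + 1e-12) / (q + 1e-12)) - kl_pq) + (p - q)
        grad_d = W.T @ grad_z
        # ascent step on d, projected back onto the infinity-norm ball
        d = np.clip(d + eta * np.sign(grad_d), -epsilon, epsilon)
    return symmetric_kl(softmax(W @ (x + d)), q)

def smart_objective(W, W_prev, x, y, lam=1.0, mu=1.0):
    """Full SMART-style objective: task loss + lam * smoothness regularizer
    + mu * Bregman proximal term, with the proximal divergence measured as
    D_sym between current and previous-iterate model outputs."""
    p = softmax(W @ x)
    task_loss = -np.log(p[y] + 1e-12)            # cross-entropy on label y
    r_smooth = smoothness_term(W, x)
    prox = symmetric_kl(p, softmax(W_prev @ x))  # trust-region-style penalty
    return task_loss + lam * r_smooth + mu * prox
```

Both regularizers are non-negative, so the objective upper-bounds the plain task loss; the proximal term vanishes when the parameters have not moved from the previous iterate, which is what prevents aggressive updates that would overwrite pre-trained knowledge.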