T5



The T5 (Text-To-Text Transfer Transformer) model. The same model is used for a wide variety of tasks by treating every task uniformly as taking some input text and producing some output text, with the task type embedded as a descriptor in the input (see the bold prefixes in the input in the figure above). This approach enables a single model to perform a wide variety of supervised tasks such as translation, classification, Q&A, summarization, and even regression (e.g. outputting a similarity score between two sentences in the range 1–5; since scores are rounded to the nearest 0.2 increment, this is in reality quite similar to a 21-class classification problem). The model is first pretrained unsupervised on a large corpus with a masked objective (as in BERT), then trained supervised with input text representing all of these tasks together with the associated labeled data, which is also text. Specific tokens in the input stream ("translate English to French", "stsb sentence1: ... sentence2: ...", "question:"/"context:", etc.) encode the task type, as shown in the figure above, and the model is trained to output text matching the labeled data. By specifying both input and output as text for supervised learning, the model shares its loss function, decoder, and other components across all of the disparate tasks. T5 — a model that explores the limits of transfer learning | Ajit Rajasekharan, Towards Data Science, Medium

The T5 model treats a wide variety of many-to-many and many-to-one NLP tasks in a unified manner by encoding the different tasks as text directives in the input stream. This enables a single model to be trained in a supervised fashion on a wide variety of NLP tasks such as translation, classification, Q&A, summarization, and even regression (though, as noted above, regression is in practice handled much like a classification over discretized score strings). A minimal sketch of this interface appears below.
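
The sketch below illustrates the text-to-text interface using the Hugging Face transformers library and the publicly released "t5-small" checkpoint; the library, checkpoint, and example inputs are assumptions for illustration, not part of the original write-up. The point is that translation, summarization, and STS-B-style regression are all invoked through the same tokenize/generate/decode calls, differing only in the textual prefix of the input.

  # A minimal sketch, assuming the Hugging Face "transformers" library
  # (plus sentencepiece) and the public "t5-small" checkpoint.
  # The task is selected purely by the text prefix in the input string.
  from transformers import T5Tokenizer, T5ForConditionalGeneration

  tokenizer = T5Tokenizer.from_pretrained("t5-small")
  model = T5ForConditionalGeneration.from_pretrained("t5-small")

  # Different tasks, same model: only the textual prefix changes.
  inputs = [
      "translate English to German: The house is wonderful.",
      "summarize: state authorities dispatched emergency crews tuesday to "
      "survey the damage after an onslaught of severe weather in mississippi.",
      "stsb sentence1: The cat sat on the mat. sentence2: A cat was sitting on a mat.",
  ]

  for text in inputs:
      token_ids = tokenizer(text, return_tensors="pt").input_ids
      output_ids = model.generate(token_ids, max_new_tokens=50)
      # For the "stsb" prefix the decoded output is a number rendered as text,
      # e.g. "3.8", consistent with the regression-as-text framing above.
      print(tokenizer.decode(output_ids[0], skip_special_tokens=True))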