Transformer
- Sequence to Sequence (Seq2Seq)
- Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM)
- Autoencoders / Encoder-Decoders
- Natural Language Inference (NLI) and Recognizing Textual Entailment (RTE)
“Attend” to specific parts of an image in sequence, one after another. By relying on a sequence of glances, they capture visual structure, can be contrasted with other machine vision techniques that process a whole image in a single, forward pass.