Gato
* [http://www.louisbouchard.ai/deepmind-gato/ Deepmind's new model Gato is amazing!]
DeepMind's “generalist” AI model. Inspired by progress in large-scale language modeling, DeepMind applied a similar approach to building a single generalist agent beyond the realm of text outputs. The agent, referred to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy: the same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm, and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens.
{|<!-- T -->
{| class="wikitable" style="width: 550px;"
||
<youtube>zO49vZ31xb0</youtube>
<b>This AI Can Solve 604 Tasks [Paper Analysis of Gato by DeepMind]</b><br>
DeepMind published a revolutionary paper 🔥 They introduced Gato, a generalist AI agent that can carry out more than 600 tasks with a single transformer architecture. The tasks are varied, from playing Atari games to captioning images.

This paper demonstrates that:

📌 Generalist agents can perform reasonably well on many tasks / embodiments / modalities
📌 Generalist agents have the potential to learn new tasks from few data points
📌 By scaling up the parameter count, we can build a general-purpose agent
|}
|<!-- M -->
{| class="wikitable" style="width: 550px;"
||
<youtube>wSQJZHfAg18</youtube>
<b>Is Gato Really the Future of AI?</b><br>
DeepMind has released "A Generalist Agent", a paper that introduces their new multi-modal model Gato. But is Gato truly a generalist agent? It is a transformer-based model with the goal of generalizing to new tasks. It is trained fully autoregressively with supervised learning (no reinforcement learning) on a total of 603 different tasks, including robotics, Atari, DM Lab, Procgen, and more, as well as text and image tasks. This video is a paper review / explanation where I also give my thoughts on the paper.
|}
|}<!-- B -->
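The training recipe quoted above (fully autoregressive, supervised, no reinforcement learning) boils down to next-token cross-entropy with a loss mask: the model sees observation tokens as context but is penalized only on the tokens it is expected to produce, such as text and action tokens. A minimal sketch with hypothetical names and toy probabilities rather than a real model:

```python
import math

def masked_nll(step_probs, targets, loss_mask):
    """Average negative log-likelihood over a mixed token sequence.

    step_probs[t] maps candidate token id -> model probability at step t,
    targets[t] is the ground-truth next token, and loss_mask[t] is 1 for
    positions that should be trained on (action/text tokens) and 0 for
    observation tokens, which serve as context only.
    """
    total, count = 0.0, 0
    for probs, target, masked_in in zip(step_probs, targets, loss_mask):
        if masked_in:
            total += -math.log(probs[target])   # cross-entropy term
            count += 1
    return total / max(count, 1)
```

For example, `masked_nll([{5: 0.5}, {7: 0.25}], [5, 7], [1, 0])` trains only on the first position and yields -log(0.5) ≈ 0.693; the second position is context and contributes nothing.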
{|<!-- T -->
{| class="wikitable" style="width: 550px;"
||
<youtube>6fWEHrXN9zo</youtube>
<b>Integrated AI - Gato by DeepMind (May/2022) 1.2B + Asimo, GPT-3, Tesla Optimus, Boston Dynamics</b><br>
Dr Alan D. Thompson is a world expert in artificial intelligence (AI), specializing in the augmentation of human intelligence and advancing the evolution of ‘integrated AI’. Alan’s applied AI research and visualizations are featured across major international media, including citations in the University of Oxford’s debate on AI Ethics in December 2021. http://lifearchitect.ai/
|}
|<!-- M -->
{| class="wikitable" style="width: 550px;"
||
<youtube>xZKSWNv6Esc</youtube>
<b>Gato: A single Transformer to RuLe them all! ([[Google]]'s Deepmind's new model)</b><br>
Deepmind's new model Gato is amazing! The first generalist RL agent using transformers!
|}
|}<!-- B -->
Revision as of 05:44, 24 May 2022
YouTube search... ...Google search
* Google's Tools and Resources
* A Generalist Agent | DeepMind