Difference between revisions of "PaLM"

Line 24:
 
* [[Foundation Models (FM)]]
 
 
* [[Singularity]] ... [[Moonshots]] ... [[Emergence]] ... [[Explainable / Interpretable AI]] ... [[Artificial General Intelligence (AGI)| AGI]] ... [[Inside Out - Curious Optimistic Reasoning]] ... [[Algorithm Administration#Automated Learning|Automated Learning]]
 
+ * [https://ai.googleblog.com/2023/03/palm-e-embodied-multimodal-language.html PaLM-E: An embodied multimodal language model]
+ * [https://www.boteatbrain.com/p/google-palm-e PaLM-E, Google's smartest new bot | Anthony Castrio - Bot Eat Brain]
 
   
 
   
  
 
PaLM-E is an embodied multimodal language model that incorporates real-world continuous sensor modalities directly into a language model, thereby establishing the link between words and percepts. Google developed it as a model for robotics; it can solve a variety of tasks on multiple types of robots and across multiple modalities (images, robot states, and neural scene representations). PaLM-E is also a generally capable vision-and-language model: it can perform visual tasks, such as describing images, detecting objects, or classifying scenes, and is proficient at language tasks, such as quoting poetry, solving math equations, or generating code. The largest version, PaLM-E-562B, has 562 billion parameters.
 
  
+ <img src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/4f4f782c-a179-40cf-9785-923e8be0cfc2/palm_e_demo.gif" width="800">
  
https://palm-e.github.io/videos/palm-e-teaser.mp4
+
+ <youtube>fiLFF4RyyKQ</youtube>
+ <youtube>gD5rz8e9EQ8</youtube>

Revision as of 09:42, 29 April 2023



PaLM-E is an embodied multimodal language model that incorporates real-world continuous sensor modalities directly into a language model, thereby establishing the link between words and percepts. Google developed it as a model for robotics; it can solve a variety of tasks on multiple types of robots and across multiple modalities (images, robot states, and neural scene representations). PaLM-E is also a generally capable vision-and-language model: it can perform visual tasks, such as describing images, detecting objects, or classifying scenes, and is proficient at language tasks, such as quoting poetry, solving math equations, or generating code. The largest version, PaLM-E-562B, has 562 billion parameters.
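The idea of feeding continuous sensor readings into a language model can be illustrated with a toy sketch. This is not Google's implementation: the names `project`, `build_multimodal_sequence`, the `<img>` placeholder, the tiny vocabulary, and the random weights are all assumptions made up for illustration. The sketch only shows one plausible way a sensor feature vector could be mapped into the same embedding space as text tokens and interleaved with them in a single input sequence.

```python
import random

EMBED_DIM = 4  # toy embedding width (assumption); real models are far wider


def make_matrix(rows, cols, rng):
    """Build a rows x cols matrix of random values (stand-in for learned weights)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def project(features, weights):
    """Linearly map a sensor feature vector into the token embedding space.

    weights has shape (len(features), EMBED_DIM).
    """
    return [sum(f * w for f, w in zip(features, column)) for column in zip(*weights)]


def build_multimodal_sequence(prompt, vocab, token_table, observations, weights):
    """Embed a prompt, replacing each placeholder with a projected sensor vector."""
    sequence = []
    for word in prompt.split():
        if word in observations:
            # Continuous observation: project it instead of looking up a token.
            sequence.append(project(observations[word], weights))
        else:
            # Ordinary text token: look up its embedding row.
            sequence.append(token_table[vocab[word]])
    return sequence


rng = random.Random(0)
vocab = {"what": 0, "is": 1, "in": 2, "?": 3}        # hypothetical vocabulary
token_table = make_matrix(len(vocab), EMBED_DIM, rng)  # hypothetical token embeddings
weights = make_matrix(3, EMBED_DIM, rng)               # hypothetical sensor projection
observations = {"<img>": [0.2, 0.5, 0.1]}              # hypothetical image features

sequence = build_multimodal_sequence(
    "what is in <img> ?", vocab, token_table, observations, weights
)
print(len(sequence), "vectors of width", EMBED_DIM)
```

In this sketch the language model itself is left out; the point is only that text and sensor data end up as vectors of the same width in one sequence, so a single model could consume both.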