Generative Query Network (GQN)



A framework within which machines learn to perceive their surroundings by training only on data obtained by themselves as they move around scenes. Much like infants and animals, the GQN learns by trying to make sense of its observations of the world around it. In doing so, the GQN learns about plausible scenes and their geometrical properties, without any human labelling of the contents of scenes. The GQN model is composed of two parts: a representation network and a generation network. The representation network takes the agent's observations as its input and produces a representation (a vector) which describes the underlying scene. The generation network then predicts (‘imagines’) the scene from a previously unobserved viewpoint. Neural Scene Representation and Rendering | Science Reprint
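Below is a minimal sketch of this two-part structure, assuming PyTorch. The layer sizes, the use of simple fully connected encoders, and the deterministic decoder are illustrative assumptions only; the published GQN uses convolutional encoders and an autoregressive latent-variable renderer. The sketch shows the key idea: per-observation encodings are summed into a single scene vector, and the generation network is conditioned on that vector plus a query viewpoint.

<pre>
# Illustrative sketch of the GQN's two networks (not the DeepMind implementation).
import torch
import torch.nn as nn


class RepresentationNetwork(nn.Module):
    """Encodes each (image, viewpoint) observation and aggregates them into a scene vector."""
    def __init__(self, image_dim, viewpoint_dim, repr_dim):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(image_dim + viewpoint_dim, 256),
            nn.ReLU(),
            nn.Linear(256, repr_dim),
        )

    def forward(self, images, viewpoints):
        # images: (num_obs, image_dim), viewpoints: (num_obs, viewpoint_dim)
        per_obs = self.encoder(torch.cat([images, viewpoints], dim=-1))
        # Summing makes the scene representation order-invariant and lets it
        # accept any number of observations.
        return per_obs.sum(dim=0)


class GenerationNetwork(nn.Module):
    """Predicts ('imagines') the image seen from a previously unobserved viewpoint."""
    def __init__(self, repr_dim, viewpoint_dim, image_dim):
        super().__init__()
        self.decoder = nn.Sequential(
            nn.Linear(repr_dim + viewpoint_dim, 256),
            nn.ReLU(),
            nn.Linear(256, image_dim),
        )

    def forward(self, scene_repr, query_viewpoint):
        return self.decoder(torch.cat([scene_repr, query_viewpoint], dim=-1))


# Usage: encode a few observations of a scene, then render an unseen viewpoint.
image_dim, viewpoint_dim, repr_dim = 64 * 64 * 3, 7, 256
rep_net = RepresentationNetwork(image_dim, viewpoint_dim, repr_dim)
gen_net = GenerationNetwork(repr_dim, viewpoint_dim, image_dim)

observations = torch.rand(3, image_dim)        # three observed images (flattened)
obs_viewpoints = torch.rand(3, viewpoint_dim)  # camera pose for each observation
query_viewpoint = torch.rand(viewpoint_dim)    # a viewpoint the agent never saw

scene_repr = rep_net(observations, obs_viewpoints)
predicted_image = gen_net(scene_repr, query_viewpoint)
</pre>

In the full model, training requires no human labels: the prediction for the query viewpoint is compared against the image actually seen from that viewpoint, and the error drives learning in both networks.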

One half of the system is its “representation” part, which observes a given 3D scene from some angle and encodes it as a vector. The “generative” part then, based only on the vectors created earlier, predicts what the scene would look like from a different viewpoint. “It was not at all clear that a neural network could ever learn to create images in such a precise and controlled manner,” said lead author of the paper, Ali Eslami, in a release accompanying the paper. “However we found that sufficiently deep networks can learn about perspective, occlusion and lighting, without any human engineering. This was a super surprising finding.” The approach also allows the system to accurately recreate a 3D scene from a single viewpoint, such as the blocks shown below: AI edges closer to understanding 3D space the way we do | BIG4ALL.ORG

[Animation: GQN reconstructing a block scene from a single observed viewpoint]