This artificial intelligence model can understand how the physical world works


The original version of this story appeared in Quanta Magazine.

Here’s a test for babies: Show them a glass of water on a table. Hide it behind a wooden board. Now move the board toward the glass. Would they be surprised if the board kept moving past the glass as if it weren’t there? Many 6-month-olds would be, and by about 1 year, almost all children have an intuitive sense of object permanence, learned through observation. Now some AI models do too.

Researchers have developed an artificial intelligence system that learns about the world through videos and displays a notion of “surprise” when presented with information that contradicts the knowledge it has accumulated.

The model, developed by Meta and named Video Joint Embedding Predictive Architecture (V-JEPA), makes no assumptions about the physics of the world in the videos. Nevertheless, it can begin to understand how the world works.

“Their claims are already very plausible, and the results are very interesting,” says Micha Heilbron, a cognitive scientist at the University of Amsterdam who studies how brains and artificial systems understand the world.

Great abstractions

As engineers building self-driving cars know, getting an AI system to reliably understand what it sees can be difficult. Most systems designed to “understand” videos, whether to classify their content (e.g., “a person playing tennis”) or to detect the contours of an object (e.g., a car in front), operate in what is called “pixel space.” Such a model essentially treats every pixel in a video as equally important.

But these pixel-space models have limitations. Imagine trying to make sense of a suburban street. If the scene contains cars, traffic lights, and trees, the model may focus too much on irrelevant details such as the movement of leaves. It may miss the color of the traffic lights or the position of nearby vehicles. “When it comes to images or video, you don’t want to work in [pixel] space, because there’s a lot of detail that you don’t want to model,” said Randall Balestriero, a computer scientist at Brown University.

Yann LeCun, a computer scientist at New York University and director of artificial intelligence research at Meta, created JEPA, a predecessor of V-JEPA that works on still images, in 2022.

Photo: École Polytechnique Université Paris-Saclay
