Google researchers have unveiled a new artificial intelligence model that can turn text prompts, sketches, and ideas into interactive, playable virtual worlds.
Named "Genie," the virtual world model was trained on gameplay and other videos found online and is currently only a research preview The game is a 2D platformer rather than full VR
While it may still be far from a true holodeck like the one in Star Trek, it does suggest the possibility of one day being able to create a fully interactive adventure by simply entering a room and uttering a few words.
In the AI world, sayings like "opening Pandora's box" or "pulling a genie out of a lamp" express the idea that content can be created with relatively little effort. In reality, AI models require extensive training, just as it takes humans years to master a skill. You can't just rub a lamp and expect a genie to come out; you first need to fill the lamp with knowledge and ability, which in Genie's case came from "a large dataset of publicly available Internet videos" and a great deal of engineering effort to create the code and weights for the model.
Tim Rocktäschel, the Google DeepMind team lead on Genie, wrote on X that the team focused on scale, using a dataset of more than 200,000 hours of video from 2D platformer games.
The model was trained on these videos in an unsupervised manner, without labels. This allowed it to learn characters' various motions, controls, and actions in a consistent way. As a result, "our model can transform any image into a playable 2D world," Rocktäschel explains.
There are many tools on the market that can convert a graphic designer's mockup of a website or app into code. It is not necessarily the best code, but it produces a functional prototype. There are also AI tools that create websites from text prompts.
With Genie, you can give it a rough sketch on paper, polished digital art, or an AI-generated depiction of a 2D world, and Genie does the rest. It generates the images and other assets needed to turn the sketch into a fully realized open world, and it predicts the next frame, pixel by pixel, based on the actions the player provides.
The creators used a tokenizer that compresses the video into discrete tokens. These are passed to a latent action model, which encodes the transition between two consecutive frames as one of eight possible actions. A third model then predicts future frames from the tokens and the inferred actions.
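To make that pipeline concrete, here is a minimal toy sketch of the three stages in Python. The shapes, the random stand-in encoder, and the nearest-code quantization are illustrative assumptions, not DeepMind's actual implementation; only the overall dataflow (tokenize the video, infer one of eight latent actions, predict the next tokens) follows the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_ACTIONS = 8   # per the article: each frame-to-frame transition maps to one of eight actions
TOKEN_DIM = 16    # assumed token size for this toy example

# Video tokenizer: a fixed stand-in encoder that compresses an 8x8 frame
# into a short token vector (a real tokenizer would be learned).
encoder = rng.standard_normal((64, TOKEN_DIM)) * 0.1

def tokenize(frame):
    return frame.reshape(-1) @ encoder

# Latent action model: a codebook of eight action embeddings; each
# transition is snapped to the nearest code, VQ-style (an assumption).
action_codebook = rng.standard_normal((NUM_ACTIONS, TOKEN_DIM))

def infer_action(tok_t, tok_next):
    delta = tok_next - tok_t                                 # how the scene changed
    dists = np.linalg.norm(action_codebook - delta, axis=1)
    return int(np.argmin(dists))                             # one of 8 latent actions

# Dynamics model: predict the next frame's tokens from the current
# tokens plus the chosen action (linear toy dynamics here).
def predict_next(tok_t, action):
    return tok_t + action_codebook[action]

# Usage on two consecutive toy frames. The action is inferred from the
# video itself, mirroring the unsupervised, label-free setup described.
frame_t, frame_next = rng.random((8, 8)), rng.random((8, 8))
tok_t, tok_next = tokenize(frame_t), tokenize(frame_next)
a = infer_action(tok_t, tok_next)
print("inferred latent action:", a)
print("predicted next tokens:", np.round(predict_next(tok_t, a)[:4], 3))
```

The point the sketch preserves is that action labels never appear anywhere: the eight actions emerge purely from how consecutive frames differ.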
The glue that integrates all of this is a spatiotemporal transformer architecture, the same kind of breakthrough that OpenAI leaned on for Sora.
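For readers curious what that means in practice, below is a minimal sketch of factorized spatiotemporal attention, which alternates attention across space within each frame and across time at each spatial position. It is single-head with no learned projections, purely to illustrate the idea; the shapes are assumptions, not the model's real dimensions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(x):
    # Plain self-attention over the second-to-last axis of x: (..., seq, dim).
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])
    return softmax(scores) @ x

def st_block(tokens):
    # tokens: (time, space, dim). First mix tokens within each frame
    # (spatial attention), then mix each spatial position across frames
    # (temporal attention). Alternating the two axes is far cheaper than
    # attending over every time-space pair at once.
    tokens = attend(tokens)               # spatial: seq axis = space
    tokens = np.swapaxes(tokens, 0, 1)    # -> (space, time, dim)
    tokens = attend(tokens)               # temporal: seq axis = time
    return np.swapaxes(tokens, 0, 1)      # back to (time, space, dim)

video_tokens = np.random.default_rng(1).random((4, 16, 8))  # 4 frames, 16 tokens each
print(st_block(video_tokens).shape)      # (4, 16, 8)
```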
No release date has been set for Genie, and since it is a research project, it is unclear whether it will ever become an actual product. It is possible that someday we will be able to pick up one of the best Android phones and ask an assistant to make us a game about dodging vampires, but that day is likely still several years away.
More important are the underlying technologies and new approaches to content generation developed along the way, such as label-free learning that yields open worlds.
Rocktäschel also criticized Sora on X, especially the idea that it is a "world model." He stated that while it is impressive and visually beautiful, "the world model needs 'action'." He added, "Genie is an action-controllable world model, but it is learned completely unsupervised from video."
Another major breakthrough brought about by Genie is a deeper understanding of real-world physics.