StabilityAI has released a new AI-powered 3D video model that can turn a single image prompt into a fully animated view of any object or set of objects.
It is built on top of the open-source Stable Video Diffusion model, which is widely used for AI video generation by companies such as Leonardo AI and by StabilityAI itself.
Stable Video 3D (SV3D) adds new depth to video generation, creating multi-view 3D meshes from a single image while maintaining higher consistency for objects across video frames. Emad Mostaque, founder and CEO of StabilityAI, wrote on X: "The 3D mesh is a new way of creating a 3D mesh. All pixels are generated."
Stable Video 3D builds on technology pioneered in previous models such as Stable Video Diffusion, the original Stable Diffusion, and the Zero123 3D image model that StabilityAI released late last year.
At the time, Mostaque said this was just the first in a series of 3D models to be released by the AI lab, which seems to be on a mission to make the Star Trek holodeck a reality.
There are two variants of the new model. The first is SV3D_u, which creates an orbital video from a single input image, without any specified camera conditions. The second, SV3D_p, extends that functionality, accepting a single image plus an orbit view and producing a 3D video "captured" along a specified camera path.
Basically, it analyzes a given image and creates multiple views of the object from different angles, as if a camera were moving around it, which then become a video.
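Conceptually, a "specified camera path" like the one SV3D_p accepts can be thought of as a list of camera poses around the object, one per output frame. The sketch below is purely illustrative and not StabilityAI's actual API; the function name, frame count, and pose format are assumptions, used only to show what a simple circular orbit might look like as data.

```python
def orbit_camera_path(num_frames=21, elevation_deg=10.0):
    """Generate a simple circular camera orbit as a list of
    (azimuth_deg, elevation_deg) pairs -- one pose per video frame.

    NOTE: hypothetical helper for illustration; SV3D's real
    conditioning format may differ.
    """
    return [
        (frame * 360.0 / num_frames, elevation_deg)
        for frame in range(num_frames)
    ]

path = orbit_camera_path()
print(path[0])  # first pose: camera at azimuth 0, fixed elevation
```

A fixed elevation with evenly spaced azimuths gives the turntable-style orbit seen in the sample clips; a more elaborate path would simply vary both angles per frame.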
I have not yet tried SV3D, but from the sample clips, it seems to do a good job of capturing the object and predicting unseen views as well as camera movement.
So far, all clips have focused on a single object on a white background. While this may prove useful for companies that want to easily include a full 360-degree view of their products, the authenticity of the reverse view is questionable, since it is predicted rather than real.
It will be interesting to see how it evolves to handle more complex images, and whether the camera controls can be applied to complete scenes, such as two people talking or a car spinning around on a road.
Some of its motion depiction techniques could be extended and applied to generative AI video, providing a higher degree of control over how the camera moves in a clip.
They could also be used to create 3D videos of interactive objects, or of objects in virtual environments on devices such as Meta Quest or Apple Vision Pro.
The source of training data is a particularly sensitive topic that many large AI labs are reluctant to discuss. This includes OpenAI, which has been evasive about whether YouTube videos are part of Sora's video dataset.
StabilityAI, by contrast, has been open about the source of training data for its latest models, explaining that they were trained on a curated subset of the Objaverse dataset, a library of millions of annotated 3D objects used by many AI 3D services.
"We selected a curated subset of the Objaverse dataset as our training data."
The license allows end users to share, adapt, and remix the material in any way they like, commercially or non-commercially, as long as they give credit and link to the license.