StabilityAI has announced Stable Diffusion 3, the next generation of its popular open-source AI image generation model, and it looks like an impressive leap forward.
Details of the new model were revealed alongside a series of images and prompts showing that it can follow complex instructions and create ultra-realistic images.
An early preview of the model will be available only to a select group of testers while StabilityAI gathers feedback to improve performance and safety before the public release.
StabilityAI also used Spawning's "Do Not Train" registry to ensure that images from artists who did not want their work used for AI training were excluded. Prior to training, over 15 billion images were filtered from the dataset.
Unlike DALL-E, MidJourney, and Google's Imagen, Stable Diffusion is an open model that can be integrated into other platforms or run locally if sufficient computing power is available.
SD3 will ship as a family of models ranging from 800 million to 8 billion parameters, allowing it to run at a variety of quality levels and on a wide range of hardware.
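To give a rough sense of what that parameter range could mean for hardware, the short Python sketch below estimates the memory needed just to hold the model weights. The bytes-per-parameter figures (4 for 32-bit floats, 2 for 16-bit floats) are standard, but which precisions SD3 will actually ship in is an assumption here, and the estimate ignores activations and any companion components, so real requirements would be higher.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Parameter counts quoted in the announcement.
model_sizes = {"smallest (800M)": 800_000_000, "largest (8B)": 8_000_000_000}

# Assumed storage precisions; not confirmed for SD3.
precisions = {"fp32": 4, "fp16": 2}

for name, num_params in model_sizes.items():
    for precision, num_bytes in precisions.items():
        gb = weight_memory_gb(num_params, num_bytes)
        print(f"{name} at {precision}: about {gb:.1f} GB for weights")
```

On these assumptions, the smallest model's weights would fit comfortably on a typical consumer GPU, while the largest would need a high-end card at 16-bit precision.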
Like OpenAI's Sora, Stable Diffusion 3 combines diffusion model technology with a transformer architecture, which may account for its improved instruction-following capabilities.
It also uses flow matching, a mathematical technique for training diffusion models that measures the differences between real and generated images at various stages of the process.
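Since StabilityAI has not published how SD3 applies flow matching, the following is only a generic, textbook-style sketch of the idea in one dimension, not the model's actual training code. It assumes a straight-line path from a noise sample to a data sample and scores a candidate velocity function by how far it is from the slope of that path at a random stage `t`. All names (`flow_matching_loss`, `oracle_velocity`, `DATA_POINT`) are hypothetical and chosen for illustration.

```python
import random


def flow_matching_loss(velocity_fn, data_samples, n_draws=1000, seed=0):
    """Monte Carlo estimate of a conditional flow matching loss in 1-D."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        x1 = rng.choice(data_samples)   # a "real" sample
        x0 = rng.gauss(0.0, 1.0)        # a pure-noise sample
        t = rng.uniform(0.0, 0.99)      # a stage along the path (kept below 1)
        x_t = (1.0 - t) * x0 + t * x1   # point on the straight line from noise to data
        target = x1 - x0                # velocity of that straight line
        total += (velocity_fn(x_t, t) - target) ** 2
    return total / n_draws


# Toy dataset: every "real" sample is the same number.
DATA_POINT = 2.0


def oracle_velocity(x, t):
    """Exact velocity for this toy dataset: head straight for DATA_POINT."""
    return (DATA_POINT - x) / (1.0 - t)


def zero_velocity(x, t):
    """A deliberately bad guess that never moves."""
    return 0.0


loss_oracle = flow_matching_loss(oracle_velocity, [DATA_POINT])
loss_zero = flow_matching_loss(zero_velocity, [DATA_POINT])
print(f"oracle loss: {loss_oracle:.3e}, zero-velocity loss: {loss_zero:.3f}")
```

In a real system the velocity function would be a neural network trained to minimize this loss over images rather than single numbers; here the exact answer is known, so the oracle's loss is essentially zero while the bad guess scores much worse.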
Few people outside the development team have had direct access to Stable Diffusion 3 so far, and no research paper has been published yet.
From what we have seen so far, this is an important step change in generated image quality. Along with OpenAI's Sora, it represents a major upgrade in how generative AI works and what it can do.
It renders consistent, readable text within images, fixes problems with human anatomy, including fingers, and appears to capture color well.
Emad Mostaque, founder of StabilityAI, said that the company has 100 times fewer resources for training AI models than a company like OpenAI but still accomplishes impressive work. He suggested that, like Sora, SD3 can accept a variety of inputs, including video and images.
Details of SD3 were released a few days after StabilityAI announced Stable Cascade.