RoskerTech

StabilityAI tries a new approach to image generation: meet Stable Cascade

StabilityAI, an artificial intelligence company, has introduced Stable Cascade, a next-generation AI image model that can generate photorealistic images from text and images and is much faster than previous-generation models.

Stable Cascade differs from previous diffusion models such as Stable Diffusion. It works by chaining three different models into a cascade, which leaves room to improve the output and makes fine-tuning easier as the image passes through each stage.

When testing the model, you can watch an image form before your eyes from the prompt, with pixels and shapes converging until the picture is sharp at full resolution.

One of the biggest selling points of this new model over the previous Stable Diffusion models is its ability to create accurate and realistic text in the image. However, in my limited testing, like other AI image text tools, it was hit or miss.

This is what MidJourney achieved earlier this year with version 6 and what OpenAI achieved last year with DALL-E 3. Google can also create image text with Imagen 2, but they all have similar consistency issues.

The most important feature seems to be flexibility in training and fine-tuning, which makes it well suited to companies that want to adapt models to their own style or train on licensed or restricted image libraries.

It is built on a new architecture called Würstchen. This has competitive performance at scale and allows for cascading effects, while taking into account the need for cost-effectiveness.
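The cost-effectiveness point can be made concrete with a little arithmetic. The sketch below assumes two figures that do not appear in this article: a compression factor of roughly 42 quoted on the Stable Cascade model card, and a factor of 8 commonly cited for Stable Diffusion. The helper name `latent_side` is purely illustrative.

```python
def latent_side(image_side: int, compression_factor: float) -> int:
    """Approximate side length of the latent grid for a square image."""
    return round(image_side / compression_factor)


IMAGE_SIDE = 1024
CASCADE_FACTOR = 42  # assumption: figure quoted on the Stable Cascade model card
SD_FACTOR = 8        # assumption: factor commonly cited for Stable Diffusion

cascade_side = latent_side(IMAGE_SIDE, CASCADE_FACTOR)
sd_side = latent_side(IMAGE_SIDE, SD_FACTOR)

# Number of latent positions the most expensive stage has to work on.
cascade_positions = cascade_side ** 2
sd_positions = sd_side ** 2

print(f"Stable Cascade latent: {cascade_side}x{cascade_side} = {cascade_positions} positions")
print(f"Stable Diffusion latent: {sd_side}x{sd_side} = {sd_positions} positions")
print(f"Ratio: about {sd_positions / cascade_positions:.1f}x fewer positions")
```

Under those assumptions a 1024x1024 image maps to a 24x24 latent instead of 128x128. A smaller latent does not translate directly into the same speed-up, but it gives a sense of why working in a more compressed space is cheaper.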

StabilityAI is focused on open source, making models and weights available to the public under a non-commercial license for retraining, offline use, and customization.

The company, which also took part in the development of Stable Diffusion and its related models, says the new models are "very easy to train and fine-tune on consumer hardware".

In addition, the training and inference code is available on Stability's GitHub page for further customization of the model and its output. The model can be used for inference with the diffusers library.
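For readers who want to try the diffusers route, here is a minimal sketch of two-step inference. It assumes a recent diffusers release that ships `StableCascadePriorPipeline` and `StableCascadeDecoderPipeline`, a CUDA GPU, and the repository names `stabilityai/stable-cascade-prior` and `stabilityai/stable-cascade`. None of these details come from this article, and the step counts and guidance values are illustrative, so check them against the current Hugging Face documentation.

```python
# Assumed Hugging Face repository names (not stated in the article).
PRIOR_ID = "stabilityai/stable-cascade-prior"
DECODER_ID = "stabilityai/stable-cascade"


def generate(prompt: str, negative_prompt: str = "", device: str = "cuda"):
    """Run the prior, then the decoder, and return a PIL image."""
    # Heavy imports stay inside the function so this file can be loaded
    # on a machine without torch or diffusers installed.
    import torch
    from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

    # Step 1: the prior turns the prompt into compact image embeddings.
    prior = StableCascadePriorPipeline.from_pretrained(
        PRIOR_ID, variant="bf16", torch_dtype=torch.bfloat16
    ).to(device)
    prior_output = prior(
        prompt=prompt,
        negative_prompt=negative_prompt,
        height=1024,
        width=1024,
        guidance_scale=4.0,      # illustrative value
        num_inference_steps=20,  # illustrative value
    )

    # Step 2: the decoder expands those embeddings into the final image.
    decoder = StableCascadeDecoderPipeline.from_pretrained(
        DECODER_ID, variant="bf16", torch_dtype=torch.float16
    ).to(device)
    result = decoder(
        image_embeddings=prior_output.image_embeddings.to(torch.float16),
        prompt=prompt,
        negative_prompt=negative_prompt,
        guidance_scale=0.0,      # illustrative value
        num_inference_steps=10,  # illustrative value
        output_type="pil",
    )
    return result.images[0]
```

Calling `generate("a neon sign that reads OPEN").save("cascade.png")` would write the result to disk. The first run downloads several gigabytes of weights, so expect it to take a while.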

I did not have much time to play with Stable Cascade, but the images I generated using the Hugging Face space and the images I saw shared were of impressive quality considering the speed of generation.

I don't often have to wait long for images generated by MidJourney or DALL-E, but the wait is clearly longer than the time it takes to generate images with Stable Cascade. The speed is similar to the real-time generation of SDXL Turbo, which is also from StabilityAI, but at a higher resolution.

In my limited experimentation, text generation was broadly comparable to DALL-E and MidJourney, although Stable Cascade made more mistakes.

It is important to note that this is a model designed for fine-tuning and further training. It will come into its own when third-party platforms adopt it or when Stable Cascade is eventually introduced to StabilityAI's Clipdrop image generation platform.

Stable Cascade can be tried in the Hugging Face space, but access depends on how busy it is at the time. I have found that I rarely have to wait more than a few seconds to access the GPU to run a model.

A non-commercial version of Stable Cascade can be downloaded and installed on a laptop, but this requires a large GPU and a large amount of RAM. One-click installers for Windows and Mac are available in the Pinokio app.
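Before downloading anything, it can help to see what GPU your machine actually reports. The sketch below assumes PyTorch is installed and only looks at CUDA devices, so it returns nothing useful on a Mac. The function name `describe_gpu` is hypothetical, and no memory threshold is given here because the article does not state one.

```python
def describe_gpu():
    """Return the name and memory of the first CUDA GPU, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None  # PyTorch is not installed
    if not torch.cuda.is_available():
        return None  # no CUDA-capable GPU visible to PyTorch
    props = torch.cuda.get_device_properties(0)
    return {"name": props.name, "total_gib": props.total_memory / 2**30}


info = describe_gpu()
if info is None:
    print("No CUDA GPU detected (or PyTorch is missing).")
else:
    print(f"{info['name']}: {info['total_gib']:.1f} GiB of GPU memory")
```

Compare the reported memory against whatever requirements the installer or model card lists at the time you read this.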

Third-party sites such as Leonardo and Night Cafe will likely introduce versions of Stable Cascade in the future.