Stability AI has unveiled Stable Diffusion XL Turbo (SDXL Turbo), an AI image-synthesis model that turns written prompts into images quickly enough to approach real-time generation. The model can also rapidly process images from sources such as webcams.
SDXL Turbo’s standout feature is that it can generate an image in a single step, down from the 20–50 steps its predecessor typically uses. The efficiency gain comes from a technique Stability calls Adversarial Diffusion Distillation (ADD). ADD uses score distillation, in which the new model learns from an existing image-synthesis model that acts as a teacher. It also adds an adversarial loss: a discriminator is trained to tell real images from generated ones, and the model is trained to produce images the discriminator cannot tell apart from real ones, which improves the realism of the output.
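The two ingredients described above can be pictured as one training objective: an adversarial term plus a weighted distillation term. The following is a toy sketch of that idea only, not Stability AI's implementation. The "images" are short lists of numbers, the discriminator score is made up, and the function names and the weight `lam` are hypothetical choices for illustration.

```python
# Toy illustration of an ADD-style training objective (not Stability AI's code).
# "Images" are short lists of floats and the discriminator score is invented.

def adversarial_loss(discriminator_score: float) -> float:
    """Generator-side adversarial term: it gets lower as the discriminator
    rates the student's image as more realistic (a higher score)."""
    return -discriminator_score


def distillation_loss(student_image, teacher_image) -> float:
    """Mean squared difference between the student's output and the
    teacher model's estimate of the same image."""
    diffs = [(s - t) ** 2 for s, t in zip(student_image, teacher_image)]
    return sum(diffs) / len(diffs)


def add_objective(student_image, teacher_image, discriminator_score, lam=2.5):
    """Combined loss: adversarial term plus a weighted distillation term.
    `lam` is an arbitrary weight picked for this example."""
    return adversarial_loss(discriminator_score) + lam * distillation_loss(
        student_image, teacher_image
    )


student = [0.2, 0.4, 0.9]    # hypothetical one-step student output
teacher = [0.25, 0.5, 0.8]   # hypothetical teacher estimate
loss = add_objective(student, teacher, discriminator_score=0.3)
print(f"combined loss: {loss:.4f}")
```

In this sketch the loss falls when the student's output moves closer to the teacher's or when the discriminator finds it more convincing, which is the intuition behind combining the two signals.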
A recently released research paper describes how SDXL Turbo works and details the ADD technique. Images generated by SDXL Turbo may lack the fine detail of higher-step SDXL images, so the model is best seen as a speed-focused complement to its predecessor rather than a replacement.
To test the speed claims, SDXL Turbo was run locally on an Nvidia RTX 3060, where it generated a 3-step 1024×1024 image in about 4 seconds, compared with 26.4 seconds for a comparable 20-step SDXL image. Smaller images generate even faster.
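For readers who want to try a local run like the one described above, here is a minimal sketch using the Hugging Face Diffusers library. It is an assumption about tooling, since the article does not say what software was used for the test. The helper names `build_call_kwargs` and `generate` are hypothetical, and the sketch assumes a CUDA GPU with the `torch` and `diffusers` packages installed.

```python
def build_call_kwargs(prompt: str, steps: int = 1, size: int = 512) -> dict:
    """Collect pipeline arguments for a few-step run.

    guidance_scale=0.0 disables classifier-free guidance, which the
    SDXL Turbo model card recommends for this model.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return {
        "prompt": prompt,
        "num_inference_steps": steps,
        "guidance_scale": 0.0,
        "height": size,
        "width": size,
    }


def generate(prompt: str, steps: int = 1, size: int = 512):
    """Load SDXL Turbo and return a PIL image.

    The heavy imports live inside the function so the helper above can be
    used (and tested) on machines without a GPU.
    """
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
    )
    pipe.to("cuda")
    return pipe(**build_call_kwargs(prompt, steps, size)).images[0]


# Example call (downloads several gigabytes of weights on first use):
# generate("a red fox in the snow", steps=3, size=1024).save("fox.png")
```

Timings will vary with the GPU, resolution, and step count, so results from this sketch should not be expected to match the figures quoted here.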
Supporting the real-time claim, Stability AI says that on an Nvidia A100, a powerful data-center GPU built for AI workloads, SDXL Turbo can generate a 512×512 image in 207 milliseconds, including encoding, a single denoising step, and decoding. Speeds like this open the door to applications such as real-time generative AI video filters and experimental video game graphics generation, provided that coherency problems, such as keeping output consistent across frames or generations, can be solved.
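A quick back-of-the-envelope calculation puts the quoted figures in context. The sketch below uses only the numbers reported in this article, except for the 30 frames-per-second video target, which is an assumption added for comparison.

```python
# Figures quoted in the article.
TURBO_3060_SECONDS, TURBO_3060_STEPS = 4.0, 3    # "about 4 seconds"
SDXL_3060_SECONDS, SDXL_3060_STEPS = 26.4, 20
A100_MS_PER_IMAGE = 207.0                        # 512x512, single step

# Assumed target for smooth video; not a figure from the article.
TARGET_FPS = 30

speedup = SDXL_3060_SECONDS / TURBO_3060_SECONDS
turbo_per_step = TURBO_3060_SECONDS / TURBO_3060_STEPS
sdxl_per_step = SDXL_3060_SECONDS / SDXL_3060_STEPS
a100_fps = 1000.0 / A100_MS_PER_IMAGE
frame_budget_ms = 1000.0 / TARGET_FPS

print(f"RTX 3060 speedup: {speedup:.1f}x")
print(f"Seconds per step: Turbo {turbo_per_step:.2f}, SDXL {sdxl_per_step:.2f}")
print(f"A100 throughput: {a100_fps:.1f} images per second")
print(f"Frame budget at {TARGET_FPS} fps: {frame_budget_ms:.1f} ms")
```

On these numbers the per-step cost on the RTX 3060 is about the same for both models, so the saving comes almost entirely from running fewer steps. The A100 figure works out to just under five images per second, which is interactive but still short of the assumed 30 fps video target.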
SDXL Turbo is currently available under a non-commercial research license that limits use to personal, non-commercial purposes, though Stability AI says it is open to commercial applications. Despite internal management challenges, including calls for CEO Emad Mostaque’s resignation and consideration of a company sale, Stability AI has kept up a rapid release schedule and recently unveiled Stable Video Diffusion.
To try SDXL Turbo firsthand, Stability AI offers a beta demonstration on its image-editing platform, Clipdrop, and an unofficial live demo is available on Hugging Face. Concerns about training-data provenance and potential misuse remain, as they do across AI image synthesis, but progress in the field continues at a rapid pace.