SSD-1B: Segmind’s accelerated Stable Diffusion model

The Segmind Stable Diffusion model (SSD-1B) is a version of Stable Diffusion XL (SDXL) that is 50% smaller and delivers a 60% speedup while maintaining high-quality text-to-image generation. It was trained on diverse datasets, including Grit and scraped Midjourney data, to improve its ability to create a wide variety of visual content from text prompts.

SSD-1B uses a knowledge distillation strategy in which it learns in succession from several expert models (SDXL, ZavyChromaXL and JuggernautXL). This combines their strengths and produces impressive visual output.

Image comparison (SDXL-1.0 vs. SSD-1B):


1. How to use SSD-1B

The model is available through the Diffusers library.

Make sure to install Diffusers from source by running:

pip install git+https://github.com/huggingface/diffusers

Additionally, please install transformers, safetensors and accelerate:

pip install transformers accelerate safetensors

To generate an image with the model, run the following Python code:

from diffusers import StableDiffusionXLPipeline
import torch

# Load SSD-1B in half precision from its fp16 safetensors weights and move it to the GPU
pipe = StableDiffusionXLPipeline.from_pretrained("segmind/SSD-1B", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
pipe.to("cuda")
# If using torch < 2.0, enable xFormers memory-efficient attention:
# pipe.enable_xformers_memory_efficient_attention()

prompt = "An astronaut riding a green horse"  # Your prompt here
neg_prompt = "ugly, blurry, poor quality"  # Negative prompt here
image = pipe(prompt=prompt, negative_prompt=neg_prompt).images[0]
image.save("astronaut.png")  # Save the result (the filename is arbitrary)
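The commented-out line in the snippet above enables xFormers attention only for PyTorch versions before 2.0. From 2.0 onward, PyTorch ships its own memory-efficient scaled-dot-product attention, which diffusers uses by default. Below is a minimal sketch of that version check. The helper name `needs_xformers` is hypothetical, and the sketch assumes the version string (for example `torch.__version__`) starts with the major version number.

```python
def needs_xformers(torch_version: str) -> bool:
    """Return True when the PyTorch major version is below 2.

    Assumes a version string such as "1.13.1+cu117" or "2.1.0"; the local
    build suffix after "+" and the minor/patch parts are ignored.
    """
    major = int(torch_version.split("+")[0].split(".")[0])
    return major < 2

# Usage with the pipeline loaded above (requires torch, diffusers and xformers):
#   import torch
#   if needs_xformers(torch.__version__):
#       pipe.enable_xformers_memory_efficient_attention()
```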

SSD-1B can also be used in ComfyUI.

For the best quality, use negative prompts and a CFG scale of around 9.0.
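Following that advice in code means passing a negative prompt and `guidance_scale` to the pipeline call; `guidance_scale` is diffusers' name for the CFG scale. The sketch below wraps this in a hypothetical `generate` helper. Its defaults (a CFG of 9.0 and the negative prompt from the earlier example) follow the text. It assumes `pipe` is the `StableDiffusionXLPipeline` loaded earlier.

```python
def generate(pipe, prompt, negative_prompt="ugly, blurry, poor quality", guidance_scale=9.0):
    """Run a diffusers text-to-image pipeline with a negative prompt and a CFG of ~9.0.

    `pipe` is assumed to be a StableDiffusionXLPipeline (any callable that
    returns an object with an `.images` list works the same way).
    """
    result = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        guidance_scale=guidance_scale,  # CFG scale; ~9.0 is recommended for SSD-1B
    )
    return result.images[0]

# Usage with the pipeline loaded earlier:
#   image = generate(pipe, "An astronaut riding a green horse")
```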

2. SSD-1B model description

Developed by: Segmind
Developers: Yatharth Gupta and Vishnu Jaddipal
Model type: Diffusion-based text-to-image generation model
License: Apache 2.0
Distilled from: stabilityai/stable-diffusion-xl-base-1.0

The main features of SSD-1B are as follows:

Text-to-image generation: The model excels at generating images from text prompts, enabling a wide range of creative applications.
Improved speed: Designed for efficiency, the model delivers a 60% speedup, making it a practical choice for real-time applications and other scenarios where images must be generated quickly.
Diverse training data: The model is trained on diverse datasets, so it can handle a wide variety of text prompts and generate matching images.
Knowledge distillation: By distilling knowledge from multiple expert models, SSD-1B combines their strengths and mitigates their limitations, improving overall performance.
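To make the distillation idea concrete, here is a generic output-level distillation objective for a noise-predicting denoiser. The student is trained both to predict the true noise (the usual diffusion loss) and to match the teacher's prediction. This is an illustrative assumption, not Segmind's published training recipe. The function names, the `kd_weight` parameter and the use of plain Python lists instead of tensors are all choices made for this sketch.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(student_pred, teacher_pred, target_noise, kd_weight=1.0):
    """Generic output-level knowledge-distillation loss for a denoiser.

    task_loss: student's noise prediction vs. the true noise.
    kd_loss:   student's noise prediction vs. the teacher's prediction.
    """
    task_loss = mse(student_pred, target_noise)
    kd_loss = mse(student_pred, teacher_pred)
    return task_loss + kd_weight * kd_loss
```

In a real training loop the predictions would be UNet outputs on the same noisy latents and timesteps. The weight balances imitating the teacher against fitting the data directly.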

3. SSD-1B model architecture

SSD-1B is a 1.3B-parameter model created by removing several layers from the base SDXL model.
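You can check a parameter count yourself by summing `numel()` over a module's parameters, which is the standard PyTorch idiom. The sketch below assumes the 1.3B figure refers to the UNet, the denoising network from which layers were removed. SDXL's UNet has roughly 2.6B parameters, which would be consistent with the "50% smaller" claim. `count_parameters` is a hypothetical helper name.

```python
def count_parameters(module) -> int:
    """Total number of parameter elements in a torch.nn.Module-like object."""
    return sum(p.numel() for p in module.parameters())

# Usage with the SSD-1B pipeline (requires torch and diffusers):
#   print(f"UNet parameters: {count_parameters(pipe.unet) / 1e9:.2f}B")
```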

4. Multi-resolution support

SSD-1B supports the following output resolutions (width x height):

1024 x 1024 (1:1 square)
1152 x 896 (9:7)
896 x 1152 (7:9)
1216 x 832 (19:13)
832 x 1216 (13:19)
1344 x 768 (7:4 horizontal)
768 x 1344 (4:7 vertical)
1536 x 640 (12:5 horizontal)
640 x 1536 (5:12 vertical)
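If you start from an arbitrary target size, such as a 16:9 frame, one simple approach is to snap it to the supported resolution with the nearest aspect ratio and pass that as `width` and `height` to the pipeline. This is a sketch under our own assumptions. The helper name and the log-ratio distance (which treats 2:1 and 1:2 symmetrically) are illustrative choices, not part of SSD-1B or diffusers.

```python
import math

# Supported SSD-1B output sizes as (width, height), from the list above.
SUPPORTED_RESOLUTIONS = [
    (1024, 1024), (1152, 896), (896, 1152), (1216, 832), (832, 1216),
    (1344, 768), (768, 1344), (1536, 640), (640, 1536),
]


def closest_resolution(width: int, height: int) -> tuple[int, int]:
    """Return the supported (width, height) whose aspect ratio is closest to width/height."""
    target = math.log(width / height)
    return min(SUPPORTED_RESOLUTIONS, key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))

# Usage with the pipeline loaded earlier:
#   w, h = closest_resolution(1920, 1080)  # 16:9 request -> (1344, 768)
#   image = pipe(prompt=prompt, width=w, height=h).images[0]
```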

5. SSD-1B speed comparison

We observed that SSD-1B is up to 60% faster than the base SDXL model. Comparison on an A100 80GB GPU:

Here are the acceleration metrics for the RTX 4090 GPU:
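To reproduce a speed comparison on your own hardware, you can time the same prompt and settings through SDXL and SSD-1B. The harness below is a sketch under our own assumptions, not Segmind's benchmark methodology. It reports median wall-clock latency after untimed warm-up runs. It expresses "X% faster" as the latency ratio minus one, which is only one of several ways such figures are quoted. The pipeline names in the usage comment are hypothetical.

```python
import statistics
import time


def median_latency(fn, warmup=1, runs=5):
    """Median wall-clock seconds of fn() over `runs` calls, after `warmup` untimed calls.

    Assumes fn() blocks until all GPU work is done (the case when a pipeline
    returns decoded PIL images).
    """
    for _ in range(warmup):
        fn()  # absorb one-off costs such as CUDA initialization
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def percent_faster(baseline_s, candidate_s):
    """Throughput gain of the candidate over the baseline, in percent."""
    return (baseline_s / candidate_s - 1.0) * 100.0

# Usage (hypothetical `sdxl_pipe` and `ssd_pipe`, same prompt and settings):
#   base = median_latency(lambda: sdxl_pipe(prompt=prompt).images[0])
#   fast = median_latency(lambda: ssd_pipe(prompt=prompt).images[0])
#   print(f"SSD-1B is {percent_faster(base, fast):.0f}% faster")
```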

6. Potential uses of SSD-1B

The SSD-1B model is not suitable for creating factual or accurate representations of people, events, or real-world information. It is not suitable for tasks requiring high precision and accuracy.

Direct use: The Segmind Stable Diffusion model is suitable for research and practical applications in various fields, including:

Art & Design: It can be used to generate artwork, designs and other creative content, provide inspiration and enhance the creative process.
Education: This model can be applied to educational tools to create visual content for teaching and learning purposes.
Research: Researchers can use this model to explore generative models, evaluate their performance, and push the boundaries of text-to-image generation.
Safe content generation: It provides a safe and controlled way to generate content, reducing the risk of harmful or inappropriate output.
Bias and limitation analysis: Researchers and developers can use the model to explore its limitations and biases, helping to better understand the behavior of generative models.

Downstream use: The Segmind Stable Diffusion model can also be trained further directly with the Diffusers training scripts, including:

LoRA:

export MODEL_NAME="segmind/SSD-1B"
export VAE_NAME="madebyollin/sdxl-vae-fp16-fix"
export DATASET_NAME="lambdalabs/pokemon-blip-captions"

accelerate launch train_text_to_image_lora_sdxl.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --pretrained_vae_model_name_or_path=$VAE_NAME \
  --dataset_name=$DATASET_NAME --caption_column="text" \
  --resolution=1024 --random_flip \
  --train_batch_size=1 \
  --num_train_epochs=2 --checkpointing_steps=500 \
  --learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --mixed_precision="fp16" \
  --seed=42 \
  --output_dir="sd-pokemon-model-lora-sdxl" \
  --validation_prompt="cute dragon creature" --report_to="wandb" \
  --push_to_hub

Fine-tuning:

export MODEL_NAME="segmind/SSD-1B"
export VAE_NAME="madebyollin/sdxl-vae-fp16-fix"
export DATASET_NAME="lambdalabs/pokemon-blip-captions"

accelerate launch train_text_to_image_sdxl.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --pretrained_vae_model_name_or_path=$VAE_NAME \
  --dataset_name=$DATASET_NAME \
  --enable_xformers_memory_efficient_attention \
  --resolution=512 --center_crop --random_flip \
  --proportion_empty_prompts=0.2 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 --gradient_checkpointing \
  --max_train_steps=10000 \
  --use_8bit_adam \
  --learning_rate=1e-06 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --mixed_precision="fp16" \
  --report_to="wandb" \
  --validation_prompt="a cute Sundar Pichai creature" --validation_epochs 5 \
  --checkpointing_steps=5000 \
  --output_dir="sdxl-pokemon-model" \
  --push_to_hub
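With gradient accumulation, each optimizer update combines gradients from several forward/backward passes. The effective batch size is therefore the per-device batch size times the accumulation steps times the number of training processes. For the command above (train_text_to_image_sdxl.py), that is 1 × 4 = 4 images per update on a single GPU, or 10,000 × 4 = 40,000 images over the run. The helpers below are hypothetical. They assume, as in the diffusers training scripts, that `--max_train_steps` counts optimizer updates. The process count comes from your `accelerate` configuration.

```python
def effective_batch_size(train_batch_size: int, grad_accum_steps: int, num_processes: int = 1) -> int:
    """Images contributing to each optimizer update, summed over all processes."""
    return train_batch_size * grad_accum_steps * num_processes


def samples_seen(max_train_steps: int, train_batch_size: int, grad_accum_steps: int,
                 num_processes: int = 1) -> int:
    """Total training images processed over the run (counting repeats across epochs)."""
    return max_train_steps * effective_batch_size(train_batch_size, grad_accum_steps, num_processes)

# For the fine-tuning command above on one GPU:
#   effective_batch_size(1, 4)  -> 4
#   samples_seen(10000, 1, 4)   -> 40000
```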

DreamBooth LoRA:

export MODEL_NAME="segmind/SSD-1B"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="lora-trained-xl"
export VAE_PATH="madebyollin/sdxl-vae-fp16-fix"

accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --pretrained_vae_model_name_or_path=$VAE_PATH \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-5 \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub
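After training, you can load the LoRA weights from the output directory into an SSD-1B pipeline for inference with diffusers' `load_lora_weights`. The sketch below assumes the DreamBooth run above saved its weights locally to `lora-trained-xl` (its `OUTPUT_DIR`), and it reuses the run's validation prompt. Because of `--push_to_hub`, you could pass the Hub repository id instead. The helper name `generate_with_lora` is hypothetical.

```python
def generate_with_lora(pipe, lora_path, prompt):
    """Load LoRA weights into a diffusers SDXL-family pipeline, then sample one image."""
    pipe.load_lora_weights(lora_path)  # local directory or Hub repo id with the LoRA weights
    return pipe(prompt=prompt).images[0]

# Usage (requires torch, diffusers and a CUDA GPU):
#   import torch
#   from diffusers import StableDiffusionXLPipeline
#   pipe = StableDiffusionXLPipeline.from_pretrained(
#       "segmind/SSD-1B", torch_dtype=torch.float16, use_safetensors=True, variant="fp16"
#   ).to("cuda")
#   image = generate_with_lora(pipe, "lora-trained-xl", "A photo of sks dog in a bucket")
```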

Original link: Segmind SSD-1B – BimAnt