
[Bug] Flux.2-Klein VAE decoding fails silently on OOM (Vulkan), outputs gray image with "success" log #1220

@LWWZH

Description


Git commit

a48b4a3

Operating System & Version

Windows 11 25H2

GGML backends

Vulkan

Command-line arguments used

.\sd-cli.exe --diffusion-model ..\Models\flux-2-klein-4b-Q4_K_S.gguf --vae ..\Models\flux-2-klein-4b-vae.safetensors --llm ..\Models\Qwen3-4B-Q4_K_S.gguf --cfg-scale 1.0 -p "A picture of a beach." --diffusion-fa -H 1024 -W 1024 --steps 4 -s -1 -v

Steps to reproduce

  1. Download FLUX.2-klein-4B (safetensors or gguf), flux2_ae.safetensors VAE, and Qwen3-4B text encoder.
  2. Run sd-cli with parameters: -H 1024 -W 1024, --cfg-scale 1.0, --steps 4.
  3. Use a GPU with limited VRAM (approx 8GB or less available for Vulkan backend).
  4. Wait for the generation process to complete.

What you expected to happen

The program should either:

  1. Exit immediately with a clear error message (e.g., "Error: VAE decode failed due to OOM. Try lowering the resolution.") when memory allocation fails.
  2. Fail fast earlier in the pipeline if the estimated VRAM requirement exceeds the device limit.

It should not write a solid gray image and log (success) after a critical internal failure.

What actually happened

The generation process ran for ~27 minutes on my machine. The sampling steps completed successfully, but during the VAE decoding stage an ErrorOutOfDeviceMemory occurred.

Despite this critical error, the program did not crash or exit. Instead, it saved a solid gray image (output.png) to disk and printed "save result image ... (success)" in the logs. This is misleading and wastes the user's time, since the user believes generation succeeded until they inspect the output file.

Logs / error messages / stack trace

[DEBUG] llm.hpp:203  - token length: 512
[DEBUG] ggml_extend.hpp:1734 - qwen3 compute buffer size: 75.00 MB(VRAM)
[DEBUG] conditioner.hpp:1923 - computing condition graph completed, taking 3360 ms
[INFO ] stable-diffusion.cpp:3250 - get_learned_condition completed, taking 3379 ms
[INFO ] stable-diffusion.cpp:3361 - generating image: 1/1 - seed 4700
[DEBUG] ggml_extend.hpp:1734 - flux compute buffer size: 1221.50 MB(VRAM)
  |==================================================| 4/4 - 412.33s/it
[INFO ] stable-diffusion.cpp:3403 - sampling completed, taking 1650.03s
[INFO ] stable-diffusion.cpp:3414 - generating 1 latent images completed, taking 1651.04s
[INFO ] stable-diffusion.cpp:3417 - decoding 1 latents
ggml_vulkan: Device memory allocation of size 4831838208 failed.
ggml_vulkan: Requested buffer size exceeds device buffer size limit: ErrorOutOfDeviceMemory
[ERROR] ggml_extend.hpp:84   - ggml_gallocr_reserve_n_impl: failed to allocate Vulkan0 buffer of size 8553234560
[ERROR] ggml_extend.hpp:1724 - vae: failed to allocate compute buffer
[ERROR] ggml_extend.hpp:1996 - vae alloc compute buffer failed
[DEBUG] stable-diffusion.cpp:2684 - computing vae decode graph completed, taking 0.93s
[INFO ] stable-diffusion.cpp:3427 - latent 1 decoded, taking 0.93s
[INFO ] stable-diffusion.cpp:3431 - decode_first_stage completed, taking 0.94s
[INFO ] stable-diffusion.cpp:3741 - generate_image completed in 1655.37s
[INFO ] main.cpp:421  - save result image 0 to 'output.png' (success)

Additional context / environment details

  • GPU Model: Intel Iris Xe Graphics
  • VRAM: 8GB
  • Model: Flux.2 [Klein] 4B (GGUF Q4_K_S), Qwen3 4B (GGUF Q4_K_S)

Metadata

Labels: bug (Something isn't working)