Description
Git commit
Operating System & Version
Windows 11 25H2
GGML backends
Vulkan
Command-line arguments used
.\sd-cli.exe --diffusion-model ..\Models\flux-2-klein-4b-Q4_K_S.gguf --vae ..\Models\flux-2-klein-4b-vae.safetensors --llm ..\Models\Qwen3-4B-Q4_K_S.gguf --cfg-scale 1.0 -p "A picture of a beach." --diffusion-fa -H 1024 -W 1024 --steps 4 -s -1 -v
Steps to reproduce
- Download FLUX.2-klein-4B (safetensors or gguf), the flux2_ae.safetensors VAE, and the Qwen3-4B text encoder.
- Run sd-cli with the parameters -H 1024 -W 1024, --cfg-scale 1.0, and --steps 4.
- Use a GPU with limited VRAM (approx. 8 GB or less available to the Vulkan backend).
- Wait for the generation process to complete.
What you expected to happen
The program should either:
- Exit immediately with a clear error message (e.g., "Error: VAE decode failed due to OOM. Try lowering resolution.") when memory allocation fails.
- Or fail fast earlier in the pipeline if the estimated VRAM requirement exceeds the device limit.
It should not output a pure gray image and log a success message after a critical internal failure.
What actually happened
The generation process ran for ~27 minutes on my machine. The sampling steps completed successfully, but the VAE decoding stage failed with ErrorOutOfDeviceMemory.
Despite this critical error, the program did not crash or exit. Instead, it saved a pure gray image (output.png) to disk and printed save result image ... (success) in the logs. This is misleading and wastes the user's time: the user believes the generation succeeded until inspecting the output file.
Logs / error messages / stack trace
[DEBUG] llm.hpp:203 - token length: 512
[DEBUG] ggml_extend.hpp:1734 - qwen3 compute buffer size: 75.00 MB(VRAM)
[DEBUG] conditioner.hpp:1923 - computing condition graph completed, taking 3360 ms
[INFO ] stable-diffusion.cpp:3250 - get_learned_condition completed, taking 3379 ms
[INFO ] stable-diffusion.cpp:3361 - generating image: 1/1 - seed 4700
[DEBUG] ggml_extend.hpp:1734 - flux compute buffer size: 1221.50 MB(VRAM)
|==================================================| 4/4 - 412.33s/it
[INFO ] stable-diffusion.cpp:3403 - sampling completed, taking 1650.03s
[INFO ] stable-diffusion.cpp:3414 - generating 1 latent images completed, taking 1651.04s
[INFO ] stable-diffusion.cpp:3417 - decoding 1 latents
ggml_vulkan: Device memory allocation of size 4831838208 failed.
ggml_vulkan: Requested buffer size exceeds device buffer size limit: ErrorOutOfDeviceMemory
[ERROR] ggml_extend.hpp:84 - ggml_gallocr_reserve_n_impl: failed to allocate Vulkan0 buffer of size 8553234560
[ERROR] ggml_extend.hpp:1724 - vae: failed to allocate compute buffer
[ERROR] ggml_extend.hpp:1996 - vae alloc compute buffer failed
[DEBUG] stable-diffusion.cpp:2684 - computing vae decode graph completed, taking 0.93s
[INFO ] stable-diffusion.cpp:3427 - latent 1 decoded, taking 0.93s
[INFO ] stable-diffusion.cpp:3431 - decode_first_stage completed, taking 0.94s
[INFO ] stable-diffusion.cpp:3741 - generate_image completed in 1655.37s
[INFO ] main.cpp:421 - save result image 0 to 'output.png' (success)
Additional context / environment details
- GPU Model: Intel Iris Xe Graphics
- VRAM: 8GB
- Model: Flux.2 [Klein] 4B (GGUF Q4_K_S), Qwen3 4B (GGUF Q4_K_S)