ARM I2_S inference produces gibberish/garbage output after commit 112f853 (CPU Optimization update) #470

@aqn96

Description


After the CPU Inference Optimization update (commit 112f853), running inference with i2_s quantization on ARM (aarch64) produces completely incoherent output — random tokens with no relation to the prompt. Rolling back to commit 404980e (the last commit before the optimization merge) restores correct, coherent output.

Environment

  • Hardware: Raspberry Pi 5 (8GB RAM), ARM Cortex-A76 (aarch64)
  • OS: Raspberry Pi OS 64-bit (Debian 12 Bookworm)
  • Compiler: Debian clang version 18.1.8
  • CMake: 3.25.1
  • Python: 3.9 (conda)
  • Model: microsoft/BitNet-b1.58-2B-4T-gguf (ggml-model-i2_s.gguf)
  • Quantization: i2_s

Steps to Reproduce

  1. Clone repo at current HEAD (01eb415):

    git clone --recursive https://github.com/microsoft/BitNet.git
    cd BitNet
    
  2. Generate kernels and build (following the Adafruit guide's parameters):

    python utils/codegen_tl1.py --model bitnet_b1_58-3B --BM 160,320,320 --BK 64,128,64 --bm 32,64,32
    export CC=clang-18 CXX=clang++-18
    rm -rf build && mkdir build && cd build
    cmake .. -DCMAKE_BUILD_TYPE=Release
    make -j$(nproc)
    cd ..
    
  3. Download model:

    huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf --local-dir models/BitNet-b1.58-2B-4T
    
  4. Run inference:

    python run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -p "You are a helpful assistant" -t 4 -cnv
    

Broken Output (HEAD - 01eb415)

> hi how are you
ri differentorefFly increase Hurtutar run following section underestimateAD Sachs weighedision
cann RICTS Reyn taskfir-ra mark filtr castWATCHB fr ret flatten missionuche purchase parameter
gramhit associatedyuraft runeded take compound sugar contrast unsubedom conveyuffanford...

Working Output (commit 404980e)

> hi
Hello! How can I assist you today?

> what is a raspi 5
The Raspberry Pi 5 is a next-generation model of the Raspberry Pi single-board computer series...

Performance on the working commit: 9.68 tokens/second (4 threads, ARM NEON).

Bisection

The regression was introduced in commit 112f853:

112f853 [feat] I2S kernels for weight & activation parallel on Intel & ARM machine;
        [feat] I2S GEMV & GEMM(llama.cpp);
        [feat] quantize activation & dequantize embedding(llama.cpp);
        [fix] compile bug: cannot define __ARM_FEATURE_DOTPROD(llama.cpp)

The last known working commit is 404980e (one commit before 112f853).

Notes

  • The build completes without errors on both commits — the issue is runtime behavior, not compilation.
  • ggml-bitnet-mad.cpp is compiled and linked in both cases.
  • NEON is detected and enabled (NEON = 1 in system_info output).
  • DOTPROD detection: GGML_COMPILER_SUPPORT_DOTPROD - Failed, but COMPILER_SUPPORTS_ARMV82_DOTPROD - Success.
  • This issue also appears to affect other ARM64 platforms (Ampere/Hetzner CAX servers), not just Raspberry Pi.
  • The Adafruit BitNet on Raspberry Pi guide (published Sept 2025, before the optimization commit) confirms working output on Pi 4 and Pi 5 with the older codebase.
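Regarding the DOTPROD note above: as a quick sanity check (not part of the original repro, and only meaningful on aarch64 Linux), you can confirm whether the kernel advertises the dot-product extension that the optimized i2_s kernels may dispatch on. The Cortex-A76 in the Pi 5 should list "asimddp":

```shell
# "asimddp" in /proc/cpuinfo Features means the ARMv8.2 dot-product
# instructions are available; on other platforms this falls through
# to the message instead.
grep -m1 -o 'asimddp' /proc/cpuinfo 2>/dev/null || echo "asimddp not advertised"
```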

Related to #411 — same root cause. Adding Pi 5 (Cortex-A76 with dotprod) as another confirmed affected platform.
