
Conversation

@metascroy
Contributor

Summary

Add a C++ runner for static attention CoreML LLM models exported with export_static_llm_coreml.py. This runner:

  • Extends TextDecoderRunner from executorch/extension/llm/runner/text_decoder_runner.h
  • Uses existing StaticAttentionIOManager from executorch/examples/models/llama/runner/static_attention_io_manager.h for KV cache management
  • Auto-detects model configuration (input_len, cache_len, n_layers, n_kv_heads, head_dim, vocab_size, generate_full_logits) from model metadata
  • Supports both regular greedy decoding and lookahead (speculative) decoding
  • Processes prompts in chunks during prefill (see the sketch after this list for how these pieces fit together)
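
To make these pieces concrete, here is a minimal sketch of the chunked-prefill-plus-greedy-decode flow. It is illustrative only: StaticLLMConfig mirrors the auto-detected fields, and the forward callback stands in for one fixed-shape CoreML forward pass plus the KV-cache updates that StaticAttentionIOManager performs; none of the names below are the actual runner API.

```cpp
// Illustrative sketch, not the actual runner implementation. `forward` stands
// in for one fixed-shape model invocation plus the static KV-cache updates
// handled by StaticAttentionIOManager; it returns last-position logits.
#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

struct StaticLLMConfig {
  size_t input_len;           // fixed number of tokens per forward pass
  size_t cache_len;           // static KV-cache length
  size_t n_layers, n_kv_heads, head_dim, vocab_size;
  bool generate_full_logits;  // logits for every position vs. last position only
};

using ForwardFn =
    std::function<std::vector<float>(const std::vector<int64_t>& tokens)>;

// Greedy generation: prefill the (non-empty) prompt in input_len-sized chunks,
// then decode one token at a time until EOS or max_new_tokens.
std::vector<int64_t> generate_greedy(const std::vector<int64_t>& prompt,
                                     const StaticLLMConfig& cfg,
                                     const ForwardFn& forward,
                                     int64_t eos_id,
                                     size_t max_new_tokens) {
  std::vector<int64_t> tokens = prompt;
  std::vector<float> logits;

  // Prefill: every call uses at most cfg.input_len tokens so the shapes match
  // the static graph the model was exported with (the callee pads short chunks).
  for (size_t i = 0; i < prompt.size(); i += cfg.input_len) {
    size_t end = std::min(prompt.size(), i + cfg.input_len);
    std::vector<int64_t> chunk(prompt.begin() + i, prompt.begin() + end);
    logits = forward(chunk);
  }

  // Decode: greedy argmax over the last-position logits, fed back one at a time.
  for (size_t step = 0; step < max_new_tokens; ++step) {
    int64_t next = static_cast<int64_t>(
        std::max_element(logits.begin(), logits.end()) - logits.begin());
    tokens.push_back(next);
    if (next == eos_id) {
      break;
    }
    logits = forward({next});
  }
  return tokens;
}
```

Lookahead (speculative) decoding changes only the decode step of this loop: instead of committing one token per forward pass, several candidate tokens are proposed and verified within a single fixed-shape call.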

New files:

  • examples/apple/coreml/llama/runner/static_llm_runner.h - Runner header with StaticLLMConfig, StaticLLMIOManager, StaticLLMTextDecoderRunner, and StaticLLMRunner classes
  • examples/apple/coreml/llama/runner/static_llm_runner.cpp - Runner implementation
  • examples/apple/coreml/llama/runner/main.cpp - CLI entry point with gflags (a sketch of such an entry point follows this list)
  • examples/apple/coreml/llama/runner/CMakeLists.txt - CMake build configuration
  • examples/apple/coreml/llama/runner/build_and_run.sh - Build and run helper script
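
For orientation, a gflags entry point for such a runner could look roughly like the sketch below; the flag names and the commented-out StaticLLMRunner calls are placeholders for this example, not the actual interface defined in main.cpp.

```cpp
// Illustrative sketch of a gflags-based entry point; the flag set and the
// StaticLLMRunner usage in the comments are placeholders, not the real main.cpp.
#include <gflags/gflags.h>
#include <iostream>
#include <string>

DEFINE_string(model_path, "model.pte", "Path to the exported CoreML .pte model.");
DEFINE_string(tokenizer_path, "tokenizer.model", "Path to the tokenizer file.");
DEFINE_string(prompt, "Once upon a time,", "Prompt to complete.");
DEFINE_int32(max_new_tokens, 128, "Maximum number of tokens to generate.");
DEFINE_bool(use_lookahead, false, "Enable lookahead (speculative) decoding.");

int main(int argc, char** argv) {
  gflags::ParseCommandLineFlags(&argc, &argv, /*remove_flags=*/true);

  std::cout << "model: " << FLAGS_model_path << "\n"
            << "tokenizer: " << FLAGS_tokenizer_path << "\n"
            << "lookahead: " << (FLAGS_use_lookahead ? "on" : "off") << "\n";

  // Hypothetical runner usage -- the real classes live in static_llm_runner.h:
  //   StaticLLMRunner runner(FLAGS_model_path, FLAGS_tokenizer_path);
  //   runner.load();  // reads input_len, cache_len, etc. from model metadata
  //   runner.generate(FLAGS_prompt, FLAGS_max_new_tokens, FLAGS_use_lookahead);
  return 0;
}
```

With the CMake target built, the binary is driven entirely through such flags; build_and_run.sh wraps the build-and-run flow.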

Modified files:

  • CMakeLists.txt - Add subdirectory for static LLM CoreML runner (when EXECUTORCH_BUILD_EXTENSION_LLM_RUNNER, EXECUTORCH_BUILD_COREML, and APPLE are enabled)
  • examples/apple/coreml/llama/export_static_llm_coreml.py - Add a --cpu_only flag for CI testing (the ANE is not accessible in CI) and a --no_generate_full_logits flag to export models that emit logits only for the final position, which is more efficient (illustrated in the sketch after this list)
  • .ci/scripts/test_ane_static_llama.sh - Build and test the C++ runner in CI
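
As a reference point for --no_generate_full_logits, the sketch below shows the one place it would matter on the runner side, under the assumption that generate_full_logits means the model returns logits for every input position ([input_len, vocab_size]) rather than only the last valid one ([1, vocab_size]); the helper name is made up for this example.

```cpp
// Sketch only: select the logits row used for sampling, depending on whether
// the exported model produced logits for all positions or just the last one.
#include <cstddef>
#include <vector>

std::vector<float> last_token_logits(
    const std::vector<float>& logits,  // row-major, vocab_size columns
    size_t vocab_size,
    size_t last_valid_pos,             // index of the last non-padding token
    bool generate_full_logits) {
  // Full logits: pick the row of the last real token. Otherwise the model
  // already emitted exactly one row, so use row 0.
  size_t row = generate_full_logits ? last_valid_pos : 0;
  auto begin = logits.begin() + row * vocab_size;
  return std::vector<float>(begin, begin + vocab_size);
}
```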

Known issue: Lookahead decoding currently produces incorrect output (<unk> tokens) for stories110M, but does work for llama1B. This will be addressed in a follow-up PR.

Test plan

CI script .ci/scripts/test_ane_static_llama.sh tests:

  1. Export static ANE model and CPU-only model
  2. Build C++ runner with CMake/Ninja
  3. Run regular decoding and validate output contains expected prefix "Once upon a time, there was"
  4. Run lookahead decoding (runs without crashing, but the output is incorrect; see the known issue above)

@pytorch-bot

pytorch-bot commented on Jan 6, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16463

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures, 3 Unrelated Failures

As of commit ff8ae0b with merge base 913436a:


This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the CLA Signed label on Jan 6, 2026
@github-actions

github-actions bot commented on Jan 6, 2026

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@metascroy
Contributor (Author)

@larryliu0820 I added a static LLM runner based on the APIs in extension/llm. Can you have a look and give feedback?

@metascroy
Contributor (Author)

@JacobSzwejbka are you able to review this while Mengwei is out? Or is there a better person?

