Add C++ static runner for CoreML #16463
Conversation
CI status (Dr. CI): as of commit ff8ae0b with merge base 913436a, 6 new failures and 3 unrelated failures (jobs that are flaky or unstable on trunk). Artifacts and rendered test results are at hud.pytorch.org/pr/pytorch/executorch/16463.
@larryliu0820 I added a static LLM runner based on the APIs in extension/llm. Can you have a look and give feedback?

@JacobSzwejbka are you able to review this while Mengwei is out? Or is there a better person?
Summary
Add a C++ runner for static attention CoreML LLM models exported with `export_static_llm_coreml.py`. This runner:

- Uses `TextDecoderRunner` from `executorch/extension/llm/runner/text_decoder_runner.h`
- Uses `StaticAttentionIOManager` from `executorch/examples/models/llama/runner/static_attention_io_manager.h` for KV cache management (the generation-loop shape this enables is sketched below)
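For orientation, here is a hedged sketch of the generation loop such a runner implements. This is not code from the PR: `IOManagerSketch`, `DecoderSketch`, and their methods are illustrative placeholders standing in for the real `TextDecoderRunner` / `StaticAttentionIOManager` APIs.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Stand-in for StaticAttentionIOManager: in the real runner it manages the
// fixed-shape KV cache buffers for the static attention model. Here it is
// an empty placeholder.
struct IOManagerSketch {
  void reset() {}                       // clear cache state for a new prompt
  void advance(size_t /*n_tokens*/) {}  // slide the cache window after a step
};

// Stand-in for a decoder built on TextDecoderRunner: one forward pass,
// returning the next token id.
struct DecoderSketch {
  int64_t step(int64_t /*token*/, IOManagerSketch& /*io*/) {
    return 0;  // a real implementation runs the model and samples from logits
  }
};

std::vector<int64_t> generate(
    DecoderSketch& decoder,
    IOManagerSketch& io,
    const std::vector<int64_t>& prompt,
    size_t max_new_tokens,
    int64_t eos_id) {
  io.reset();
  int64_t cur = 0;
  for (int64_t t : prompt) {  // prefill: feed the prompt tokens
    cur = decoder.step(t, io);
    io.advance(1);
  }
  std::vector<int64_t> out;
  while (out.size() < max_new_tokens && cur != eos_id) {
    out.push_back(cur);  // decode: feed each sampled token back in
    cur = decoder.step(cur, io);
    io.advance(1);
  }
  return out;
}

int main() {
  DecoderSketch decoder;
  IOManagerSketch io;
  auto tokens =
      generate(decoder, io, {1, 2, 3}, /*max_new_tokens=*/8, /*eos_id=*/2);
  std::cout << "generated " << tokens.size() << " tokens\n";
  return 0;
}
```

The point of routing every step through the IO manager is that the model can be invoked with fixed-shape inputs on each step, which is what "static attention" requires.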
New files:

- `examples/apple/coreml/llama/runner/static_llm_runner.h` - Runner header with `StaticLLMConfig`, `StaticLLMIOManager`, `StaticLLMTextDecoderRunner`, and `StaticLLMRunner` classes
- `examples/apple/coreml/llama/runner/static_llm_runner.cpp` - Runner implementation
- `examples/apple/coreml/llama/runner/main.cpp` - CLI entry point with gflags
- `examples/apple/coreml/llama/runner/CMakeLists.txt` - CMake build configuration
- `examples/apple/coreml/llama/runner/build_and_run.sh` - Build and run helper script
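The CLI entry point uses gflags; a minimal sketch of what such a `main.cpp` might look like is below. The flag names (`model_path`, `tokenizer_path`, `prompt`, `seq_len`) are assumptions for illustration, not necessarily the flags this PR defines.

```cpp
// Hedged sketch of a gflags-based CLI entry point; flag names are hypothetical.
#include <gflags/gflags.h>

#include <iostream>
#include <string>

DEFINE_string(model_path, "static_llm.pte", "Path to the exported model.");
DEFINE_string(tokenizer_path, "tokenizer.model", "Path to the tokenizer file.");
DEFINE_string(prompt, "Once upon a time,", "Prompt to generate from.");
DEFINE_int32(seq_len, 128, "Maximum number of tokens to generate.");

int main(int argc, char** argv) {
  gflags::ParseCommandLineFlags(&argc, &argv, /*remove_flags=*/true);

  std::cout << "model: " << FLAGS_model_path << "\n"
            << "tokenizer: " << FLAGS_tokenizer_path << "\n"
            << "prompt: " << FLAGS_prompt << "\n";

  // A real main() would construct the runner from these flags, load the
  // model and tokenizer, and run generation up to FLAGS_seq_len tokens.
  return 0;
}
```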
Modified files:

- `CMakeLists.txt` - Add subdirectory for the static LLM CoreML runner (when `EXECUTORCH_BUILD_EXTENSION_LLM_RUNNER`, `EXECUTORCH_BUILD_COREML`, and `APPLE` are enabled)
- `examples/apple/coreml/llama/export_static_llm_coreml.py` - Add a `--cpu_only` flag for CI testing (the ANE is not accessible in CI) and a `--no_generate_full_logits` flag for more efficient models
- `.ci/scripts/test_ane_static_llama.sh` - Build and test the C++ runner in CI
Known issue: Lookahead decoding currently produces incorrect output (`<unk>` tokens) for stories110M, but does work for llama1B. This will be addressed in a follow-up PR.

Test plan
The CI script `.ci/scripts/test_ane_static_llama.sh` tests: