Hi gemma.cpp folks,
I wanted to share a small but unusual language-runtime project that may be relevant to the kind of language/runtime co-design boundary this repo is already exploring.
We built a public demo line called Engram and deployed it on a commodity ESP32-C3.
Current public numbers: see the repo linked below.
Important scope note:
This is not presented as unrestricted, open-input native LLM generation on an MCU.
The board-side path is closer to a flash-resident, table-driven runtime with:
- packed token weights
- hashed lookup structures
- fixed compiled probe batches
- streaming fold / checksum style execution over precompiled structures
So this is not a standard lightweight dense-inference runtime on a small device. It is closer to a task-specialized language runtime whose behavior has been crystallized into a compact executable form under severe physical constraints.
Repo:
https://github.com/Alpha-Guardian/Engram
I’m posting here because gemma.cpp sits at an interesting intersection of research-friendly implementation, low-level execution, and simplified language-inference systems.
What I’d be curious about is whether systems like this should be thought of as:
- outside the normal lightweight inference-runtime family
- an extreme endpoint of language/runtime co-design
- or an early sign that some language-task capability may eventually be deployed in more specialized executable forms than even a minimalist dense runtime
If this direction is relevant to your team, I’d be glad to compare notes.