Hyperrealistic AI character engine — identity, not roleplay.
Lettuce Engine makes LLMs be characters instead of playing them. There is no "act as" instruction, no roleplay framing. The entire system prompt is written as the character's own internal awareness — first-person declarative identity. The LLM receives a context that is the character's mental state: their memories, emotions, relationships, knowledge, and moment-to-moment experience of time.
The engine runs background loops that autonomously research characters via the web, generate layered memories, track emotional state across conversations, evolve per-user relationships, and validate every response for consistency. It works with any character — fictional, historical, or original — across multiple LLM backends.
The engine is organized into 10 subsystems that wire together through a central orchestrator (LettuceEngine). Entry points include a terminal CLI, a Discord bot, a REST + WebSocket API, and direct programmatic access.
Traditional AI roleplay tells the model "pretend to be X." Lettuce Engine never does this. Instead, the system prompt is a declarative identity document written entirely in first person:
```
I am Sherlock Holmes. I reside at 221B Baker Street, London.
I recall the curious case of the Speckled Band...
At this moment, I feel intellectually restless.
I am speaking with James. We have conversed three times previously.
```
No meta-layer. No "you are an AI assistant." The LLM simply receives a context that IS the character's inner world.
The system prompt is assembled dynamically per-request from 9 sections, each populated by a different subsystem (memory retrieval, emotion engine, relationship tracker, time awareness, etc.).
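As a sketch, that per-request assembly reduces to ordering and joining whichever sections are populated. The section names and the `assemble_prompt` helper below are illustrative placeholders, not the engine's actual API:

```python
# Illustrative sketch of per-request prompt assembly. The nine section names
# here are assumptions; the engine's PromptAssembler defines its own.
SECTION_ORDER = [
    "core_identity", "backstory", "knowledge", "memories",
    "emotional_state", "relationship", "time_awareness",
    "speech_patterns", "current_scene",
]

def assemble_prompt(sections: dict[str, str]) -> str:
    """Join populated sections into one first-person identity document."""
    parts = [sections[name].strip() for name in SECTION_ORDER if sections.get(name)]
    return "\n\n".join(parts)

prompt = assemble_prompt({
    "core_identity": "I am Sherlock Holmes. I reside at 221B Baker Street, London.",
    "emotional_state": "At this moment, I feel intellectually restless.",
})
```

Empty sections are simply omitted, so a brand-new character with no memories or relationships still yields a coherent identity document.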
Every user message passes through a 6-stage pipeline:
1. Analyze — spaCy extracts entities and topics. DistilRoBERTa classifies the emotional content. Time-based emotion decay is applied.
2. Retrieve Context — Three retrieval signals run in parallel:
   - Dense search (ChromaDB) — semantic similarity via sentence-transformer embeddings
   - BM25 search — keyword matching over the full memory corpus
   - Knowledge graph traversal — entities in the message are looked up in the character's NetworkX graph, and related memories are pulled via BFS

   Results are merged via Reciprocal Rank Fusion (RRF) with configurable weights, then boosted by recency. A 5% random surfacing chance injects unexpected memories for naturalism.
3. Build Prompt — The `PromptAssembler` constructs the 9-section first-person system prompt from the character definition, retrieved memories, current emotional state, relationship context, and time awareness.
4. Generate — The assembled prompt + conversation history is sent to the configured LLM backend (Anthropic, OpenAI, OpenRouter, or Ollama). Streaming is supported via WebSocket.
5. Validate — A 4-signal consistency pipeline checks the response:
   - Voice TF-IDF — is the vocabulary within the character's speech profile?
   - Identity Anchor — any AI-speak tells? ("certainly", "great question", bullet points)
   - NER Anachronism — entities from the wrong era?
   - Knowledge Graph — contradictions with established facts?

   If validation fails, the response is regenerated with explicit constraints (up to 2 retries).
6. Post-Response Updates — Conversation turns are stored in SQLite + ChromaDB. Entities are tracked. Relationship dimensions (trust, affection, respect) are updated based on exchange sentiment. The character's own emotional state updates from its response.
Memory is the most complex subsystem. Characters don't just retrieve static facts; they have a living, evolving memory that decays, consolidates, and grows over time.
Every memory exists in two places:
- ChromaDB — vector embeddings for semantic search (`all-MiniLM-L6-v2`)
- SQLite — metadata, importance scores, access counts, timestamps, tags
| Type | Description | Source |
|---|---|---|
| `episodic` | Specific events and experiences | Bootstrapper, conversations, research |
| `semantic` | Facts, knowledge, skills | Research, bootstrapper |
| `emotional` | Emotional reactions and feelings | Bootstrapper, conversation synthesis |
| `conversation` | Retained impressions from chats | Conversation synthesis loop |
The `HybridRetriever` combines three signals using Reciprocal Rank Fusion:
| Signal | Weight | What It Catches |
|---|---|---|
| Dense (ChromaDB) | 0.5 | Semantic similarity, paraphrases |
| BM25 (sparse) | 0.3 | Exact keyword matches that embeddings miss |
| Graph (NetworkX) | 0.2 | Entity relationships invisible to both |
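A minimal version of the fusion step, using the weights from the table above. The `k = 60` constant is the common RRF default; the engine's exact constant and tie-breaking may differ:

```python
# Reciprocal Rank Fusion over three ranked result lists.
# Each item's score is the weighted sum of 1/(k + rank) across signals.
from collections import defaultdict

def rrf_merge(ranked_lists: dict[str, list[str]],
              weights: dict[str, float], k: int = 60) -> list[str]:
    scores: dict[str, float] = defaultdict(float)
    for signal, ids in ranked_lists.items():
        w = weights[signal]
        for rank, memory_id in enumerate(ids):
            scores[memory_id] += w / (k + rank + 1)
    return sorted(scores, key=scores.__getitem__, reverse=True)

merged = rrf_merge(
    {"dense": ["m1", "m2", "m3"], "bm25": ["m2", "m4"], "graph": ["m3", "m2"]},
    {"dense": 0.5, "bm25": 0.3, "graph": 0.2},
)
# "m2" wins: it appears in all three lists, so its reciprocal-rank
# contributions accumulate even though it tops none of them.
```

This is the reason RRF is attractive here: a memory that ranks moderately well across all three signals beats one that ranks first in a single signal.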
- Creation — Memories are born from bootstrapping, conversation synthesis, or research
- Retrieval — Accessed memories get their `access_count` incremented and `last_accessed` updated
- Decay — Ebbinghaus forgetting curve: `importance *= e^(-lambda * hours)`. Frequently accessed memories decay slower (spaced repetition effect)
- Consolidation — HDBSCAN clusters similar memories; low-importance duplicates are merged into the strongest memory
- Pruning — Memories that decay below the minimum threshold (0.05) are permanently deleted
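The decay step can be sketched in a few lines; how exactly `access_count` modulates the decay constant is an illustrative assumption here, not the engine's exact formula:

```python
import math

def decay_importance(importance: float, hours: float,
                     lam: float = 0.01, access_count: int = 0) -> float:
    # Spaced-repetition effect: more accesses -> smaller effective lambda.
    # (The exact modulation by access_count is an assumption.)
    effective_lam = lam / (1 + access_count)
    return importance * math.exp(-effective_lam * hours)

fresh = decay_importance(1.0, hours=24)                      # never accessed
rehearsed = decay_importance(1.0, hours=24, access_count=5)  # accessed often
# Memories whose importance falls below 0.05 are pruned permanently.
```

After 24 hours the untouched memory has decayed noticeably while the frequently accessed one is nearly intact, which is the spaced-repetition behavior the lifecycle describes.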
Raw conversation turns are not directly useful as memories. Every 10 minutes, the synthesis loop:
- Fetches unsynthesized conversation turns since the last run
- Groups them by user
- Sends each conversation to the LLM with a synthesis prompt: "What do I actually remember from this conversation?"
- The LLM produces memories the way a real person would retain them — impressions, key facts, emotional reactions, not a verbatim log
- If nothing memorable happened, it outputs nothing (no memory pollution)
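A stubbed sketch of that pass. Function names are illustrative, the LLM call is replaced with a stub, and the real loop persists its results to SQLite and ChromaDB:

```python
# Sketch of the synthesis pass: group unsynthesized turns by user,
# ask the LLM what is worth remembering, keep only non-empty answers.
from itertools import groupby

def synthesize(turns: list[dict], llm) -> list[str]:
    turns = sorted(turns, key=lambda t: t["user_id"])
    memories = []
    for user_id, group in groupby(turns, key=lambda t: t["user_id"]):
        transcript = "\n".join(t["text"] for t in group)
        remembered = llm(
            f"What do I actually remember from this conversation?\n{transcript}"
        )
        if remembered.strip():   # empty output -> nothing memorable, no pollution
            memories.append(remembered)
    return memories

# Stub LLM: only the conversation about a case produces a memory.
stub = lambda prompt: ("James asked about the Speckled Band."
                       if "Speckled" in prompt else "")
out = synthesize(
    [{"user_id": "james", "text": "Tell me about the Speckled Band."},
     {"user_id": "anon", "text": "hi"}],
    stub,
)
```

The "outputs nothing" branch is the important part: a greeting-only exchange leaves no trace, so the memory store holds impressions rather than a verbatim log.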
Emotions are 8-dimensional vectors based on Plutchik's wheel of emotions, not LLM-guessed strings.
| Dimension | Range | Valence |
|---|---|---|
| Joy | 0–1 | Positive |
| Trust | 0–1 | Positive |
| Fear | 0–1 | Negative |
| Surprise | 0–1 | Neutral |
| Sadness | 0–1 | Negative |
| Disgust | 0–1 | Negative |
| Anger | 0–1 | Negative |
| Anticipation | 0–1 | Positive |
- DistilRoBERTa (`j-hartmann/emotion-english-distilroberta-base`) classifies the emotion in each user message locally — no LLM call
- Classifier output is mapped to Plutchik dimensions
- Personality weights from the character's traits modify the response (an angry character amplifies anger signals)
- For simple shifts: vector math (blend, amplify) computes the new state instantly
- For complex shifts (betrayal, major revelations): LLM fallback for nuanced emotional reasoning
- Over time: exponential decay pulls emotions back toward the character's baseline
This approach handles ~90% of emotional updates with pure math — no API calls, no latency, deterministic results.
The emotional state produces human-readable descriptions for the prompt:
```
I feel a strong sense of anticipation mixed with mild unease.
```
It also exposes computed properties: `primary_emotion`, `secondary_emotion`, `intensity` (L2 norm), and `valence` (-1 to +1).
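A compact sketch of that state and its computed properties, with valence signs taken from the table above. The helper names mirror the computed properties but are not guaranteed to match the engine's implementation:

```python
import math

DIMS = ["joy", "trust", "fear", "surprise", "sadness", "disgust", "anger", "anticipation"]
VALENCE_SIGN = {"joy": 1, "trust": 1, "anticipation": 1, "surprise": 0,
                "fear": -1, "sadness": -1, "disgust": -1, "anger": -1}

def primary_emotion(state: dict[str, float]) -> str:
    return max(DIMS, key=lambda d: state[d])

def intensity(state: dict[str, float]) -> float:
    return math.sqrt(sum(v * v for v in state.values()))   # L2 norm

def valence(state: dict[str, float]) -> float:
    total = sum(state.values()) or 1.0                     # normalize to [-1, +1]
    return sum(VALENCE_SIGN[d] * state[d] for d in DIMS) / total

def decay_toward(state, baseline, rate=0.1):
    # Exponential decay: each step moves every dimension toward baseline.
    return {d: state[d] + (baseline[d] - state[d]) * rate for d in DIMS}

state = {d: 0.1 for d in DIMS} | {"anticipation": 0.8, "fear": 0.3}
```

Because the state is a plain vector, "how does the character feel?" is arithmetic: no API call, no latency, deterministic results.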
The engine runs several autonomous background tasks that keep the character alive between conversations:
| Loop | Default Interval | What It Does |
|---|---|---|
| Initial Research | Once on boot | Scrapes all research_seeds (Wikipedia, Fandom, web), extracts facts into knowledge graph, generates episodic memories |
| Conversation Synthesis | Every 10 min | Converts recent conversation turns into first-person memories the character retains |
| BM25 Rebuild | Every 15 min | Rebuilds the sparse keyword search index as new memories arrive |
| Memory Consolidation | Every 60 min | Applies Ebbinghaus decay, HDBSCAN clustering, merges duplicates, prunes faded memories |
| Periodic Research | Every 6 hours | Re-scrapes research seeds, explores deeper, generates new memories |
All intervals are configurable in `config/settings.yaml` under the `background:` section.
In the CLI, background tasks run alongside the chat loop using asyncio (input is handled in a thread executor so the event loop stays free). In the Discord bot and API server, they run naturally within the async event loop. Research loops can be enabled/disabled per character via the research_enabled YAML field or the PUT /characters/{slug}/research API endpoint.
Each character has a NetworkX directed graph that tracks entities, relationships, and facts:
- Nodes: people, places, organizations, events, dates, concepts
- Edges: relationships (`friend_of`, `lives_at`, `happened_in`, `knows_of`, etc.)
- Serialized as GraphML files per character
The graph is populated from:
- Memory bootstrapping — spaCy NER extracts entities from the generated biography
- Research synthesis — the LLM extracts structured `SUBJECT | PREDICATE | OBJECT` facts from scraped content
- Conversation tracking — new entities mentioned in conversation are added automatically
The graph enables:
- Contradiction detection — "Does this response contradict an established fact?" is a graph traversal, not an LLM call
- Contextual retrieval — when a user mentions "Watson", the graph finds all related entities and pulls memories about them
- Consistency validation — ensures the character never states something that conflicts with their established knowledge
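The engine uses NetworkX, but both operations reduce to plain graph algorithms. This stdlib adjacency-dict sketch shows the idea; the sample facts and the exact contradiction rule are illustrative:

```python
from collections import deque

edges = {  # SUBJECT -> [(PREDICATE, OBJECT)]
    "Holmes": [("lives_at", "221B Baker Street"), ("friend_of", "Watson")],
    "Watson": [("knows_of", "Baskerville case")],
}

def related_entities(start: str, max_depth: int = 2) -> set[str]:
    """BFS out from a mentioned entity to find memory-retrieval candidates."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for _, obj in edges.get(node, []):
            if obj not in seen:
                seen.add(obj)
                frontier.append((obj, depth + 1))
    return seen - {start}

def contradicts(subject: str, predicate: str, claimed_object: str) -> bool:
    """A claim contradicts the graph if the same (subject, predicate)
    already holds a different object; unknown facts never contradict."""
    known = [o for p, o in edges.get(subject, []) if p == predicate]
    return bool(known) and claimed_object not in known
```

Mentioning "Watson" surfaces the Baskerville case two hops away, and a claim that Holmes lives somewhere other than Baker Street is caught without any LLM call.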
Each user gets a dedicated Relationship model that tracks:
| Dimension | Range | Description |
|---|---|---|
| `familiarity` | 0–1 | How well the character knows this person |
| `trust` | 0–1 | How much the character trusts them |
| `affection` | 0–1 | How much the character likes them |
| `respect` | 0–1 | How much the character respects them |
Plus: `interaction_count`, `character_notes` (impressions), `topics_discussed`, timestamps.
Relationships evolve gradually. After each exchange, sentiment analysis from the conversation adjusts the dimensions. The relationship context is injected into the prompt as a first-person description:
```
Watson and I have spoken many times. I trust him deeply and consider him a friend.
We have discussed the Baskerville case and Mrs. Hudson's complaints.
```
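A sketch of the update rule. The step sizes and the always-growing `familiarity` term are illustrative assumptions rather than the tracker's actual coefficients:

```python
def update_relationship(rel: dict[str, float], sentiment: float) -> dict[str, float]:
    """sentiment in [-1, 1]; positive exchanges slowly build the dimensions."""
    step = 0.05 * sentiment
    updated = dict(rel)
    for dim in ("trust", "affection", "respect"):
        updated[dim] = min(1.0, max(0.0, rel[dim] + step))   # clamp to [0, 1]
    # Familiarity grows a little with every exchange, regardless of sentiment.
    updated["familiarity"] = min(1.0, rel["familiarity"] + 0.02)
    return updated

rel = {"familiarity": 0.3, "trust": 0.5, "affection": 0.4, "respect": 0.5}
after_warm = update_relationship(rel, sentiment=0.8)
```

The small step size is the point: no single exchange can flip a stranger into a confidant, so relationships evolve gradually, as described above.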
The character knows what time it is — mapped to their era:
| Format | Example Output | When Used |
|---|---|---|
| `modern` | "2:30 pm" | Default for present-day characters |
| `24h` | "0315 hours" | Military, aviation, etc. |
| `victorian` | "a quarter past 8 in the evening" | 19th century characters |
| `medieval` | "midday" / "evening, the sun lowering" | Pre-modern characters |
| `narrative` | "the dead of night" | Dreamlike, fantastical characters |
Time format is driven by the `time_format` field in the character YAML. If not set, it's inferred from the `era` field.
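A rough sketch of era-mapped rendering for a few of the formats above. The victorian and medieval phrasings are approximations, and non-quarter-hour minutes are simply dropped in the victorian branch:

```python
from datetime import time

def render_time(t: time, fmt: str) -> str:
    if fmt == "24h":
        return f"{t.hour:02d}{t.minute:02d} hours"
    if fmt == "victorian":
        period = "in the morning" if t.hour < 12 else "in the evening"
        quarter = {15: "a quarter past ", 30: "half past ", 45: "a quarter to "}
        # "a quarter to" names the *next* hour.
        hour12 = ((t.hour + 1) if t.minute == 45 else t.hour) % 12 or 12
        return f"{quarter.get(t.minute, '')}{hour12} {period}"
    if fmt == "medieval":
        return ("midday" if 11 <= t.hour <= 13
                else "morning" if t.hour < 11 else "evening")
    # modern default
    return f"{t.hour % 12 or 12}:{t.minute:02d} {'am' if t.hour < 12 else 'pm'}"
```

Same clock value, different character: `time(20, 15)` renders as "2015 hours" for a soldier but "a quarter past 8 in the evening" for a Victorian detective.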
Characters also have `time_behaviors` — descriptions of what they'd be doing at different times of day. These are injected directly into the prompt:
```yaml
time_behaviors:
  early_morning: "I'm on guard duty or just got off. Running on caffeine and spite."
  afternoon: "Hottest part of the day. Everyone's hiding from the sun."
  evening: "Best time. It cools down, everyone's hanging out, smoking and talking."
```

Every response passes through a 4-signal validation pipeline before being returned:
1. Voice TF-IDF — Builds a vocabulary profile from the character's memories and example quotes. Scores each response for vocabulary match. Catches responses that sound generic rather than character-specific.
2. Identity Anchor — Scans for AI-speak tells: "certainly", "I understand your perspective", "great question", bullet-point formatting, numbered lists, excessive formality. These are dead giveaways that the character has broken.
3. NER Anachronism Check — spaCy extracts entities from the response. Any entity that shouldn't exist in the character's era (e.g., "iPhone" in a Victorian setting) is flagged.
4. Knowledge Graph Contradiction — Checks if the response states anything that conflicts with established facts in the character's graph.
If the combined severity exceeds the threshold, the response is regenerated with explicit correction constraints injected into the prompt. Maximum 2 retries.
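A sketch of that check-and-retry loop with stubbed generation and checking. Combining severities by taking the maximum, and the `fix <signal>` constraint strings, are illustrative assumptions about how the engine aggregates the four signals:

```python
def combined_severity(signals: dict[str, float]) -> float:
    # Assumption: the worst single signal drives the decision.
    return max(signals.values())

def respond_with_validation(generate, check, threshold: float = 0.5,
                            max_retries: int = 2) -> str:
    """generate(constraints) -> response; check(response) -> per-signal severities."""
    response = generate([])
    for _ in range(max_retries):
        signals = check(response)
        if combined_severity(signals) <= threshold:
            break
        # Regenerate with explicit correction constraints for the failing signals.
        constraints = [f"fix {name}" for name, sev in signals.items() if sev > 0]
        response = generate(constraints)
    return response

# Stub: the first draft trips the identity anchor, the retry is clean.
drafts = iter(["Certainly! Great question.", "Elementary, my dear fellow."])
gen = lambda constraints: next(drafts)
chk = lambda r: {"voice": 0.0, "anachronism": 0.0, "graph": 0.0,
                 "identity_anchor": 1.0 if "Certainly" in r else 0.0}
final = respond_with_validation(gen, chk)
```

The retry cap matters: after two failed regenerations the engine returns the best it has rather than looping forever.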
The engine abstracts LLM access behind a Protocol-based interface. Four backends are available, all with native streaming support:
| Backend | Class | How It Works |
|---|---|---|
| `anthropic` | `AnthropicBackend` | Anthropic Python SDK — Claude models |
| `openai` | `OpenAIBackend` | OpenAI Python SDK — GPT models |
| `openrouter` | `OpenRouterBackend` | OpenAI-compatible API — access to many providers |
| `ollama` | `OllamaBackend` | HTTP calls to local Ollama server |
The default backend is `openrouter`. Each character can override the backend and model in their YAML:
```yaml
backend: anthropic
model: claude-sonnet-4-5-20250929
temperature: 0.9
```

Characters are defined in YAML with two modes:
Provide a few fields and let the bootstrapper generate everything else:
```yaml
name: Charlie Parker
is_seed: true
seed_prompt: "jazz alto saxophonist, 1940s-50s New York City, bebop pioneer"
era: 1940s
```

The `MemoryBootstrapper` will generate a full biography, extract entities into the knowledge graph, and create 25+ layered memories (formative, defining moments, personal, sensory, knowledge).
Precise control over every aspect:
```yaml
name: Jake Torres
era: modern, 2012
time_format: 24h
role: US Army infantry, E-4 Specialist
setting: >
  Deployed to FOB Shank, Logar Province, Afghanistan...
core_identity: >
  I'm Jake Torres, Specialist, US Army. 2nd Platoon, Bravo Company...
backstory: >
  Did basic at Fort Benning, summer 2011...
personality_traits:
  - loyal to his squad above all else
  - dark humor as a coping mechanism
speech_patterns:
  formality: casual
  verbosity: medium
  text_style: texting
  dialect: "Southwest US, military slang heavy"
  catchphrases: ["it is what it is", "roger that"]
  vocabulary_avoidances: ["certainly", "indeed", "fascinating"]
  filler_words: ["like", "man", "dude"]
knowledge_domains:
  - US Army infantry tactics
  - Afghanistan geography
knowledge_boundaries:
  - I don't know anything about the big strategic picture
  - I have no idea what's happening back in the US
research_enabled: true   # toggle all research for this character
research_seeds:
  - "10th Mountain Division Afghanistan 2012"
  - "FOB Shank Logar Province"
time_behaviors:
  early_morning: "on guard duty, running on rip-its and spite"
  evening: "best time, everyone hanging out, smoking and talking"
baseline_emotions:
  joy: 0.2
  trust: 0.5
  anger: 0.4
  sadness: 0.4
```

See `config/characters/original_template.yaml` for the full field reference.
```
lettuce-engine/
├── pyproject.toml               # Build config, all dependencies
├── config/
│   ├── settings.yaml            # Global engine settings
│   └── characters/
│       ├── sam_thompson.yaml        # Example: Special Forces soldier
│       ├── sherlock_holmes.yaml     # Example: Victorian detective
│       ├── soldier_example.yaml     # Example: modern soldier (24h time)
│       └── original_template.yaml   # Field reference template
│
├── src/lettuce/
│   ├── engine.py                # Core orchestrator
│   ├── cli.py                   # CLI entry point (chat, stats, discord, serve)
│   │
│   ├── api/
│   │   ├── app.py               # FastAPI app factory, lifespan, setup gate
│   │   ├── config_manager.py    # YAML config read/write with dotpath access
│   │   ├── engine_manager.py    # Multi-character engine cache
│   │   ├── auth.py              # Bearer token authentication
│   │   ├── schemas.py           # Pydantic request/response models
│   │   ├── routes_setup.py      # Setup flow + config CRUD
│   │   ├── routes_characters.py # Character listing/loading/research toggle
│   │   ├── routes_chat.py       # REST chat + WebSocket streaming + history
│   │   └── routes_status.py     # Health, system dashboard, user data deletion
│   │
│   ├── identity/
│   │   ├── character.py         # Character dataclass + YAML loader
│   │   ├── prompt_assembler.py  # 9-section prompt builder
│   │   ├── time_awareness.py    # Era-mapped time perception
│   │   ├── identity_anchor.py   # AI-speak drift detection
│   │   └── memory_bootstrapper.py # Seed → full person generator
│   │
│   ├── llm/
│   │   ├── backend.py           # LLMBackend protocol + data models
│   │   ├── router.py            # Backend factory
│   │   ├── anthropic.py         # Claude API (streaming)
│   │   ├── openai.py            # GPT API (streaming)
│   │   ├── openrouter.py        # OpenRouter API (streaming)
│   │   └── ollama.py            # Local models (streaming)
│   │
│   ├── memory/
│   │   ├── models.py            # Memory + ConversationTurn dataclasses
│   │   ├── embedder.py          # sentence-transformers wrapper
│   │   ├── vector_store.py      # ChromaDB wrapper
│   │   ├── sqlite_store.py      # SQLite relational store
│   │   ├── retriever.py         # Hybrid retrieval (dense + BM25 + graph)
│   │   ├── generator.py         # LLM memory synthesis
│   │   └── consolidator.py      # Decay, clustering, pruning
│   │
│   ├── nlp/
│   │   ├── pipeline.py          # spaCy NER + analysis
│   │   ├── voice_analyzer.py    # TF-IDF voice profiling
│   │   ├── emotion_classifier.py # DistilRoBERTa classification
│   │   └── entity_tracker.py    # Cross-conversation entity tracking
│   │
│   ├── knowledge/
│   │   ├── graph.py             # NetworkX knowledge graph
│   │   ├── fact_store.py        # Structured facts + contradiction check
│   │   └── contradiction_detector.py # Graph + embedding contradiction detection
│   │
│   ├── emotion/
│   │   ├── state.py             # 8D Plutchik vector model
│   │   ├── engine.py            # ML classifier + vector math
│   │   └── decay.py             # Exponential decay toward baseline
│   │
│   ├── relationships/
│   │   ├── models.py            # Per-user relationship model
│   │   └── tracker.py           # Gradual relationship evolution
│   │
│   ├── research/
│   │   ├── loop.py              # Background research orchestrator
│   │   ├── synthesizer.py       # LLM fact extraction
│   │   └── scrapers/            # Wikipedia, Fandom, general web
│   │
│   ├── consistency/
│   │   ├── validator.py         # 4-signal validation pipeline
│   │   └── contradiction_resolver.py # Constraint builder for retries
│   │
│   ├── discord_bot/
│   │   ├── bot.py               # Bot lifecycle
│   │   └── cogs/                # Conversation, admin, character mgmt
│   │
│   └── db/
│       ├── connection.py        # Async SQLite connection
│       └── migrations.py        # Schema setup
│
└── tests/                       # 33 tests across 6 files
```
- Python 3.11+
- One of: Anthropic API key, OpenAI API key, OpenRouter API key, or Ollama running locally
```bash
git clone <repo-url> lettuce-engine
cd lettuce-engine

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

python -m spacy download en_core_web_sm
```

The sentence-transformer model (`all-MiniLM-L6-v2`) and emotion classifier (`j-hartmann/emotion-english-distilroberta-base`) are downloaded automatically on first use.
```bash
cp .env.example .env
# Edit .env with your API keys
```

Required keys depend on which backend you use:

- `anthropic` — `ANTHROPIC_API_KEY`
- `openai` — `OPENAI_API_KEY`
- `openrouter` — `OPENROUTER_API_KEY` (or set `api_key` in settings under `llm.openrouter`)
- `ollama` — No API key needed (local server)
- Discord bot — `DISCORD_TOKEN`
- API auth — `LETTUCE_API_KEY` (optional, empty = auth disabled)
```bash
# Chat with a bundled character
lettuce chat --character sherlock_holmes

# Chat with a custom character file
lettuce chat --character-file my_character.yaml

# Specify a settings file
lettuce chat --character soldier_example --settings config/settings.yaml
```

Inside the chat:

- Type messages normally to talk to the character
- `/stats` — show engine statistics (memory count, emotion state, graph size, etc.)
- `quit` or `exit` — leave
Background loops (memory synthesis, consolidation, research) run automatically while you chat.
```bash
lettuce discord --character-file config/characters/sherlock_holmes.yaml
```

The bot responds to @mentions and DMs. It simulates typing delay proportional to response length. Admin commands and character info are available via cogs.
```bash
lettuce serve                # 0.0.0.0:8000
lettuce serve --port 3000    # custom port
lettuce serve --reload       # dev mode with auto-reload
```

The API supports multi-character management, streaming chat via WebSocket, runtime configuration, conversation history retrieval, and a gated setup flow for first-run configuration. See API.md for full endpoint documentation.
```bash
lettuce stats --character sherlock_holmes
```

```python
from lettuce.engine import LettuceEngine, load_settings
from lettuce.identity.character import Character

character = Character.from_yaml("config/characters/sherlock_holmes.yaml")
settings = load_settings()

engine = LettuceEngine(character, settings)
await engine.setup()
await engine.start_background_loops()

response = await engine.respond(
    "What do you make of this footprint, Holmes?",
    user_id="watson",
    user_name="Watson",
)
print(response)

await engine.stop_background_loops()
```

All settings live in `config/settings.yaml`:
```yaml
engine:
  default_backend: openrouter   # anthropic, openai, openrouter, ollama
  data_dir: ./data

llm:
  openrouter:
    model: anthropic/claude-sonnet-4-5-20250929
    api_key: sk-or-...          # or set OPENROUTER_API_KEY env var
    max_tokens: 4096
  anthropic:
    model: claude-sonnet-4-5-20250929
    temperature: 0.9
  openai:
    model: gpt-4o
  ollama:
    model: llama3.1
    base_url: http://localhost:11434

memory:
  embedding_model: all-MiniLM-L6-v2
  max_retrieval_results: 15
  dense_weight: 0.5             # ChromaDB semantic search weight
  bm25_weight: 0.3              # Keyword search weight
  graph_weight: 0.2             # Knowledge graph weight

emotion:
  decay_rate: 0.1
  decay_interval_minutes: 30

background:
  synthesis_interval_minutes: 10    # Conversation → memory synthesis
  consolidation_interval_minutes: 60
  bm25_rebuild_interval_minutes: 15
  drip_research_interval_minutes: 60

api:
  # api_key: ""                 # Set via LETTUCE_API_KEY env var or here
```

| Component | Library | Purpose |
|---|---|---|
| Embeddings | `sentence-transformers` | Memory vectorization (`all-MiniLM-L6-v2`) |
| NER | `spaCy` (`en_core_web_sm`) | Entity extraction from conversations and research |
| Emotion | `transformers` | DistilRoBERTa emotion classification |
| Voice | `scikit-learn` | TF-IDF vocabulary profiling |
| Clustering | `scikit-learn` | HDBSCAN memory deduplication |
| Sparse search | `rank-bm25` | BM25 keyword retrieval |
| Emotion math | `numpy` | Plutchik vector operations, decay curves |
| Component | Library | Purpose |
|---|---|---|
| Vector DB | `chromadb` | Semantic memory search |
| Relational | `aiosqlite` | Conversation turns, relationships, metadata |
| Knowledge | `networkx` | Entity graph (serialized as GraphML) |
| Backend | Library | Purpose |
|---|---|---|
| Anthropic | `anthropic` | Claude models (streaming) |
| OpenAI | `openai` | GPT models (streaming) |
| OpenRouter | `openai` | Multi-provider access (streaming) |
| Ollama | `aiohttp` | Local model server (streaming) |
| Component | Library | Purpose |
|---|---|---|
| REST API | `fastapi` + `uvicorn` | HTTP + WebSocket server |
| Discord | `discord.py` | Bot framework |
| CLI | `click` | Command-line interface |
| Config | `pyyaml` + `pydantic` | Settings and character parsing |
| Logging | `structlog` | Structured logging |
| Web scraping | `trafilatura` + `beautifulsoup4` | Research content extraction |
```bash
# Run all tests
pytest

# Run with verbose output
pytest -v

# Run a specific test file
pytest tests/test_emotion.py
```

33 tests cover: character loading, emotion math, knowledge graph operations, prompt assembly, relationship tracking, and time awareness formatting.
ML-first, LLM-second. Use local ML models (classifiers, embeddings, NER, TF-IDF) for fast, cheap, deterministic operations. Reserve LLM calls for creative tasks — memory generation, response generation, complex emotional shifts. This cuts LLM API costs by ~70%.
Hybrid retrieval over pure vector search. Dense embeddings catch semantic similarity but miss exact keywords. BM25 catches keywords but misses paraphrases. The knowledge graph catches entity relationships that neither can find. RRF combines all three.
Vector emotions over string emotions. 8D Plutchik vectors allow instant mathematical operations (decay, blend, amplify). No API call needed to answer "how does the character feel right now?"
Graph-based consistency. "Does Holmes know about computers?" is a graph lookup — instant. Not a 2-second LLM call.
Forgetting is a feature. The Ebbinghaus decay curve means characters naturally forget unimportant things while retaining what matters. Frequently accessed memories strengthen. This mimics how human memory actually works.





