Hyperrealistic AI character engine — identity, not roleplay.
Lettuce Engine makes LLMs be characters instead of playing them. There is no "act as" instruction, no roleplay framing. The entire system prompt is written as the character's own internal awareness — first-person declarative identity. The LLM receives a context that is the character's mental state: their memories, emotions, relationships, knowledge, and moment-to-moment experience of time.
The engine runs background loops that autonomously research characters via the web, generate layered memories, track emotional state across conversations, evolve per-user relationships, and validate every response for consistency. It works with any character — fictional, historical, or original — across multiple LLM backends.
The engine is organized into 10 subsystems that wire together through a central orchestrator (LettuceEngine). Entry points include a terminal CLI, a Discord bot, a REST + WebSocket API, and direct programmatic access.
Traditional AI roleplay tells the model "pretend to be X." Lettuce Engine never does this. Instead, the system prompt is a declarative identity document written entirely in first person:
```
I am Sherlock Holmes. I reside at 221B Baker Street, London.
I recall the curious case of the Speckled Band...
At this moment, I feel intellectually restless.
I am speaking with James. We have conversed three times previously.
```
No meta-layer. No "you are an AI assistant." The LLM simply receives a context that IS the character's inner world.
The system prompt is assembled dynamically per-request from 9 sections, each populated by a different subsystem (memory retrieval, emotion engine, relationship tracker, time awareness, etc.).
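As a sketch, that per-request assembly reduces to ordering and joining whichever sections are populated. The section names and the `assemble_prompt` helper below are illustrative placeholders, not the engine's actual API:

```python
# Illustrative sketch of per-request prompt assembly. The nine section names
# here are assumptions; the engine's PromptAssembler defines its own.
SECTION_ORDER = [
    "core_identity", "backstory", "knowledge", "memories",
    "emotional_state", "relationship", "time_awareness",
    "speech_patterns", "current_scene",
]

def assemble_prompt(sections: dict[str, str]) -> str:
    """Join populated sections into one first-person identity document."""
    parts = [sections[name].strip() for name in SECTION_ORDER if sections.get(name)]
    return "\n\n".join(parts)

prompt = assemble_prompt({
    "core_identity": "I am Sherlock Holmes. I reside at 221B Baker Street, London.",
    "emotional_state": "At this moment, I feel intellectually restless.",
})
```

Empty sections are simply omitted, so a brand-new character with no memories or relationships still yields a coherent identity document.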
Every user message passes through a 6-stage pipeline:
1. Analyze — spaCy extracts entities and topics. DistilRoBERTa classifies the emotional content. Time-based emotion decay is applied.
2. Retrieve Context — Three retrieval signals run in parallel:
   - Dense search (ChromaDB) — semantic similarity via sentence-transformer embeddings
   - BM25 search — keyword matching over the full memory corpus
   - Knowledge graph traversal — entities in the message are looked up in the character's NetworkX graph, and related memories are pulled via BFS

   Results are merged via Reciprocal Rank Fusion (RRF) with configurable weights, then boosted by recency. A 5% random surfacing chance injects unexpected memories for naturalism.
3. Build Prompt — The `PromptAssembler` constructs the 9-section first-person system prompt from the character definition, retrieved memories, current emotional state, relationship context, and time awareness.
4. Generate — The assembled prompt + conversation history is sent to the configured LLM backend (Anthropic, OpenAI, OpenRouter, or Ollama). Streaming is supported via WebSocket.
5. Validate — A 4-signal consistency pipeline checks the response:
   - Voice TF-IDF — is the vocabulary within the character's speech profile?
   - Identity Anchor — any AI-speak tells? ("certainly", "great question", bullet points)
   - NER Anachronism — entities from the wrong era?
   - Knowledge Graph — contradictions with established facts?

   If validation fails, the response is regenerated with explicit constraints (up to 2 retries).
6. Post-Response Updates — Conversation turns are stored in SQLite + ChromaDB. Entities are tracked. Relationship dimensions (trust, affection, respect) are updated based on exchange sentiment. The character's own emotional state updates from its response.
Memory is the most complex subsystem. Characters don't just retrieve static facts; they have a living, evolving memory that decays, consolidates, and grows over time.
Every memory exists in two places:
- ChromaDB — vector embeddings for semantic search (`all-MiniLM-L6-v2`)
- SQLite — metadata, importance scores, access counts, timestamps, tags
| Type | Description | Source |
|---|---|---|
| `episodic` | Specific events and experiences | Bootstrapper, conversations, research |
| `semantic` | Facts, knowledge, skills | Research, bootstrapper |
| `emotional` | Emotional reactions and feelings | Bootstrapper, conversation synthesis |
| `conversation` | Retained impressions from chats | Conversation synthesis loop |
The `HybridRetriever` combines three signals using Reciprocal Rank Fusion:
| Signal | Weight | What It Catches |
|---|---|---|
| Dense (ChromaDB) | 0.5 | Semantic similarity, paraphrases |
| BM25 (sparse) | 0.3 | Exact keyword matches that embeddings miss |
| Graph (NetworkX) | 0.2 | Entity relationships invisible to both |
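A minimal version of the fusion step, using the weights from the table above. The `k = 60` constant is the common RRF default; the engine's exact constant and tie-breaking may differ:

```python
# Reciprocal Rank Fusion over three ranked result lists.
# Each item's score is the weighted sum of 1/(k + rank) across signals.
from collections import defaultdict

def rrf_merge(ranked_lists: dict[str, list[str]],
              weights: dict[str, float], k: int = 60) -> list[str]:
    scores: dict[str, float] = defaultdict(float)
    for signal, ids in ranked_lists.items():
        w = weights[signal]
        for rank, memory_id in enumerate(ids):
            scores[memory_id] += w / (k + rank + 1)
    return sorted(scores, key=scores.__getitem__, reverse=True)

merged = rrf_merge(
    {"dense": ["m1", "m2", "m3"], "bm25": ["m2", "m4"], "graph": ["m3", "m2"]},
    {"dense": 0.5, "bm25": 0.3, "graph": 0.2},
)
# "m2" wins: it appears in all three lists, so its reciprocal-rank
# contributions accumulate even though it tops none of them.
```

This is the reason RRF is attractive here: a memory that ranks moderately well across all three signals beats one that ranks first in a single signal.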
- Creation — Memories are born from bootstrapping, conversation synthesis, or research
- Retrieval — Accessed memories get their `access_count` incremented and `last_accessed` updated
- Decay — Ebbinghaus forgetting curve: `importance *= e^(-lambda * hours)`. Frequently accessed memories decay slower (spaced repetition effect)
- Consolidation — HDBSCAN clusters similar memories; low-importance duplicates are merged into the strongest memory
- Pruning — Memories that decay below the minimum threshold (0.05) are permanently deleted
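The decay step can be sketched in a few lines; how exactly `access_count` modulates the decay constant is an illustrative assumption here, not the engine's exact formula:

```python
import math

def decay_importance(importance: float, hours: float,
                     lam: float = 0.01, access_count: int = 0) -> float:
    # Spaced-repetition effect: more accesses -> smaller effective lambda.
    # (The exact modulation by access_count is an assumption.)
    effective_lam = lam / (1 + access_count)
    return importance * math.exp(-effective_lam * hours)

fresh = decay_importance(1.0, hours=24)                      # never accessed
rehearsed = decay_importance(1.0, hours=24, access_count=5)  # accessed often
# Memories whose importance falls below 0.05 are pruned permanently.
```

After 24 hours the untouched memory has decayed noticeably while the frequently accessed one is nearly intact, which is the spaced-repetition behavior the lifecycle describes.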
Raw conversation turns are not directly useful as memories. Every 10 minutes, the synthesis loop:
- Fetches unsynthesized conversation turns since the last run
- Groups them by user
- Sends each conversation to the LLM with a synthesis prompt: "What do I actually remember from this conversation?"
- The LLM produces memories the way a real person would retain them — impressions, key facts, emotional reactions, not a verbatim log
- If nothing memorable happened, it outputs nothing (no memory pollution)
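A stubbed sketch of that pass. Function names are illustrative, the LLM call is replaced with a stub, and the real loop persists its results to SQLite and ChromaDB:

```python
# Sketch of the synthesis pass: group unsynthesized turns by user,
# ask the LLM what is worth remembering, keep only non-empty answers.
from itertools import groupby

def synthesize(turns: list[dict], llm) -> list[str]:
    turns = sorted(turns, key=lambda t: t["user_id"])
    memories = []
    for user_id, group in groupby(turns, key=lambda t: t["user_id"]):
        transcript = "\n".join(t["text"] for t in group)
        remembered = llm(
            f"What do I actually remember from this conversation?\n{transcript}"
        )
        if remembered.strip():   # empty output -> nothing memorable, no pollution
            memories.append(remembered)
    return memories

# Stub LLM: only the conversation about a case produces a memory.
stub = lambda prompt: ("James asked about the Speckled Band."
                       if "Speckled" in prompt else "")
out = synthesize(
    [{"user_id": "james", "text": "Tell me about the Speckled Band."},
     {"user_id": "anon", "text": "hi"}],
    stub,
)
```

The "outputs nothing" branch is the important part: a greeting-only exchange leaves no trace, so the memory store holds impressions rather than a verbatim log.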
Emotions are 8-dimensional vectors based on Plutchik's wheel of emotions, not LLM-guessed strings.
| Dimension | Range | Valence |
|---|---|---|
| Joy | 0–1 | Positive |
| Trust | 0–1 | Positive |
| Fear | 0–1 | Negative |
| Surprise | 0–1 | Neutral |
| Sadness | 0–1 | Negative |
| Disgust | 0–1 | Negative |
| Anger | 0–1 | Negative |
| Anticipation | 0–1 | Positive |
- DistilRoBERTa (`j-hartmann/emotion-english-distilroberta-base`) classifies the emotion in each user message locally — no LLM call
- Classifier output is mapped to Plutchik dimensions
- Personality weights from the character's traits modify the response (an angry character amplifies anger signals)
- For simple shifts: vector math (blend, amplify) computes the new state instantly
- For complex shifts (betrayal, major revelations): LLM fallback for nuanced emotional reasoning
- Over time: exponential decay pulls emotions back toward the character's baseline
This approach handles ~90% of emotional updates with pure math — no API calls, no latency, deterministic results.
The emotional state produces human-readable descriptions for the prompt:
```
I feel a strong sense of anticipation mixed with mild unease.
```
It also exposes computed properties: `primary_emotion`, `secondary_emotion`, `intensity` (L2 norm), and `valence` (-1 to +1).
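A compact sketch of that state and its computed properties, with valence signs taken from the table above. The helper names mirror the computed properties but are not guaranteed to match the engine's implementation:

```python
import math

DIMS = ["joy", "trust", "fear", "surprise", "sadness", "disgust", "anger", "anticipation"]
VALENCE_SIGN = {"joy": 1, "trust": 1, "anticipation": 1, "surprise": 0,
                "fear": -1, "sadness": -1, "disgust": -1, "anger": -1}

def primary_emotion(state: dict[str, float]) -> str:
    return max(DIMS, key=lambda d: state[d])

def intensity(state: dict[str, float]) -> float:
    return math.sqrt(sum(v * v for v in state.values()))   # L2 norm

def valence(state: dict[str, float]) -> float:
    total = sum(state.values()) or 1.0                     # normalize to [-1, +1]
    return sum(VALENCE_SIGN[d] * state[d] for d in DIMS) / total

def decay_toward(state, baseline, rate=0.1):
    # Exponential decay: each step moves every dimension toward baseline.
    return {d: state[d] + (baseline[d] - state[d]) * rate for d in DIMS}

state = {d: 0.1 for d in DIMS} | {"anticipation": 0.8, "fear": 0.3}
```

Because the state is a plain vector, "how does the character feel?" is arithmetic: no API call, no latency, deterministic results.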
The engine runs several autonomous background tasks that keep the character alive between conversations:
| Loop | Default Interval | What It Does |
|---|---|---|
| Initial Research | Once on boot | Scrapes all research_seeds (Wikipedia, Fandom, web), extracts facts into knowledge graph, generates episodic memories |
| Conversation Synthesis | Every 10 min | Converts recent conversation turns into first-person memories the character retains |
| BM25 Rebuild | Every 15 min | Rebuilds the sparse keyword search index as new memories arrive |
| Memory Consolidation | Every 60 min | Applies Ebbinghaus decay, HDBSCAN clustering, merges duplicates, prunes faded memories |
| Periodic Research | Every 6 hours | Re-scrapes research seeds, explores deeper, generates new memories |
All intervals are configurable in `config/settings.yaml` under the `background:` section.
In the CLI, background tasks run alongside the chat loop using asyncio (input is handled in a thread executor so the event loop stays free). In the Discord bot and API server, they run naturally within the async event loop. Research loops can be enabled/disabled per character via the research_enabled YAML field or the PUT /characters/{slug}/research API endpoint.
Each character has a NetworkX directed graph that tracks entities, relationships, and facts:
- Nodes: people, places, organizations, events, dates, concepts
- Edges: relationships (`friend_of`, `lives_at`, `happened_in`, `knows_of`, etc.)
- Serialized as GraphML files per character
The graph is populated from:
- Memory bootstrapping — spaCy NER extracts entities from the generated biography
- Research synthesis — the LLM extracts structured `SUBJECT | PREDICATE | OBJECT` facts from scraped content
- Conversation tracking — new entities mentioned in conversation are added automatically
The graph enables:
- Contradiction detection — "Does this response contradict an established fact?" is a graph traversal, not an LLM call
- Contextual retrieval — when a user mentions "Watson", the graph finds all related entities and pulls memories about them
- Consistency validation — ensures the character never states something that conflicts with their established knowledge
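The engine uses NetworkX, but both operations reduce to plain graph algorithms. This stdlib adjacency-dict sketch shows the idea; the sample facts and the exact contradiction rule are illustrative:

```python
from collections import deque

edges = {  # SUBJECT -> [(PREDICATE, OBJECT)]
    "Holmes": [("lives_at", "221B Baker Street"), ("friend_of", "Watson")],
    "Watson": [("knows_of", "Baskerville case")],
}

def related_entities(start: str, max_depth: int = 2) -> set[str]:
    """BFS out from a mentioned entity to find memory-retrieval candidates."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for _, obj in edges.get(node, []):
            if obj not in seen:
                seen.add(obj)
                frontier.append((obj, depth + 1))
    return seen - {start}

def contradicts(subject: str, predicate: str, claimed_object: str) -> bool:
    """A claim contradicts the graph if the same (subject, predicate)
    already holds a different object; unknown facts never contradict."""
    known = [o for p, o in edges.get(subject, []) if p == predicate]
    return bool(known) and claimed_object not in known
```

Mentioning "Watson" surfaces the Baskerville case two hops away, and a claim that Holmes lives somewhere other than Baker Street is caught without any LLM call.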
Each user gets a dedicated Relationship model that tracks:
| Dimension | Range | Description |
|---|---|---|
| `familiarity` | 0–1 | How well the character knows this person |
| `trust` | 0–1 | How much the character trusts them |
| `affection` | 0–1 | How much the character likes them |
| `respect` | 0–1 | How much the character respects them |
Plus: `interaction_count`, `character_notes` (impressions), `topics_discussed`, timestamps.
Relationships evolve gradually. After each exchange, sentiment analysis from the conversation adjusts the dimensions. The relationship context is injected into the prompt as a first-person description:
```
Watson and I have spoken many times. I trust him deeply and consider him a friend.
We have discussed the Baskerville case and Mrs. Hudson's complaints.
```
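A sketch of the update rule. The step sizes and the always-growing `familiarity` term are illustrative assumptions rather than the tracker's actual coefficients:

```python
def update_relationship(rel: dict[str, float], sentiment: float) -> dict[str, float]:
    """sentiment in [-1, 1]; positive exchanges slowly build the dimensions."""
    step = 0.05 * sentiment
    updated = dict(rel)
    for dim in ("trust", "affection", "respect"):
        updated[dim] = min(1.0, max(0.0, rel[dim] + step))   # clamp to [0, 1]
    # Familiarity grows a little with every exchange, regardless of sentiment.
    updated["familiarity"] = min(1.0, rel["familiarity"] + 0.02)
    return updated

rel = {"familiarity": 0.3, "trust": 0.5, "affection": 0.4, "respect": 0.5}
after_warm = update_relationship(rel, sentiment=0.8)
```

The small step size is the point: no single exchange can flip a stranger into a confidant, so relationships evolve gradually, as described above.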
The character knows what time it is — mapped to their era:
| Format | Example Output | When Used |
|---|---|---|
| `modern` | "2:30 pm" | Default for present-day characters |
| `24h` | "0315 hours" | Military, aviation, etc. |
| `victorian` | "a quarter past 8 in the evening" | 19th century characters |
| `medieval` | "midday" / "evening, the sun lowering" | Pre-modern characters |
| `narrative` | "the dead of night" | Dreamlike, fantastical characters |
Time format is driven by the `time_format` field in the character YAML. If not set, it's inferred from the `era` field.
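A rough sketch of era-mapped rendering for a few of the formats above. The victorian and medieval phrasings are approximations, and non-quarter-hour minutes are simply dropped in the victorian branch:

```python
from datetime import time

def render_time(t: time, fmt: str) -> str:
    if fmt == "24h":
        return f"{t.hour:02d}{t.minute:02d} hours"
    if fmt == "victorian":
        period = "in the morning" if t.hour < 12 else "in the evening"
        quarter = {15: "a quarter past ", 30: "half past ", 45: "a quarter to "}
        # "a quarter to" names the *next* hour.
        hour12 = ((t.hour + 1) if t.minute == 45 else t.hour) % 12 or 12
        return f"{quarter.get(t.minute, '')}{hour12} {period}"
    if fmt == "medieval":
        return ("midday" if 11 <= t.hour <= 13
                else "morning" if t.hour < 11 else "evening")
    # modern default
    return f"{t.hour % 12 or 12}:{t.minute:02d} {'am' if t.hour < 12 else 'pm'}"
```

Same clock value, different character: `time(20, 15)` renders as "2015 hours" for a soldier but "a quarter past 8 in the evening" for a Victorian detective.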
Characters also have `time_behaviors` — descriptions of what they'd be doing at different times of day. These are injected directly into the prompt:
```yaml
time_behaviors:
  early_morning: "I'm on guard duty or just got off. Running on caffeine and spite."
  afternoon: "Hottest part of the day. Everyone's hiding from the sun."
  evening: "Best time. It cools down, everyone's hanging out, smoking and talking."
```

Every response passes through a 4-signal validation pipeline before being returned:
1. Voice TF-IDF — Builds a vocabulary profile from the character's memories and example quotes. Scores each response for vocabulary match. Catches responses that sound generic rather than character-specific.
2. Identity Anchor — Scans for AI-speak tells: "certainly", "I understand your perspective", "great question", bullet-point formatting, numbered lists, excessive formality. These are dead giveaways that the character has broken.
3. NER Anachronism Check — spaCy extracts entities from the response. Any entity that shouldn't exist in the character's era (e.g., "iPhone" in a Victorian setting) is flagged.
4. Knowledge Graph Contradiction — Checks if the response states anything that conflicts with established facts in the character's graph.
If the combined severity exceeds the threshold, the response is regenerated with explicit correction constraints injected into the prompt. Maximum 2 retries.
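A sketch of that check-and-retry loop with stubbed generation and checking. Combining severities by taking the maximum, and the `fix <signal>` constraint strings, are illustrative assumptions about how the engine aggregates the four signals:

```python
def combined_severity(signals: dict[str, float]) -> float:
    # Assumption: the worst single signal drives the decision.
    return max(signals.values())

def respond_with_validation(generate, check, threshold: float = 0.5,
                            max_retries: int = 2) -> str:
    """generate(constraints) -> response; check(response) -> per-signal severities."""
    response = generate([])
    for _ in range(max_retries):
        signals = check(response)
        if combined_severity(signals) <= threshold:
            break
        # Regenerate with explicit correction constraints for the failing signals.
        constraints = [f"fix {name}" for name, sev in signals.items() if sev > 0]
        response = generate(constraints)
    return response

# Stub: the first draft trips the identity anchor, the retry is clean.
drafts = iter(["Certainly! Great question.", "Elementary, my dear fellow."])
gen = lambda constraints: next(drafts)
chk = lambda r: {"voice": 0.0, "anachronism": 0.0, "graph": 0.0,
                 "identity_anchor": 1.0 if "Certainly" in r else 0.0}
final = respond_with_validation(gen, chk)
```

The retry cap matters: after two failed regenerations the engine returns the best it has rather than looping forever.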
The engine abstracts LLM access behind a Protocol-based interface. Four backends are available, all with native streaming support:
| Backend | Class | How It Works |
|---|---|---|
| `anthropic` | `AnthropicBackend` | Anthropic Python SDK — Claude models |
| `openai` | `OpenAIBackend` | OpenAI Python SDK — GPT models |
| `openrouter` | `OpenRouterBackend` | OpenAI-compatible API — access to many providers |
| `ollama` | `OllamaBackend` | HTTP calls to local Ollama server |
The default backend is `openrouter`. Each character can override the backend and model in their YAML:
```yaml
backend: anthropic
model: claude-sonnet-4-5-20250929
temperature: 0.9
```

Characters are defined in YAML with two modes:
Provide a few fields and let the bootstrapper generate everything else:
```yaml
name: Charlie Parker
is_seed: true
seed_prompt: "jazz alto saxophonist, 1940s-50s New York City, bebop pioneer"
era: 1940s
```

The `MemoryBootstrapper` will generate a full biography, extract entities into the knowledge graph, and create 25+ layered memories (formative, defining moments, personal, sensory, knowledge).
Precise control over every aspect:
```yaml
name: Jake Torres
era: modern, 2012
time_format: 24h
role: US Army infantry, E-4 Specialist
setting: >
  Deployed to FOB Shank, Logar Province, Afghanistan...
core_identity: >
  I'm Jake Torres, Specialist, US Army. 2nd Platoon, Bravo Company...
backstory: >
  Did basic at Fort Benning, summer 2011...
personality_traits:
  - loyal to his squad above all else
  - dark humor as a coping mechanism
speech_patterns:
  formality: casual
  verbosity: medium
  text_style: texting
  dialect: "Southwest US, military slang heavy"
  catchphrases: ["it is what it is", "roger that"]
  vocabulary_avoidances: ["certainly", "indeed", "fascinating"]
  filler_words: ["like", "man", "dude"]
knowledge_domains:
  - US Army infantry tactics
  - Afghanistan geography
knowledge_boundaries:
  - I don't know anything about the big strategic picture
  - I have no idea what's happening back in the US
research_enabled: true   # toggle all research for this character
research_seeds:
  - "10th Mountain Division Afghanistan 2012"
  - "FOB Shank Logar Province"
time_behaviors:
  early_morning: "on guard duty, running on rip-its and spite"
  evening: "best time, everyone hanging out, smoking and talking"
baseline_emotions:
  joy: 0.2
  trust: 0.5
  anger: 0.4
  sadness: 0.4
```

See `config/characters/original_template.yaml` for the full field reference.
```
lettuce-engine/
├── pyproject.toml               # Build config, all dependencies
├── config/
│   ├── settings.yaml            # Global engine settings
│   └── characters/
│       ├── sam_thompson.yaml        # Example: Special Forces soldier
│       ├── sherlock_holmes.yaml     # Example: Victorian detective
│       ├── soldier_example.yaml     # Example: modern soldier (24h time)
│       └── original_template.yaml   # Field reference template
│
├── src/lettuce/
│   ├── engine.py                # Core orchestrator
│   ├── cli.py                   # CLI entry point (chat, stats, discord, serve)
│   │
│   ├── api/
│   │   ├── app.py               # FastAPI app factory, lifespan, setup gate
│   │   ├── config_manager.py    # YAML config read/write with dotpath access
│   │   ├── engine_manager.py    # Multi-character engine cache
│   │   ├── auth.py              # Bearer token authentication
│   │   ├── schemas.py           # Pydantic request/response models
│   │   ├── routes_setup.py      # Setup flow + config CRUD
│   │   ├── routes_characters.py # Character listing/loading/research toggle
│   │   ├── routes_chat.py       # REST chat + WebSocket streaming + history
│   │   └── routes_status.py     # Health, system dashboard, user data deletion
│   │
│   ├── identity/
│   │   ├── character.py         # Character dataclass + YAML loader
│   │   ├── prompt_assembler.py  # 9-section prompt builder
│   │   ├── time_awareness.py    # Era-mapped time perception
│   │   ├── identity_anchor.py   # AI-speak drift detection
│   │   └── memory_bootstrapper.py # Seed → full person generator
│   │
│   ├── llm/
│   │   ├── backend.py           # LLMBackend protocol + data models
│   │   ├── router.py            # Backend factory
│   │   ├── anthropic.py         # Claude API (streaming)
│   │   ├── openai.py            # GPT API (streaming)
│   │   ├── openrouter.py        # OpenRouter API (streaming)
│   │   └── ollama.py            # Local models (streaming)
│   │
│   ├── memory/
│   │   ├── models.py            # Memory + ConversationTurn dataclasses
│   │   ├── embedder.py          # sentence-transformers wrapper
│   │   ├── vector_store.py      # ChromaDB wrapper
│   │   ├── sqlite_store.py      # SQLite relational store
│   │   ├── retriever.py         # Hybrid retrieval (dense + BM25 + graph)
│   │   ├── generator.py         # LLM memory synthesis
│   │   └── consolidator.py      # Decay, clustering, pruning
│   │
│   ├── nlp/
│   │   ├── pipeline.py          # spaCy NER + analysis
│   │   ├── voice_analyzer.py    # TF-IDF voice profiling
│   │   ├── emotion_classifier.py # DistilRoBERTa classification
│   │   └── entity_tracker.py    # Cross-conversation entity tracking
│   │
│   ├── knowledge/
│   │   ├── graph.py             # NetworkX knowledge graph
│   │   ├── fact_store.py        # Structured facts + contradiction check
│   │   └── contradiction_detector.py # Graph + embedding contradiction detection
│   │
│   ├── emotion/
│   │   ├── state.py             # 8D Plutchik vector model
│   │   ├── engine.py            # ML classifier + vector math
│   │   └── decay.py             # Exponential decay toward baseline
│   │
│   ├── relationships/
│   │   ├── models.py            # Per-user relationship model
│   │   └── tracker.py           # Gradual relationship evolution
│   │
│   ├── research/
│   │   ├── loop.py              # Background research orchestrator
│   │   ├── synthesizer.py       # LLM fact extraction
│   │   └── scrapers/            # Wikipedia, Fandom, general web
│   │
│   ├── consistency/
│   │   ├── validator.py         # 4-signal validation pipeline
│   │   └── contradiction_resolver.py # Constraint builder for retries
│   │
│   ├── discord_bot/
│   │   ├── bot.py               # Bot lifecycle
│   │   └── cogs/                # Conversation, admin, character mgmt
│   │
│   └── db/
│       ├── connection.py        # Async SQLite connection
│       └── migrations.py        # Schema setup
│
└── tests/                       # 33 tests across 6 files
```
- Python 3.11+
- One of: Anthropic API key, OpenAI API key, OpenRouter API key, or Ollama running locally
```bash
git clone <repo-url> lettuce-engine
cd lettuce-engine

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

python -m spacy download en_core_web_sm
```

The sentence-transformer model (`all-MiniLM-L6-v2`) and emotion classifier (`j-hartmann/emotion-english-distilroberta-base`) are downloaded automatically on first use.
```bash
cp .env.example .env
# Edit .env with your API keys
```

Required keys depend on which backend you use:

- `anthropic` — `ANTHROPIC_API_KEY`
- `openai` — `OPENAI_API_KEY`
- `openrouter` — `OPENROUTER_API_KEY` (or set `api_key` in settings under `llm.openrouter`)
- `ollama` — No API key needed (local server)
- Discord bot — `DISCORD_TOKEN`
- API auth — `LETTUCE_API_KEY` (optional, empty = auth disabled)
```bash
# Chat with a bundled character
lettuce chat --character sherlock_holmes

# Chat with a custom character file
lettuce chat --character-file my_character.yaml

# Specify a settings file
lettuce chat --character soldier_example --settings config/settings.yaml
```

Inside the chat:

- Type messages normally to talk to the character
- `/stats` — show engine statistics (memory count, emotion state, graph size, etc.)
- `quit` or `exit` — leave
Background loops (memory synthesis, consolidation, research) run automatically while you chat.
```bash
lettuce discord --character-file config/characters/sherlock_holmes.yaml
```

The bot responds to @mentions and DMs. It simulates typing delay proportional to response length. Admin commands and character info are available via cogs.
```bash
lettuce serve                # 0.0.0.0:8000
lettuce serve --port 3000    # custom port
lettuce serve --reload       # dev mode with auto-reload
```

The API supports multi-character management, streaming chat via WebSocket, runtime configuration, conversation history retrieval, and a gated setup flow for first-run configuration. See API.md for full endpoint documentation.
```bash
lettuce stats --character sherlock_holmes
```

```python
from lettuce.engine import LettuceEngine, load_settings
from lettuce.identity.character import Character

character = Character.from_yaml("config/characters/sherlock_holmes.yaml")
settings = load_settings()

engine = LettuceEngine(character, settings)
await engine.setup()
await engine.start_background_loops()

response = await engine.respond(
    "What do you make of this footprint, Holmes?",
    user_id="watson",
    user_name="Watson",
)
print(response)

await engine.stop_background_loops()
```

All settings live in `config/settings.yaml`:
```yaml
engine:
  default_backend: openrouter   # anthropic, openai, openrouter, ollama
  data_dir: ./data

llm:
  openrouter:
    model: anthropic/claude-sonnet-4-5-20250929
    api_key: sk-or-...          # or set OPENROUTER_API_KEY env var
    max_tokens: 4096
  anthropic:
    model: claude-sonnet-4-5-20250929
    temperature: 0.9
  openai:
    model: gpt-4o
  ollama:
    model: llama3.1
    base_url: http://localhost:11434

memory:
  embedding_model: all-MiniLM-L6-v2
  max_retrieval_results: 15
  dense_weight: 0.5             # ChromaDB semantic search weight
  bm25_weight: 0.3              # Keyword search weight
  graph_weight: 0.2             # Knowledge graph weight

emotion:
  decay_rate: 0.1
  decay_interval_minutes: 30

background:
  synthesis_interval_minutes: 10    # Conversation → memory synthesis
  consolidation_interval_minutes: 60
  bm25_rebuild_interval_minutes: 15
  drip_research_interval_minutes: 60

api:
  # api_key: ""                 # Set via LETTUCE_API_KEY env var or here
```

| Component | Library | Purpose |
|---|---|---|
| Embeddings | `sentence-transformers` | Memory vectorization (`all-MiniLM-L6-v2`) |
| NER | `spaCy` (`en_core_web_sm`) | Entity extraction from conversations and research |
| Emotion | `transformers` | DistilRoBERTa emotion classification |
| Voice | `scikit-learn` | TF-IDF vocabulary profiling |
| Clustering | `scikit-learn` | HDBSCAN memory deduplication |
| Sparse search | `rank-bm25` | BM25 keyword retrieval |
| Emotion math | `numpy` | Plutchik vector operations, decay curves |
| Component | Library | Purpose |
|---|---|---|
| Vector DB | `chromadb` | Semantic memory search |
| Relational | `aiosqlite` | Conversation turns, relationships, metadata |
| Knowledge | `networkx` | Entity graph (serialized as GraphML) |
| Backend | Library | Purpose |
|---|---|---|
| Anthropic | `anthropic` | Claude models (streaming) |
| OpenAI | `openai` | GPT models (streaming) |
| OpenRouter | `openai` | Multi-provider access (streaming) |
| Ollama | `aiohttp` | Local model server (streaming) |
| Component | Library | Purpose |
|---|---|---|
| REST API | `fastapi` + `uvicorn` | HTTP + WebSocket server |
| Discord | `discord.py` | Bot framework |
| CLI | `click` | Command-line interface |
| Config | `pyyaml` + `pydantic` | Settings and character parsing |
| Logging | `structlog` | Structured logging |
| Web scraping | `trafilatura` + `beautifulsoup4` | Research content extraction |
```bash
# Run all tests
pytest

# Run with verbose output
pytest -v

# Run a specific test file
pytest tests/test_emotion.py
```

33 tests cover: character loading, emotion math, knowledge graph operations, prompt assembly, relationship tracking, and time awareness formatting.
ML-first, LLM-second. Use local ML models (classifiers, embeddings, NER, TF-IDF) for fast, cheap, deterministic operations. Reserve LLM calls for creative tasks — memory generation, response generation, complex emotional shifts. This cuts LLM API costs by ~70%.
Hybrid retrieval over pure vector search. Dense embeddings catch semantic similarity but miss exact keywords. BM25 catches keywords but misses paraphrases. The knowledge graph catches entity relationships that neither can find. RRF combines all three.
Vector emotions over string emotions. 8D Plutchik vectors allow instant mathematical operations (decay, blend, amplify). No API call needed to answer "how does the character feel right now?"
Graph-based consistency. "Does Holmes know about computers?" is a graph lookup — instant. Not a 2-second LLM call.
Forgetting is a feature. The Ebbinghaus decay curve means characters naturally forget unimportant things while retaining what matters. Frequently accessed memories strengthen. This mimics how human memory actually works.





