LettuceAI/engine

Lettuce Engine

Hyperrealistic AI character engine — identity, not roleplay.

Lettuce Engine makes LLMs be characters instead of playing them. There is no "act as" instruction, no roleplay framing. The entire system prompt is written as the character's own internal awareness — first-person declarative identity. The LLM receives a context that is the character's mental state: their memories, emotions, relationships, knowledge, and moment-to-moment experience of time.

The engine runs background loops that autonomously research characters via the web, generate layered memories, track emotional state across conversations, evolve per-user relationships, and validate every response for consistency. It works with any character — fictional, historical, or original — across multiple LLM backends.


Architecture Overview

The engine is organized into 10 subsystems that wire together through a central orchestrator (LettuceEngine). Entry points include a terminal CLI, a Discord bot, a REST + WebSocket API, and direct programmatic access.


Core Philosophy

Traditional AI roleplay tells the model "pretend to be X." Lettuce Engine never does this. Instead, the system prompt is a declarative identity document written entirely in first person:

I am Sherlock Holmes. I reside at 221B Baker Street, London.
I recall the curious case of the Speckled Band...
At this moment, I feel intellectually restless.
I am speaking with James. We have conversed three times previously.

No meta-layer. No "you are an AI assistant." The LLM simply receives a context that IS the character's inner world.

Identity Prompt Structure

The system prompt is assembled dynamically per-request from 9 sections, each populated by a different subsystem (memory retrieval, emotion engine, relationship tracker, time awareness, etc.).
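The assembly itself is plain string composition: each subsystem contributes a first-person section, and non-empty sections are joined in a fixed order. A minimal sketch — the section names and function shape here are illustrative assumptions, not the engine's actual `PromptAssembler` API:

```python
# Illustrative sketch of per-request identity prompt assembly.
# Section names are assumptions, not the engine's actual internals.
def assemble_identity_prompt(sections: dict[str, str]) -> str:
    """Join non-empty first-person sections into one identity document."""
    order = [
        "core_identity", "backstory", "knowledge",
        "memories", "emotional_state", "relationship",
        "time_awareness", "speech_patterns", "constraints",
    ]
    return "\n\n".join(sections[k] for k in order if sections.get(k))

prompt = assemble_identity_prompt({
    "core_identity": "I am Sherlock Holmes. I reside at 221B Baker Street, London.",
    "memories": "I recall the curious case of the Speckled Band...",
    "emotional_state": "At this moment, I feel intellectually restless.",
})
```

Because empty sections are simply skipped, a character with no retrieved memories or no relationship history still gets a coherent prompt.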


How a Message Flows

Every user message passes through a 6-stage pipeline:

  1. Analyze — spaCy extracts entities and topics. DistilRoBERTa classifies the emotional content. Time-based emotion decay is applied.

  2. Retrieve Context — Three retrieval signals run in parallel:

    • Dense search (ChromaDB) — semantic similarity via sentence-transformer embeddings
    • BM25 search — keyword matching over the full memory corpus
    • Knowledge graph traversal — entities in the message are looked up in the character's NetworkX graph, and related memories are pulled via BFS

    Results are merged via Reciprocal Rank Fusion (RRF) with configurable weights, then boosted by recency. A 5% random surfacing chance injects unexpected memories for naturalism.

  3. Build Prompt — The PromptAssembler constructs the 9-section first-person system prompt from the character definition, retrieved memories, current emotional state, relationship context, and time awareness.

  4. Generate — The assembled prompt + conversation history is sent to the configured LLM backend (Anthropic, OpenAI, OpenRouter, or Ollama). Streaming is supported via WebSocket.

  5. Validate — A 4-signal consistency pipeline checks the response:

    • Voice TF-IDF — is the vocabulary within the character's speech profile?
    • Identity Anchor — any AI-speak tells? ("certainly", "great question", bullet points)
    • NER Anachronism — entities from the wrong era?
    • Knowledge Graph — contradictions with established facts?

    If validation fails, the response is regenerated with explicit constraints (up to 2 retries).

  6. Post-Response Updates — Conversation turns are stored in SQLite + ChromaDB. Entities are tracked. Relationship dimensions (trust, affection, respect) are updated based on exchange sentiment. The character's own emotional state updates from its response.


Memory System

Memory is the most complex subsystem. Characters don't just retrieve static facts; they have a living, evolving memory that decays, consolidates, and grows over time.

Dual Storage

Every memory exists in two places:

  • ChromaDB — vector embeddings for semantic search (all-MiniLM-L6-v2)
  • SQLite — metadata, importance scores, access counts, timestamps, tags

Memory Types

| Type | Description | Source |
|------|-------------|--------|
| episodic | Specific events and experiences | Bootstrapper, conversations, research |
| semantic | Facts, knowledge, skills | Research, bootstrapper |
| emotional | Emotional reactions and feelings | Bootstrapper, conversation synthesis |
| conversation | Retained impressions from chats | Conversation synthesis loop |

Hybrid Retrieval

The HybridRetriever combines three signals using Reciprocal Rank Fusion:

| Signal | Weight | What It Catches |
|--------|--------|-----------------|
| Dense (ChromaDB) | 0.5 | Semantic similarity, paraphrases |
| BM25 (sparse) | 0.3 | Exact keyword matches that embeddings miss |
| Graph (NetworkX) | 0.2 | Entity relationships invisible to both |
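Reciprocal Rank Fusion itself is only a few lines: each signal contributes `weight / (k + rank)` for every item it returns, and scores are summed across signals. A sketch with these weights (the smoothing constant `k = 60` is the conventional RRF default, assumed here):

```python
def rrf_merge(rankings: dict[str, list[str]],
              weights: dict[str, float], k: int = 60) -> list[str]:
    """Fuse ranked ID lists: each signal adds weight / (k + rank) per item."""
    scores: dict[str, float] = {}
    for signal, ids in rankings.items():
        for rank, mem_id in enumerate(ids, start=1):
            scores[mem_id] = scores.get(mem_id, 0.0) + weights[signal] / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

merged = rrf_merge(
    {"dense": ["m1", "m2", "m3"], "bm25": ["m2", "m4"], "graph": ["m2"]},
    {"dense": 0.5, "bm25": 0.3, "graph": 0.2},
)
# m2 appears in all three signals, so it outranks the dense-only top hit m1
```

Because RRF works on ranks rather than raw scores, the three signals never need their scores calibrated against each other.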

Memory Lifecycle

  1. Creation — Memories are born from bootstrapping, conversation synthesis, or research
  2. Retrieval — Accessed memories get their access_count incremented and last_accessed updated
  3. Decay — Ebbinghaus forgetting curve: importance *= e^(-lambda * hours). Frequently accessed memories decay slower (spaced repetition effect)
  4. Consolidation — HDBSCAN clusters similar memories; low-importance duplicates are merged into the strongest memory
  5. Pruning — Memories that decay below the minimum threshold (0.05) are permanently deleted

Conversation-to-Memory Synthesis

Raw conversation turns are not directly useful as memories. Every 10 minutes, the synthesis loop:

  1. Fetches unsynthesized conversation turns since the last run
  2. Groups them by user
  3. Sends each conversation to the LLM with a synthesis prompt: "What do I actually remember from this conversation?"
  4. The LLM produces memories the way a real person would retain them — impressions, key facts, emotional reactions, not a verbatim log
  5. If nothing memorable happened, it outputs nothing (no memory pollution)

Emotion Engine

Emotions are 8-dimensional vectors based on Plutchik's wheel of emotions, not LLM-guessed strings.

Dimensions

| Dimension | Range | Valence |
|-----------|-------|---------|
| Joy | 0–1 | Positive |
| Trust | 0–1 | Positive |
| Fear | 0–1 | Negative |
| Surprise | 0–1 | Neutral |
| Sadness | 0–1 | Negative |
| Disgust | 0–1 | Negative |
| Anger | 0–1 | Negative |
| Anticipation | 0–1 | Positive |

How It Works

  1. DistilRoBERTa (j-hartmann/emotion-english-distilroberta-base) classifies the emotion in each user message locally — no LLM call
  2. Classifier output is mapped to Plutchik dimensions
  3. Personality weights from the character's traits modify the response (an angry character amplifies anger signals)
  4. For simple shifts: vector math (blend, amplify) computes the new state instantly
  5. For complex shifts (betrayal, major revelations): LLM fallback for nuanced emotional reasoning
  6. Over time: exponential decay pulls emotions back toward the character's baseline

This approach handles ~90% of emotional updates with pure math — no API calls, no latency, deterministic results.

Output

The emotional state produces human-readable descriptions for the prompt:

I feel a strong sense of anticipation mixed with mild unease.

It also exposes computed properties: primary_emotion, secondary_emotion, intensity (L2 norm), and valence (-1 to +1).
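The vector math behind the simple shifts, the decay, and these computed properties fits in a few numpy lines. A sketch — the blend weighting and the exact valence normalization are assumptions, not the engine's actual formulas:

```python
import numpy as np

DIMS = ["joy", "trust", "fear", "surprise", "sadness", "disgust", "anger", "anticipation"]
SIGNS = np.array([1, 1, -1, 0, -1, -1, -1, 1], dtype=float)  # valence sign per dimension

def blend(state: np.ndarray, stimulus: np.ndarray, alpha: float = 0.3) -> np.ndarray:
    """Shift the current state toward a classified stimulus (simple shifts)."""
    return np.clip((1 - alpha) * state + alpha * stimulus, 0.0, 1.0)

def decay(state: np.ndarray, baseline: np.ndarray, rate: float = 0.1) -> np.ndarray:
    """Exponential pull back toward the character's baseline over time."""
    return baseline + (state - baseline) * (1 - rate)

def intensity(state: np.ndarray) -> float:
    return float(np.linalg.norm(state))  # L2 norm

def valence(state: np.ndarray) -> float:
    """Mean positive activation minus mean negative activation, in -1..+1.
    One plausible normalization; the engine's exact scaling is assumed."""
    return float(state[SIGNS > 0].mean() - state[SIGNS < 0].mean())
```

Every one of these operations is a constant-time array computation, which is why roughly 90% of emotional updates need no LLM call at all.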


Background Loops

The engine runs several autonomous background tasks that keep the character alive between conversations:

| Loop | Default Interval | What It Does |
|------|------------------|--------------|
| Initial Research | Once on boot | Scrapes all research_seeds (Wikipedia, Fandom, web), extracts facts into knowledge graph, generates episodic memories |
| Conversation Synthesis | Every 10 min | Converts recent conversation turns into first-person memories the character retains |
| BM25 Rebuild | Every 15 min | Rebuilds the sparse keyword search index as new memories arrive |
| Memory Consolidation | Every 60 min | Applies Ebbinghaus decay, HDBSCAN clustering, merges duplicates, prunes faded memories |
| Periodic Research | Every 6 hours | Re-scrapes research seeds, explores deeper, generates new memories |

All intervals are configurable in config/settings.yaml under the background: section.

In the CLI, background tasks run alongside the chat loop using asyncio (input is handled in a thread executor so the event loop stays free). In the Discord bot and API server, they run naturally within the async event loop. Research loops can be enabled/disabled per character via the research_enabled YAML field or the PUT /characters/{slug}/research API endpoint.


Knowledge Graph

Each character has a NetworkX directed graph that tracks entities, relationships, and facts:

  • Nodes: people, places, organizations, events, dates, concepts
  • Edges: relationships (friend_of, lives_at, happened_in, knows_of, etc.)
  • Serialized as GraphML files per character

The graph is populated from:

  • Memory bootstrapping — spaCy NER extracts entities from the generated biography
  • Research synthesis — the LLM extracts structured SUBJECT | PREDICATE | OBJECT facts from scraped content
  • Conversation tracking — new entities mentioned in conversation are added automatically

The graph enables:

  • Contradiction detection — "Does this response contradict an established fact?" is a graph traversal, not an LLM call
  • Contextual retrieval — when a user mentions "Watson", the graph finds all related entities and pulls memories about them
  • Consistency validation — ensures the character never states something that conflicts with their established knowledge
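All three uses are ordinary NetworkX operations. A sketch — the edge-attribute convention and the contradiction check are illustrative, not the engine's actual schema:

```python
import networkx as nx

g = nx.DiGraph()
g.add_edge("Sherlock Holmes", "221B Baker Street", relation="lives_at")
g.add_edge("Watson", "Sherlock Holmes", relation="friend_of")
g.add_edge("Watson", "Mary Morstan", relation="married_to")

def related_entities(graph: nx.DiGraph, entity: str, depth: int = 2) -> set[str]:
    """BFS out to `depth` hops; used to pull memories about related entities."""
    if entity not in graph:
        return set()
    undirected = graph.to_undirected(as_view=True)
    return set(nx.bfs_tree(undirected, entity, depth_limit=depth)) - {entity}

def contradicts(graph: nx.DiGraph, subj: str, relation: str, obj: str) -> bool:
    """Does an existing fact give `subj` a different object for this relation?"""
    if subj not in graph:
        return False
    return any(
        data.get("relation") == relation and nbr != obj
        for nbr, data in graph[subj].items()
    )
```

A mention of "Watson" reaches "Mary Morstan" in two hops, and a response claiming Holmes lives anywhere other than Baker Street trips the contradiction check — no LLM call in either path.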

Relationship System

Each user gets a dedicated Relationship model that tracks:

| Dimension | Range | Description |
|-----------|-------|-------------|
| familiarity | 0–1 | How well the character knows this person |
| trust | 0–1 | How much the character trusts them |
| affection | 0–1 | How much the character likes them |
| respect | 0–1 | How much the character respects them |

Plus: interaction_count, character_notes (impressions), topics_discussed, timestamps.

Relationships evolve gradually. After each exchange, sentiment analysis from the conversation adjusts the dimensions. The relationship context is injected into the prompt as a first-person description:

Watson and I have spoken many times. I trust him deeply and consider him a friend.
We have discussed the Baskerville case and Mrs. Hudson's complaints.
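The gradual evolution can be sketched as a small bounded update, where exchange sentiment in [-1, +1] nudges each dimension. The step sizes and the respect discount here are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Relationship:
    familiarity: float = 0.0
    trust: float = 0.0
    affection: float = 0.0
    respect: float = 0.0
    interaction_count: int = 0

def update(rel: Relationship, sentiment: float, step: float = 0.02) -> Relationship:
    """Nudge dimensions by exchange sentiment; step sizes are assumed."""
    clamp = lambda x: max(0.0, min(1.0, x))
    rel.familiarity = clamp(rel.familiarity + step)  # grows with any contact
    rel.trust = clamp(rel.trust + step * sentiment)
    rel.affection = clamp(rel.affection + step * sentiment)
    rel.respect = clamp(rel.respect + step * sentiment * 0.5)
    rel.interaction_count += 1
    return rel

r = update(Relationship(trust=0.5), sentiment=0.8)
```

With a small step size, no single exchange can swing a relationship; trust built over dozens of conversations takes dozens of bad exchanges to lose.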

Time Awareness

The character knows what time it is — mapped to their era:

| Format | Example Output | When Used |
|--------|----------------|-----------|
| modern | "2:30 pm" | Default for present-day characters |
| 24h | "0315 hours" | Military, aviation, etc. |
| victorian | "a quarter past 8 in the evening" | 19th century characters |
| medieval | "midday" / "evening, the sun lowering" | Pre-modern characters |
| narrative | "the dead of night" | Dreamlike, fantastical characters |

Time format is driven by the time_format field in the character YAML. If not set, it's inferred from the era field.

Characters also have time_behaviors — descriptions of what they'd be doing at different times of day. These are injected directly into the prompt:

time_behaviors:
  early_morning: "I'm on guard duty or just got off. Running on caffeine and spite."
  afternoon: "Hottest part of the day. Everyone's hiding from the sun."
  evening: "Best time. It cools down, everyone's hanging out, smoking and talking."

Consistency Validation

Every response passes through a 4-signal validation pipeline before being returned:

  1. Voice TF-IDF — Builds a vocabulary profile from the character's memories and example quotes. Scores each response for vocabulary match. Catches responses that sound generic rather than character-specific.

  2. Identity Anchor — Scans for AI-speak tells: "certainly", "I understand your perspective", "great question", bullet-point formatting, numbered lists, excessive formality. These are dead giveaways that the character has broken.

  3. NER Anachronism Check — spaCy extracts entities from the response. Any entity that shouldn't exist in the character's era (e.g., "iPhone" in a Victorian setting) is flagged.

  4. Knowledge Graph Contradiction — Checks if the response states anything that conflicts with established facts in the character's graph.

If the combined severity exceeds the threshold, the response is regenerated with explicit correction constraints injected into the prompt. Maximum 2 retries.
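The first signal is standard scikit-learn: fit a TF-IDF vectorizer on the character's corpus, then score each response by cosine similarity against the corpus centroid. A sketch — the centroid approach and any threshold are assumptions about the implementation:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Voice corpus: the character's memories and example quotes
corpus = [
    "Elementary, my dear Watson. The game is afoot.",
    "I observe the mud on your boots suggests Brixton clay.",
]
vectorizer = TfidfVectorizer().fit(corpus)
profile = np.asarray(vectorizer.transform(corpus).mean(axis=0))  # corpus centroid

def voice_score(response: str) -> float:
    """Cosine similarity between a response and the voice profile."""
    vec = vectorizer.transform([response])
    return float(cosine_similarity(vec, profile)[0, 0])

in_voice = voice_score("The mud suggests Brixton, my dear Watson.")
generic = voice_score("Certainly! Great question, I understand your perspective.")
# the in-character response scores well above the AI-speak one
```

Note how the generic response fails on two signals at once: its vocabulary falls outside the TF-IDF profile, and "certainly" / "great question" trip the identity anchor.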


LLM Backends

The engine abstracts LLM access behind a Protocol-based interface. Four backends are available, all with native streaming support:

| Backend | Class | How It Works |
|---------|-------|--------------|
| anthropic | AnthropicBackend | Anthropic Python SDK — Claude models |
| openai | OpenAIBackend | OpenAI Python SDK — GPT models |
| openrouter | OpenRouterBackend | OpenAI-compatible API — access to many providers |
| ollama | OllamaBackend | HTTP calls to local Ollama server |

The default backend is openrouter. Each character can override the backend and model in their YAML:

backend: anthropic
model: claude-sonnet-4-5-20250929
temperature: 0.9

Character Definition

Characters are defined in YAML with two modes:

Minimal Seed Mode

Provide a few fields and let the bootstrapper generate everything else:

name: Charlie Parker
is_seed: true
seed_prompt: "jazz alto saxophonist, 1940s-50s New York City, bebop pioneer"
era: 1940s

The MemoryBootstrapper will generate a full biography, extract entities into the knowledge graph, and create 25+ layered memories (formative, defining moments, personal, sensory, knowledge).

Full Definition Mode

Precise control over every aspect:

name: Jake Torres
era: modern, 2012
time_format: 24h
role: US Army infantry, E-4 Specialist
setting: >
  Deployed to FOB Shank, Logar Province, Afghanistan...

core_identity: >
  I'm Jake Torres, Specialist, US Army. 2nd Platoon, Bravo Company...

backstory: >
  Did basic at Fort Benning, summer 2011...

personality_traits:
  - loyal to his squad above all else
  - dark humor as a coping mechanism

speech_patterns:
  formality: casual
  verbosity: medium
  text_style: texting
  dialect: "Southwest US, military slang heavy"
  catchphrases: ["it is what it is", "roger that"]
  vocabulary_avoidances: ["certainly", "indeed", "fascinating"]
  filler_words: ["like", "man", "dude"]

knowledge_domains:
  - US Army infantry tactics
  - Afghanistan geography

knowledge_boundaries:
  - I don't know anything about the big strategic picture
  - I have no idea what's happening back in the US

research_enabled: true              # toggle all research for this character
research_seeds:
  - "10th Mountain Division Afghanistan 2012"
  - "FOB Shank Logar Province"

time_behaviors:
  early_morning: "on guard duty, running on rip-its and spite"
  evening: "best time, everyone hanging out, smoking and talking"

baseline_emotions:
  joy: 0.2
  trust: 0.5
  anger: 0.4
  sadness: 0.4

See config/characters/original_template.yaml for the full field reference.


Project Structure

lettuce-engine/
├── pyproject.toml                         # Build config, all dependencies
├── config/
│   ├── settings.yaml                      # Global engine settings
│   └── characters/
│       ├── sam_thompson.yaml              # Example: Special Forces soldier
│       ├── sherlock_holmes.yaml           # Example: Victorian detective
│       ├── soldier_example.yaml           # Example: modern soldier (24h time)
│       └── original_template.yaml         # Field reference template
│
├── src/lettuce/
│   ├── engine.py                          # Core orchestrator
│   ├── cli.py                             # CLI entry point (chat, stats, discord, serve)
│   │
│   ├── api/
│   │   ├── app.py                         # FastAPI app factory, lifespan, setup gate
│   │   ├── config_manager.py              # YAML config read/write with dotpath access
│   │   ├── engine_manager.py              # Multi-character engine cache
│   │   ├── auth.py                        # Bearer token authentication
│   │   ├── schemas.py                     # Pydantic request/response models
│   │   ├── routes_setup.py                # Setup flow + config CRUD
│   │   ├── routes_characters.py           # Character listing/loading/research toggle
│   │   ├── routes_chat.py                 # REST chat + WebSocket streaming + history
│   │   └── routes_status.py               # Health, system dashboard, user data deletion
│   │
│   ├── identity/
│   │   ├── character.py                   # Character dataclass + YAML loader
│   │   ├── prompt_assembler.py            # 9-section prompt builder
│   │   ├── time_awareness.py              # Era-mapped time perception
│   │   ├── identity_anchor.py             # AI-speak drift detection
│   │   └── memory_bootstrapper.py         # Seed → full person generator
│   │
│   ├── llm/
│   │   ├── backend.py                     # LLMBackend protocol + data models
│   │   ├── router.py                      # Backend factory
│   │   ├── anthropic.py                   # Claude API (streaming)
│   │   ├── openai.py                      # GPT API (streaming)
│   │   ├── openrouter.py                  # OpenRouter API (streaming)
│   │   └── ollama.py                      # Local models (streaming)
│   │
│   ├── memory/
│   │   ├── models.py                      # Memory + ConversationTurn dataclasses
│   │   ├── embedder.py                    # sentence-transformers wrapper
│   │   ├── vector_store.py                # ChromaDB wrapper
│   │   ├── sqlite_store.py                # SQLite relational store
│   │   ├── retriever.py                   # Hybrid retrieval (dense + BM25 + graph)
│   │   ├── generator.py                   # LLM memory synthesis
│   │   └── consolidator.py                # Decay, clustering, pruning
│   │
│   ├── nlp/
│   │   ├── pipeline.py                    # spaCy NER + analysis
│   │   ├── voice_analyzer.py              # TF-IDF voice profiling
│   │   ├── emotion_classifier.py          # DistilRoBERTa classification
│   │   └── entity_tracker.py              # Cross-conversation entity tracking
│   │
│   ├── knowledge/
│   │   ├── graph.py                       # NetworkX knowledge graph
│   │   ├── fact_store.py                  # Structured facts + contradiction check
│   │   └── contradiction_detector.py      # Graph + embedding contradiction detection
│   │
│   ├── emotion/
│   │   ├── state.py                       # 8D Plutchik vector model
│   │   ├── engine.py                      # ML classifier + vector math
│   │   └── decay.py                       # Exponential decay toward baseline
│   │
│   ├── relationships/
│   │   ├── models.py                      # Per-user relationship model
│   │   └── tracker.py                     # Gradual relationship evolution
│   │
│   ├── research/
│   │   ├── loop.py                        # Background research orchestrator
│   │   ├── synthesizer.py                 # LLM fact extraction
│   │   └── scrapers/                      # Wikipedia, Fandom, general web
│   │
│   ├── consistency/
│   │   ├── validator.py                   # 4-signal validation pipeline
│   │   └── contradiction_resolver.py      # Constraint builder for retries
│   │
│   ├── discord_bot/
│   │   ├── bot.py                         # Bot lifecycle
│   │   └── cogs/                          # Conversation, admin, character mgmt
│   │
│   └── db/
│       ├── connection.py                  # Async SQLite connection
│       └── migrations.py                  # Schema setup
│
└── tests/                                 # 33 tests across 6 files

Setup

Prerequisites

  • Python 3.11+
  • One of: Anthropic API key, OpenAI API key, OpenRouter API key, or Ollama running locally

Install

git clone <repo-url> lettuce-engine
cd lettuce-engine
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Download NLP Models

python -m spacy download en_core_web_sm

The sentence-transformer model (all-MiniLM-L6-v2) and emotion classifier (j-hartmann/emotion-english-distilroberta-base) are downloaded automatically on first use.

Environment

cp .env.example .env
# Edit .env with your API keys

Required keys depend on which backend you use:

  • anthropic — ANTHROPIC_API_KEY
  • openai — OPENAI_API_KEY
  • openrouter — OPENROUTER_API_KEY (or set api_key in settings under llm.openrouter)
  • ollama — No API key needed (local server)
  • Discord bot — DISCORD_TOKEN
  • API auth — LETTUCE_API_KEY (optional, empty = auth disabled)

Usage

Terminal Chat

# Chat with a bundled character
lettuce chat --character sherlock_holmes

# Chat with a custom character file
lettuce chat --character-file my_character.yaml

# Specify a settings file
lettuce chat --character soldier_example --settings config/settings.yaml

Inside the chat:

  • Type messages normally to talk to the character
  • /stats — show engine statistics (memory count, emotion state, graph size, etc.)
  • quit or exit — leave

Background loops (memory synthesis, consolidation, research) run automatically while you chat.

Discord Bot

lettuce discord --character-file config/characters/sherlock_holmes.yaml

The bot responds to @mentions and DMs. It simulates typing delay proportional to response length. Admin commands and character info are available via cogs.

REST + WebSocket API

lettuce serve                     # 0.0.0.0:8000
lettuce serve --port 3000         # custom port
lettuce serve --reload            # dev mode with auto-reload

The API supports multi-character management, streaming chat via WebSocket, runtime configuration, conversation history retrieval, and a gated setup flow for first-run configuration. See API.md for full endpoint documentation.

Engine Statistics

lettuce stats --character sherlock_holmes

Programmatic API

from lettuce.engine import LettuceEngine, load_settings
from lettuce.identity.character import Character

character = Character.from_yaml("config/characters/sherlock_holmes.yaml")
settings = load_settings()

engine = LettuceEngine(character, settings)
await engine.setup()
await engine.start_background_loops()

response = await engine.respond(
    "What do you make of this footprint, Holmes?",
    user_id="watson",
    user_name="Watson",
)
print(response)

await engine.stop_background_loops()

Configuration

All settings live in config/settings.yaml:

engine:
  default_backend: openrouter       # anthropic, openai, openrouter, ollama
  data_dir: ./data

llm:
  openrouter:
    model: anthropic/claude-sonnet-4-5-20250929
    api_key: sk-or-...              # or set OPENROUTER_API_KEY env var
    max_tokens: 4096
  anthropic:
    model: claude-sonnet-4-5-20250929
    temperature: 0.9
  openai:
    model: gpt-4o
  ollama:
    model: llama3.1
    base_url: http://localhost:11434

memory:
  embedding_model: all-MiniLM-L6-v2
  max_retrieval_results: 15
  dense_weight: 0.5                 # ChromaDB semantic search weight
  bm25_weight: 0.3                  # Keyword search weight
  graph_weight: 0.2                 # Knowledge graph weight

emotion:
  decay_rate: 0.1
  decay_interval_minutes: 30

background:
  synthesis_interval_minutes: 10    # Conversation → memory synthesis
  consolidation_interval_minutes: 60
  bm25_rebuild_interval_minutes: 15
  drip_research_interval_minutes: 60

api:
  # api_key: ""                     # Set via LETTUCE_API_KEY env var or here

Technology Stack

ML / AI (all local, no API calls)

| Component | Library | Purpose |
|-----------|---------|---------|
| Embeddings | sentence-transformers | Memory vectorization (all-MiniLM-L6-v2) |
| NER | spaCy (en_core_web_sm) | Entity extraction from conversations and research |
| Emotion | transformers | DistilRoBERTa emotion classification |
| Voice | scikit-learn | TF-IDF vocabulary profiling |
| Clustering | scikit-learn | HDBSCAN memory deduplication |
| Sparse search | rank-bm25 | BM25 keyword retrieval |
| Emotion math | numpy | Plutchik vector operations, decay curves |

Storage

| Component | Library | Purpose |
|-----------|---------|---------|
| Vector DB | chromadb | Semantic memory search |
| Relational | aiosqlite | Conversation turns, relationships, metadata |
| Knowledge | networkx | Entity graph (serialized as GraphML) |

LLM Backends

| Backend | Library | Purpose |
|---------|---------|---------|
| Anthropic | anthropic | Claude models (streaming) |
| OpenAI | openai | GPT models (streaming) |
| OpenRouter | openai | Multi-provider access (streaming) |
| Ollama | aiohttp | Local model server (streaming) |

Infrastructure

| Component | Library | Purpose |
|-----------|---------|---------|
| REST API | fastapi + uvicorn | HTTP + WebSocket server |
| Discord | discord.py | Bot framework |
| CLI | click | Command-line interface |
| Config | pyyaml + pydantic | Settings and character parsing |
| Logging | structlog | Structured logging |
| Web scraping | trafilatura + beautifulsoup4 | Research content extraction |

Testing

# Run all tests
pytest

# Run with verbose output
pytest -v

# Run a specific test file
pytest tests/test_emotion.py

33 tests cover: character loading, emotion math, knowledge graph operations, prompt assembly, relationship tracking, and time awareness formatting.


Design Principles

ML-first, LLM-second. Use local ML models (classifiers, embeddings, NER, TF-IDF) for fast, cheap, deterministic operations. Reserve LLM calls for creative tasks — memory generation, response generation, complex emotional shifts. This cuts LLM API costs by ~70%.

Hybrid retrieval over pure vector search. Dense embeddings catch semantic similarity but miss exact keywords. BM25 catches keywords but misses paraphrases. The knowledge graph catches entity relationships that neither can find. RRF combines all three.

Vector emotions over string emotions. 8D Plutchik vectors allow instant mathematical operations (decay, blend, amplify). No API call needed to answer "how does the character feel right now?"

Graph-based consistency. "Does Holmes know about computers?" is a graph lookup — instant. Not a 2-second LLM call.

Forgetting is a feature. The Ebbinghaus decay curve means characters naturally forget unimportant things while retaining what matters. Frequently accessed memories strengthen. This mimics how human memory actually works.
