feat: add first-party voice transcription with local Whisper #11345

cmdNiels · 2026-01-30T17:48:26Z

What does this PR do?

Implements first-party, local voice transcription using Whisper models via @huggingface/transformers, which runs them on ONNX Runtime. All processing happens locally with zero configuration required.

Features

Privacy-first: All transcription happens locally, no external API calls
Zero configuration: Works out of the box after enabling in /status
Offline support: Works completely offline after initial model download
Three model sizes: tiny (75MB), base (142MB, default), small (466MB)
Customizable keybind: Default \ for recording, configurable in ~/.config/opencode/opencode.json

This follows the frontend-backend separation design of OpenCode:

TUI: Uses sox/ffmpeg for microphone recording (tested on Linux)
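
The PR does not show the recorder invocation itself, so the following is only a sketch of how a TUI might build the sox or ffmpeg command line. The helper name `buildRecordCommand` and the exact flag choices are assumptions; the 16 kHz mono target is chosen because that is the sample rate Whisper models expect.

```typescript
// Hypothetical helper: illustrates a plausible recorder command, not the PR's code.
type Recorder = "sox" | "ffmpeg";

interface RecordCommand {
  command: string;
  args: string[];
}

// Builds an argument list that captures the default microphone as 16 kHz mono WAV.
function buildRecordCommand(recorder: Recorder, outputPath: string): RecordCommand {
  if (recorder === "sox") {
    // `-d` selects the default audio input device; the format flags apply to the output file.
    return {
      command: "sox",
      args: ["-d", "-r", "16000", "-c", "1", "-b", "16", outputPath],
    };
  }
  // ALSA capture, which is Linux-only; other platforms need a different input format.
  return {
    command: "ffmpeg",
    args: ["-f", "alsa", "-i", "default", "-ar", "16000", "-ac", "1", "-y", outputPath],
  };
}
```

The returned object could then be handed to a process spawner; stopping the recording would amount to signalling that child process.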

Usage

Enable voice in /status dialog
Wait for model download (one-time, ~142MB for base model)
Press \ to start recording
Speak clearly
Press \ again to stop
Transcribed text appears in prompt input
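
The press-to-start, press-to-stop flow above can be modelled as a small state reducer. This is a minimal sketch with hypothetical names (`VoiceModel`, `reduceVoice`); it is not taken from the PR and leaves out the actual audio capture and model call.

```typescript
// Hypothetical types: a two-state toggle plus the prompt text being edited.
type VoiceState = "idle" | "recording";

type VoiceEvent =
  | { type: "toggle" } // the recording key was pressed
  | { type: "transcribed"; text: string }; // the model returned text

interface VoiceModel {
  state: VoiceState;
  prompt: string;
}

function reduceVoice(model: VoiceModel, event: VoiceEvent): VoiceModel {
  switch (event.type) {
    case "toggle":
      // First press starts recording, second press stops it.
      return { ...model, state: model.state === "idle" ? "recording" : "idle" };
    case "transcribed": {
      const text = event.text.trim();
      if (text === "") return model; // nothing recognisable was said
      // Insert a single space unless the prompt is empty or already ends with one.
      const separator = model.prompt === "" || model.prompt.endsWith(" ") ? "" : " ";
      return { ...model, prompt: model.prompt + separator + text };
    }
  }
}
```

Keeping the toggle logic pure makes it easy to test without a microphone; the side effects (starting the recorder, running transcription) would hang off the state transitions.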

Models are cached in ~/.cache/opencode/models/ and persist across sessions.
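
As an illustration of the model sizes and cache location described above, here is a small sketch. The download sizes and the `~/.cache/opencode/models/` root come from this description; the per-model subdirectory name (`whisper-<size>`) and the function name are assumptions made for the example.

```typescript
type ModelSize = "tiny" | "base" | "small";

// Approximate download sizes in MB, as listed in the PR description.
const MODEL_DOWNLOAD_MB: Record<ModelSize, number> = {
  tiny: 75,
  base: 142,
  small: 466,
};

const DEFAULT_MODEL: ModelSize = "base";

// Hypothetical layout: one subdirectory per model size under the cache root.
function modelCacheDir(homeDir: string, size: ModelSize = DEFAULT_MODEL): string {
  const home = homeDir.replace(/\/+$/, ""); // drop trailing slashes
  return `${home}/.cache/opencode/models/whisper-${size}`;
}
```

The home directory is passed in rather than read from the environment so the function stays pure and easy to test.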

Comparison with PR #9264

PR #9264 implements voice via external APIs (Groq/OpenAI/local whisper-fastapi). This implementation prioritizes privacy and simplicity. Both approaches have merit. This implementation aligns with OpenCode's philosophy of local-first, privacy-respecting tooling.

Fixes #2425
Fixes #4695

How did you verify your code works?

All 792 tests pass, TypeScript compiles cleanly
Manually tested: recording with \ keybind, model download/caching, transcription accuracy
Verified config persistence, auto-start, offline mode with cached models
Tested on Linux with sox/ffmpeg; macOS and Windows are untested but expected to work

github-actions · 2026-01-30T17:49:50Z

The following comment was made by an LLM, it may be inaccurate:

Potential Related PRs Found

1. PR #9264 - feat: voice-typing using external Whisper / ALM API

Relationship: This is explicitly mentioned in the PR description as a prior approach. While PR #11345 uses local Whisper.cpp with a privacy-first design, PR #9264 implements voice via external APIs (Groq/OpenAI/local whisper-fastapi). Both address the same feature request but with different implementation philosophies.

2. PR #3827 - Add voice-to-text transcription feature

Relationship: Earlier attempt at voice-to-text functionality that may be related to the same feature request. Worth checking if it's been closed/superseded by either of the newer PRs.

Note: PR #11345 (the current PR) explicitly references PR #9264 in its description and provides a comparison, so the relationship is acknowledged by the authors.

- Only attempt FFI stderr redirection on Linux (process.platform check)
- Move dlopen call inside redirectStderr function with try/catch
- Gracefully skip stderr suppression on Windows/macOS
- Allows voice feature to work cross-platform without E2E test failures
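
The guard pattern this commit describes (platform check first, then a try/catch around the native call) might look roughly like the sketch below. The FFI call is replaced by an injected `redirect` callback and the function name `trySuppressStderr` is hypothetical, so this shows the control flow only, not the PR's code.

```typescript
// A function that undoes the redirection when called.
type Restore = () => void;

// `redirect` stands in for the FFI-based redirection (hypothetical callback).
// The platform is passed in so the guard can be exercised on any OS.
function trySuppressStderr(platform: string, redirect: () => Restore): Restore {
  const noop: Restore = () => {};
  // Skip entirely on anything other than Linux.
  if (platform !== "linux") return noop;
  try {
    return redirect();
  } catch {
    // Loading the native library failed; carry on without suppression.
    return noop;
  }
}
```

Returning a no-op restore function on every failure path means callers never need to branch on whether suppression actually happened.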

Replaces @xenova/transformers with @huggingface/transformers 3.8.1 which properly respects ONNX logging configuration. Removes 63 lines of complex FFI stderr redirection code in favor of simple env.backends.onnx.logSeverityLevel configuration.

shenron0101 · 2026-02-11T10:45:08Z

Eagerly waiting for this PR to be merged!

- Resolved conflicts in prompt/index.tsx by keeping both VoiceRecorder and DialogSkill features
- Regenerated bun.lock after merge

cmdNiels added 11 commits January 29, 2026 21:16

Added POC

16549fe

Switched to javascript to fit repo better.

aafd563

Fixed voice settings.

d9fd0f4

Merge origin/dev into voice branch

9cdffc9

Updated keybinding references.

27805fd

Removed double information.

989980e

Changed code to fit repo guidelines better.

1ac3804

Updated voice model cache directory.

cc330c8

Merge branch 'voice' into dev

508b5bd

fix: move voice dependencies to correct package location

95746fa

chore: remove unnecessary blank line from .gitignore

54d74ca

cmdNiels force-pushed the dev branch from 8dedcbb to 7356edf Compare January 30, 2026 18:43

cmdNiels added 8 commits January 30, 2026 20:19

fix: remove language config from whisper english only model.

e02a2ac

fix: remove unnecessary bus changes.

e926dc2

chore: removed Xenova prefix from models

f4eb968

chore: removed Xenova prefix from model loading

1ce1661

chore: removed top level log override.

4756ace

Merge branch 'dev' into dev

ecffc87

chore: updated bun.lock to cleanly resolve conflicts

52ef762

cmdNiels mentioned this pull request Jan 31, 2026

[FEATURE]: Speech-to-Text Voice Input for Lazy People in OpenCode #4695

Open

1 task

Merge upstream/dev into dev

6116233

- Resolved conflicts in prompt/index.tsx by keeping both VoiceRecorder and DialogSkill features
- Regenerated bun.lock after merge
