@cmdNiels cmdNiels commented Jan 30, 2026

What does this PR do?

Implements first-party, local voice transcription using Whisper.cpp via @huggingface/transformers. All processing happens locally with zero configuration required.

Features

  • Privacy-first: All transcription happens locally, no external API calls
  • Zero configuration: Works out of the box after enabling in /status
  • Offline support: Works completely offline after initial model download
  • Three model sizes: tiny (75MB), base (142MB, default), small (466MB)
  • Customizable keybind: Default \ for recording, configurable in ~/.config/opencode/opencode.json

This follows the frontend-backend separation design of OpenCode:

  • TUI: Uses sox/ffmpeg for microphone recording (tested on Linux)
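As a rough illustration of the recording side, the sketch below builds a command line for either tool that captures 16 kHz mono WAV audio, the input format Whisper models expect. The helper name `buildRecordCommand`, the `RecordCommand` shape, and the exact flags chosen are assumptions for illustration; the PR's actual invocation may differ.

```typescript
// Hypothetical helper (not taken from the PR): builds an argv for recording
// 16 kHz, mono, 16-bit WAV audio from the default microphone.
type Recorder = "sox" | "ffmpeg";

interface RecordCommand {
  command: string;
  args: string[];
}

function buildRecordCommand(recorder: Recorder, outFile: string): RecordCommand {
  if (recorder === "sox") {
    // `-d` reads from the default audio device; the options that follow
    // describe the output file (sample rate, channels, bit depth).
    return {
      command: "sox",
      args: ["-d", "-r", "16000", "-c", "1", "-b", "16", outFile],
    };
  }
  // Assumes the ALSA default capture device on Linux; `-y` overwrites
  // an existing output file without prompting.
  return {
    command: "ffmpeg",
    args: ["-y", "-f", "alsa", "-i", "default", "-ar", "16000", "-ac", "1", outFile],
  };
}
```

The returned `command` and `args` could be passed to `spawn` from `node:child_process`; stopping the recording would then amount to signalling that child process.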

Usage

  1. Enable voice in /status dialog
  2. Wait for model download (one-time, ~142MB for base model)
  3. Press \ to start recording
  4. Speak clearly
  5. Press \ again to stop
  6. Transcribed text appears in prompt input
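The press-to-start, press-to-stop flow above can be pictured as a small state machine. This is a minimal sketch under assumed names (`VoiceState`, `VoiceEvent`, `nextState`), not the PR's actual implementation.

```typescript
// Hypothetical state machine for the record keybind (names are illustrative).
type VoiceState = "idle" | "recording" | "transcribing";
type VoiceEvent = "toggle" | "transcribed";

function nextState(state: VoiceState, event: VoiceEvent): VoiceState {
  if (state === "idle" && event === "toggle") {
    return "recording"; // first key press starts recording
  }
  if (state === "recording" && event === "toggle") {
    return "transcribing"; // second key press stops recording
  }
  if (state === "transcribing" && event === "transcribed") {
    return "idle"; // text has been placed in the prompt input
  }
  return state; // any other event is ignored in the current state
}
```

Ignoring the keybind while in `transcribing` is a design choice in this sketch: it prevents a new recording from starting before the previous result has arrived.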

Models are cached in ~/.cache/opencode/models/ and persist across sessions.
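A minimal sketch of how the cache location and the one-time download check could be expressed. The `whisper-<size>` directory name and all function names here are assumptions; only the cache directory and the approximate model sizes come from the description above.

```typescript
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

type ModelSize = "tiny" | "base" | "small";

// Approximate download sizes in MB, as listed in the PR description.
const APPROX_SIZE_MB: Record<ModelSize, number> = { tiny: 75, base: 142, small: 466 };

// Resolves ~/.cache/opencode/models/ for the given home directory.
function modelCacheDir(home: string = homedir()): string {
  return join(home, ".cache", "opencode", "models");
}

// The "whisper-<size>" layout is hypothetical; the PR may name entries differently.
function modelPath(size: ModelSize, home?: string): string {
  return join(modelCacheDir(home), `whisper-${size}`);
}

// True when no cached copy exists, so the model would have to be downloaded.
function needsDownload(size: ModelSize, home?: string): boolean {
  return !existsSync(modelPath(size, home));
}
```

Because the check only looks at the local filesystem, a cached model can be used with no network access, which matches the offline behaviour described above.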

Comparison with PR #9264

PR #9264 implements voice via external APIs (Groq/OpenAI/local whisper-fastapi). This implementation instead prioritizes privacy and simplicity, in line with OpenCode's philosophy of local-first, privacy-respecting tooling. Both approaches have merit.

Fixes #2425
Fixes #4695

How did you verify your code works?

  • All 792 tests pass, TypeScript compiles cleanly
  • Manually tested: recording with \ keybind, model download/caching, transcription accuracy
  • Verified config persistence, auto-start, offline mode with cached models
  • Tested on Linux with sox/ffmpeg (macOS/Windows unverified but should work)

@github-actions
Contributor

The following comment was made by an LLM; it may be inaccurate:

Potential Related PRs Found

1. PR #9264 - feat: voice-typing using external Whisper / ALM API

2. PR #3827 - Add voice-to-text transcription feature

  • Relationship: Earlier attempt at voice-to-text functionality that may be related to the same feature request. Worth checking if it's been closed/superseded by either of the newer PRs.

Note: PR #11345 (the current PR) explicitly references PR #9264 in its description and provides a comparison, so the relationship is acknowledged by the authors.

- Only attempt FFI stderr redirection on Linux (process.platform check)
- Move dlopen call inside redirectStderr function with try/catch
- Gracefully skip stderr suppression on Windows/macOS
- Allows voice feature to work cross-platform without E2E test failures
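The guard described in this commit could look roughly like the sketch below. The function name `trySuppressStderr` and the injected `loadRedirector` callback are hypothetical; the callback stands in for the PR's `dlopen` call so that the sketch runs without any FFI dependency.

```typescript
// Hypothetical shape of the platform guard (not the PR's actual code).
type StderrRedirector = () => void;

function trySuppressStderr(
  platform: string,
  loadRedirector: () => StderrRedirector,
): boolean {
  // Only attempt the FFI path on Linux; skip it on Windows and macOS.
  if (platform !== "linux") return false;
  try {
    // Loading happens inside the try block, so a missing library or symbol
    // degrades to "no suppression" instead of crashing at startup.
    const redirect = loadRedirector();
    redirect();
    return true;
  } catch {
    return false;
  }
}
```

A caller would pass `process.platform` as the first argument. The boolean result only reports whether suppression was applied, so the voice feature can carry on either way.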
@shenron0101

Eagerly waiting for this PR to be merged!

- Resolved conflicts in prompt/index.tsx by keeping both VoiceRecorder and DialogSkill features
- Regenerated bun.lock after merge

Successfully merging this pull request may close these issues.

  • [FEATURE]: Speech-to-Text Voice Input for Lazy People in OpenCode
  • feat: first party support for voice conversing