
feat: Transcription - Refactor making transcription much more robust / capable.#382

Open
neurocis wants to merge 9 commits into spacedriveapp:main from neurocis:feat/transcription

Conversation

@neurocis neurocis commented Mar 10, 2026

Voice Transcription
Spacebot converts audio attachments (Telegram voice messages, Discord audio clips, etc.) to text using Whisper-compatible speech-to-text APIs before passing them to the channel LLM.

How It Works
User sends a voice message
↓
Channel receives the audio attachment (audio/* MIME type)
↓
transcribe_audio_attachment()
├─ Downloads audio bytes from the attachment URL
├─ Reads routing config: voice, voice_language, voice_translate, stt_provider
├─ Resolves the STT provider (stt_provider override → voice model prefix → error)
├─ Checks that the provider supports the Whisper API
├─ Sends a multipart POST to /v1/audio/transcriptions (or /translations)
└─ Returns the transcript as a <voice_transcript> XML tag in the conversation
The transcript is injected into conversation history as structured content:

<voice_transcript name="voice_message.ogg" mime="audio/ogg">
Hello, this is what the user said.
</voice_transcript>
When translation mode is enabled, the tag changes:

<voice_translation name="voice_message.ogg" mime="audio/ogg">
Hello, this is the English translation of what the user said.
</voice_translation>
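The wrapper construction above can be sketched as simple string formatting. This is a minimal illustration; `wrap_transcript` is a hypothetical helper name, not the PR's actual function:

```rust
// Sketch of building the transcript wrapper tag shown above.
// `wrap_transcript` is an illustrative name, not the PR's real API.
fn wrap_transcript(translated: bool, name: &str, mime: &str, text: &str) -> String {
    // Translation mode swaps the tag name, per the examples above.
    let tag = if translated { "voice_translation" } else { "voice_transcript" };
    format!("<{tag} name=\"{name}\" mime=\"{mime}\">\n{text}\n</{tag}>")
}

fn main() {
    let wrapped = wrap_transcript(false, "voice_message.ogg", "audio/ogg", "Hello");
    assert!(wrapped.starts_with("<voice_transcript name=\"voice_message.ogg\""));
    println!("{wrapped}");
}
```

Note that a production version should XML-escape the filename, MIME type, and transcript text, a point one of the review comments on this PR raises.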
Configuration
All voice settings live under [defaults.routing] (or per-agent [[agents]].routing) in config.toml.

Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| voice | String | Provider-dependent (see below) | STT model in "provider/model" format. Empty string disables voice transcription. |
| voice_language | Option<String> | None | ISO 639-1 language hint for transcription accuracy (e.g., "en", "es", "fr", "ja"). Ignored in translation mode. |
| voice_translate | bool | false | When true, uses the /v1/audio/translations endpoint to translate audio to English instead of transcribing in the source language. |
| stt_provider | Option<String> | None | Override which provider handles STT. When absent, the provider is extracted from the voice model string prefix. |
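The stt_provider override and the "provider/model" prefix rule can be sketched as follows. The helper name and signature are illustrative; the actual resolution lives in src/agent/channel_attachments.rs:

```rust
// Sketch of STT provider resolution: explicit override first, then the
// "provider/model" prefix of `voice`, otherwise an error. The helper name
// is illustrative, not the PR's actual function.
fn resolve_stt<'a>(
    stt_provider: Option<&'a str>,
    voice: &'a str,
) -> Result<(&'a str, &'a str), String> {
    match (stt_provider, voice.split_once('/')) {
        // Override set: use it, stripping any provider prefix from the model.
        (Some(p), Some((_, model))) => Ok((p, model)),
        (Some(p), None) => Ok((p, voice)),
        // No override: take the provider from the model prefix.
        (None, Some((p, model))) => Ok((p, model)),
        (None, None) => Err(format!(
            "voice model '{voice}' must be 'provider/model' when stt_provider is unset"
        )),
    }
}

fn main() {
    assert_eq!(
        resolve_stt(None, "groq/whisper-large-v3-turbo"),
        Ok(("groq", "whisper-large-v3-turbo"))
    );
    assert_eq!(resolve_stt(Some("openai"), "whisper-1"), Ok(("openai", "whisper-1")));
    assert!(resolve_stt(None, "whisper-1").is_err());
}
```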
Provider Defaults
When no explicit voice is configured, Spacebot sets a default based on the primary provider:

| Provider | Default voice | Notes |
| --- | --- | --- |
| openai | openai/whisper-1 | Native OpenAI Whisper API |
| groq | groq/whisper-large-v3-turbo | Fast, cheap Whisper endpoint |
| gemini | gemini/gemini-2.5-flash | Via Gemini's OpenAI-compatible endpoint |
| openrouter | (empty) | No native STT — must configure stt_provider separately |
| anthropic | (empty) | No STT support — must configure stt_provider separately |
| All others | (empty) | Must configure voice and optionally stt_provider |
Available STT Models
| Provider | Models | Endpoint |
| --- | --- | --- |
| OpenAI | whisper-1, gpt-4o-transcribe, gpt-4o-mini-transcribe | POST /v1/audio/transcriptions |
| Groq | whisper-large-v3, whisper-large-v3-turbo | POST /openai/v1/audio/transcriptions |
| Gemini | gemini-2.5-flash (and other Gemini models) | POST /v1/audio/transcriptions (OpenAI-compatible) |
Supported Audio Formats
The Whisper API accepts: flac, m4a, mp3, mp4, mpeg, mpga, oga, ogg, wav, webm.

Telegram voice messages are OGG/Opus, which is natively supported.
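An extension check against that allowlist could look like the sketch below. This is illustrative only; the PR does not necessarily validate formats client-side before uploading:

```rust
// The Whisper API's accepted container formats, per the list above.
const WHISPER_FORMATS: &[&str] = &[
    "flac", "m4a", "mp3", "mp4", "mpeg", "mpga", "oga", "ogg", "wav", "webm",
];

// Illustrative helper: checks a filename extension against the allowlist.
fn whisper_supported(filename: &str) -> bool {
    filename
        .rsplit_once('.')
        .map(|(_, ext)| WHISPER_FORMATS.contains(&ext.to_ascii_lowercase().as_str()))
        .unwrap_or(false)
}

fn main() {
    assert!(whisper_supported("voice_message.ogg")); // Telegram OGG/Opus
    assert!(!whisper_supported("clip.aac"));
}
```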

Environment Variables
Environment variables override config file values:

| Variable | Maps to | Example |
| --- | --- | --- |
| SPACEBOT_VOICE_MODEL | routing.voice | groq/whisper-large-v3-turbo |
| SPACEBOT_VOICE_LANGUAGE | routing.voice_language | en |
| SPACEBOT_VOICE_TRANSLATE | routing.voice_translate | true |
| SPACEBOT_STT_PROVIDER | routing.stt_provider | groq |
Resolution order: environment variable → config file → provider default.
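That precedence can be sketched as a plain Option chain. Function names here are illustrative stand-ins for the logic in src/config/load.rs:

```rust
// Resolution order for the voice model: env var → config file → provider default.
fn resolve_voice(
    env_value: Option<&str>,
    config_value: Option<&str>,
    provider_default: Option<&str>,
) -> Option<String> {
    env_value.or(config_value).or(provider_default).map(str::to_owned)
}

// SPACEBOT_VOICE_TRANSLATE should be parsed whenever it is present so that
// "false" can override a `true` in config (a fix one review comment on this
// PR suggests).
fn resolve_translate(env_value: Option<&str>, config_value: bool) -> bool {
    match env_value {
        Some(v) => v.eq_ignore_ascii_case("true"),
        None => config_value,
    }
}

fn main() {
    assert_eq!(
        resolve_voice(Some("groq/whisper-large-v3-turbo"), Some("openai/whisper-1"), None)
            .as_deref(),
        Some("groq/whisper-large-v3-turbo")
    );
    assert!(!resolve_translate(Some("false"), true));
}
```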

API
GET /api/config
Returns current routing configuration including voice fields:

{
  "routing": {
    "voice": "groq/whisper-large-v3-turbo",
    "voice_language": "en",
    "voice_translate": false,
    "stt_provider": "groq"
  }
}
PATCH /api/config
Update voice settings at runtime:

{
  "agent_id": "main",
  "routing": {
    "voice": "openai/whisper-1",
    "voice_language": "es",
    "voice_translate": false,
    "stt_provider": "openai"
  }
}
GET /api/models?capability=voice_transcription
Returns models from providers that support Whisper-compatible transcription (currently: openai, groq, gemini).
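The filter can be sketched with the constant and helper named in the PR's src/api/models.rs changes; the signature here is an assumption:

```rust
// Providers exposing Whisper-compatible transcription endpoints, per the PR.
const WHISPER_CAPABLE_PROVIDERS: &[&str] = &["openai", "groq", "gemini"];

// Capability check used by the ?capability=voice_transcription filter
// (signature assumed; the PR names this helper in src/api/models.rs).
fn supports_voice_transcription(provider_id: &str) -> bool {
    WHISPER_CAPABLE_PROVIDERS.contains(&provider_id)
}

fn main() {
    assert!(supports_voice_transcription("groq"));
    assert!(!supports_voice_transcription("anthropic"));
}
```

One review comment below argues this should eventually be a model-level check rather than a provider-level one, since not every model from a capable provider can transcribe audio.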

Example Configurations
Groq for chat and transcription
[llm]
groq_key = "gsk_xxx"

[defaults.routing]
channel = "groq/llama-3.3-70b-versatile"
voice = "groq/whisper-large-v3-turbo"
OpenRouter for chat, Groq for transcription
[llm]
openrouter_key = "sk-or-xxx"
groq_key = "gsk_xxx"

[defaults.routing]
channel = "openrouter/anthropic/claude-sonnet-4"
voice = "groq/whisper-large-v3-turbo"
voice_language = "en"
Anthropic for chat, OpenAI for transcription with translation
[llm]
anthropic_key = "sk-ant-xxx"
openai_key = "sk-xxx"

[defaults.routing]
channel = "anthropic/claude-sonnet-4"
voice = "openai/whisper-1"
voice_translate = true
stt_provider = "openai"
Multilingual transcription with language hint
[llm]
openai_key = "sk-xxx"

[defaults.routing]
channel = "openai/gpt-4.1"
voice = "openai/whisper-1"
voice_language = "ja"
Error Handling
Errors are returned as inline text messages in conversation context (not exceptions), so the channel LLM sees the failure and can inform the user:

| Condition | Message |
| --- | --- |
| No voice model configured | [Audio attachment received but no voice model is configured. Add voice = "provider/model" to [defaults.routing] in config.] |
| STT provider not configured | [Audio transcription failed: provider 'xxx' is not configured] |
| Provider doesn't support Whisper | [Audio transcription not supported by provider 'xxx'. Configure a Whisper-compatible STT provider (openai, groq, gemini).] |
| Transcription API error | [Audio transcription failed for filename.ogg: Whisper API error (400): ...message...] |
| Download failure | [Failed to download audio: filename.ogg] |
There is no fallback to multimodal chat. If transcription fails, the error is returned directly.

Architecture
Module Layout
src/llm/transcription.rs — Whisper API client (multipart form, endpoint routing, response parsing)
src/llm/routing.rs — RoutingConfig with voice, voice_language, voice_translate, stt_provider
src/agent/channel_attachments.rs — transcribe_audio_attachment() orchestration
src/config/toml_schema.rs — TOML deserialization for voice config fields
src/config/providers.rs — resolve_routing() merges TOML → base config
src/config/load.rs — Environment variable resolution
src/api/config.rs — API GET/PATCH for voice settings
src/api/models.rs — voice_transcription capability filter
Request Flow
1. channel_attachments::download_attachments() detects the audio/* MIME type.
2. Calls transcribe_audio_attachment(), which:
   - Downloads raw bytes from the attachment URL
   - Reads routing.voice, routing.voice_language, routing.voice_translate, routing.stt_provider from RuntimeConfig
   - Resolves the provider via the stt_provider override or the voice model prefix
   - Validates that the provider supports Whisper via supports_whisper_transcription()
3. Calls llm::transcribe_audio(), which:
   - Builds the correct endpoint URL based on the provider (build_whisper_endpoint())
   - Constructs a multipart form with: file (audio bytes), model, response_format: json, optional language
   - Sends a POST with an Authorization: Bearer header and any extra_headers
   - Parses the {"text": "...", "duration": ...} response
4. The transcript is injected as a <voice_transcript> or <voice_translation> XML tag into the channel conversation.
Provider Endpoint Mapping
| Provider | Base URL | Transcription Path | Translation Path |
| --- | --- | --- | --- |
| OpenAI | https://api.openai.com | /v1/audio/transcriptions | /v1/audio/translations |
| Groq | https://api.groq.com/openai | /openai/v1/audio/transcriptions | /openai/v1/audio/translations |
| Gemini | https://generativelanguage.googleapis.com/v1beta/openai | /v1/audio/transcriptions | /v1/audio/translations |
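A simplified sketch of the endpoint builder, assuming a plain concatenation of a trimmed base URL and a /v1/audio/... path. The PR's real build_whisper_endpoint() special-cases Groq, and one review comment flags a duplicate "/openai" segment there, so the actual mapping may differ:

```rust
// Simplified endpoint builder: append /v1/audio/{transcriptions,translations}
// to the provider base URL. This sketch assumes plain concatenation; the
// PR's build_whisper_endpoint() also has provider-specific branches.
fn build_endpoint(base_url: &str, translate: bool) -> String {
    let path = if translate { "audio/translations" } else { "audio/transcriptions" };
    format!("{}/v1/{}", base_url.trim_end_matches('/'), path)
}

fn main() {
    assert_eq!(
        build_endpoint("https://api.openai.com/", false),
        "https://api.openai.com/v1/audio/transcriptions"
    );
    assert_eq!(
        build_endpoint("https://api.groq.com/openai", true),
        "https://api.groq.com/openai/v1/audio/translations"
    );
}
```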
Key Design Decisions
Multipart form data — The Whisper API requires multipart uploads with the audio file, not JSON with base64-encoded audio.
No fallback — On failure, an error message is returned. The previous approach of falling back to multimodal chat with input_audio content type is removed.
Separate STT routing — stt_provider allows using a different provider for transcription than for chat (e.g., Anthropic for chat, Groq for STT).
Language hint ignored for translations — Per the Whisper API spec, language is only applicable to transcriptions.
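The multipart form fields and the language-hint rule can be sketched as below. The struct is a simplified stand-in for the PR's TranscriptionRequest, and the field list follows the request-flow description above (the audio bytes go in a separate file part):

```rust
// Simplified stand-in for the PR's TranscriptionRequest.
struct TranscriptionRequest<'a> {
    model: &'a str,
    language: Option<&'a str>,
    translate: bool,
}

// Text fields of the multipart form. `language` is only attached for
// transcriptions, since the Whisper API spec ignores it for translations.
fn form_fields(req: &TranscriptionRequest) -> Vec<(&'static str, String)> {
    let mut fields = vec![
        ("model", req.model.to_owned()),
        ("response_format", "json".to_owned()),
    ];
    if let (Some(lang), false) = (req.language, req.translate) {
        fields.push(("language", lang.to_owned()));
    }
    fields
}

fn main() {
    let req = TranscriptionRequest { model: "whisper-1", language: Some("ja"), translate: false };
    assert!(form_fields(&req).iter().any(|(k, v)| *k == "language" && v == "ja"));
    let req = TranscriptionRequest { model: "whisper-1", language: Some("ja"), translate: true };
    assert!(!form_fields(&req).iter().any(|(k, _)| *k == "language"));
}
```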

Note

Implementation Summary

Refactored voice transcription from Anthropic's multimodal input_audio API to standard Whisper-compatible /v1/audio/transcriptions endpoints, enabling support for OpenAI, Groq, and Gemini providers.

Key changes: New src/llm/transcription.rs module encapsulates Whisper API client with multipart form handling, provider-specific endpoint routing, and comprehensive test coverage. Updated channel_attachments.rs to use the new transcription module, removing 120+ lines of inline logic. Extended RoutingConfig with voice_language, voice_translate, and stt_provider fields to support language hints and provider override. Updated config loading, API endpoints, and models filtering to expose new voice settings. Simplified model filtering in src/api/models.rs by checking provider capability rather than maintaining a hardcoded model list.

Written by Tembo for commit 96759b2. This will update automatically on new commits.

neurocis added 4 commits March 9, 2026 17:15
Replace the incorrect multimodal chat approach with proper Whisper-compatible
speech-to-text APIs using multipart form data.

Changes:
- Add voice_language, voice_translate, stt_provider config fields
- Create new transcription module with Whisper-compatible implementation
- Support OpenAI, Groq, and Gemini OpenAI-compatible endpoints
- Add environment variables: SPACEBOT_VOICE_LANGUAGE, SPACEBOT_VOICE_TRANSLATE, SPACEBOT_STT_PROVIDER
- Set sensible voice defaults for OpenAI (whisper-1), Groq (whisper-large-v3-turbo), Gemini (gemini-2.5-flash)
- Update API config response with new STT fields
- Add comprehensive unit tests for transcription module

The previous implementation incorrectly used /v1/chat/completions with input_audio
content type. Now uses proper /v1/audio/transcriptions endpoint with multipart
form data for actual speech-to-text transcription.
The send() method returns reqwest::Error which doesn't have a From
implementation for our Error type. Map it to LlmError::ProviderRequest.

coderabbitai bot commented Mar 10, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8d5ec799-cdfc-47b9-8218-c16b42311b96

📥 Commits

Reviewing files that changed from the base of the PR and between 01e50d8 and 46616fd.

📒 Files selected for processing (2)
  • src/config/load.rs
  • src/config/toml_schema.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/config/load.rs

Walkthrough

Adds voice transcription: new routing/config fields and env overrides for STT, a Whisper-compatible transcription module and tests, provider-capability checks, documentation for voice transcription, API and routing updates, and agent attachment changes to call the new transcription pathway.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Repository ignores: /.gitignore | Expanded ignore patterns (Rust target/.Cargo.lock, Python __pycache__/envs, Node node_modules/builds) and generalized OpenCode ignore to .opencode*/. |
| Documentation: docs/content/docs/(configuration)/config.mdx, docs/content/docs/(core)/routing.mdx, docs/content/docs/(features)/voice-transcription.mdx | Added voice/STT docs: env vars, routing defaults, full voice-transcription feature page, examples, and reference entries. |
| Config loading & schema: src/config/load.rs, src/config/toml_schema.rs, src/config/providers.rs | Added env overrides SPACEBOT_VOICE_LANGUAGE, SPACEBOT_VOICE_TRANSLATE, SPACEBOT_STT_PROVIDER; TOML schema and merge logic now include voice_language, voice_translate, stt_provider. |
| Routing & LLM defaults: src/llm/routing.rs, src/llm.rs | Extended RoutingConfig with voice_language, voice_translate, stt_provider; initialized defaults and re-exported the new transcription module. |
| Transcription implementation: src/llm/transcription.rs | New Whisper-compatible transcription client, public TranscriptionRequest/Response, transcribe_audio(), endpoint/path builders, multipart form handling, response parsing, provider capability helper, and tests. |
| API surface: src/api/config.rs, src/api/models.rs | API structs now expose new routing fields and persistence; replaced model-list checks with provider-based WHISPER_CAPABLE_PROVIDERS and supports_voice_transcription() logic. |
| Agent integration: src/agent/channel_attachments.rs | Removed inline HTTP transcription handling; now derives provider/model and calls transcribe_audio(); validates the provider via supports_whisper_transcription(); updated error messages and response wrapping. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

  • jamiepine
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat: Transcription - Refactor making transcription much more robust / capable.' clearly describes the main change: a refactor of transcription functionality to make it more robust and capable. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, detailing the voice transcription feature, how it works, configuration options, supported providers, API endpoints, and error handling. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


Co-authored-by: KiloCodium <KiloCoder@neurocis.ai>
@neurocis neurocis force-pushed the feat/transcription branch from e37dc26 to 3f76e29 Compare March 10, 2026 04:13
@neurocis neurocis marked this pull request as ready for review March 10, 2026 04:14
if let Ok(voice_language) = std::env::var("SPACEBOT_VOICE_LANGUAGE") {
routing.voice_language = Some(voice_language);
}
if let Ok(voice_translate) = std::env::var("SPACEBOT_VOICE_TRANSLATE") {
SPACEBOT_VOICE_TRANSLATE currently only ever flips this on, so SPACEBOT_VOICE_TRANSLATE=false won’t override a true from config. Probably want to set the bool whenever the env var is present.

Suggested change
    if let Ok(voice_translate) = std::env::var("SPACEBOT_VOICE_TRANSLATE") {
        routing.voice_translate = voice_translate.eq_ignore_ascii_case("true");
    }

));
}
};
let provider_id = routing.stt_provider.as_deref().unwrap_or_else(|| {
Per the doc comment in the PR description, this routing should be: stt_provider override → voice prefix → error. Defaulting to anthropic when voice has no provider/model prefix seems like it’ll produce confusing failures.

Suggested change
    let model_name = voice_model
        .split_once('/')
        .map(|(_, m)| m)
        .unwrap_or(voice_model);
    let provider_id = if let Some(stt_provider) = routing.stt_provider.as_deref() {
        stt_provider
    } else if let Some((p, _)) = voice_model.split_once('/') {
        p
    } else {
        tracing::warn!(model = %voice_model, "invalid voice model route");
        return UserContent::text(format!(
            "[Audio transcription failed for {}: invalid voice model '{}'; expected provider/model]",
            attachment.filename, voice_model
        ));
    };

} else {
"voice_transcript"
};
UserContent::text(format!(
response.text is ultimately user-controlled; as-is it can contain < / & / " and break the <voice_transcript ...> wrapper (or inject additional tags into the prompt). Escaping keeps the wrapper well-formed.

Suggested change
    let escape_attr = |s: &str| {
        s.replace('&', "&amp;")
            .replace('<', "&lt;")
            .replace('>', "&gt;")
            .replace('"', "&quot;")
    };
    let escape_text = |s: &str| {
        s.replace('&', "&amp;")
            .replace('<', "&lt;")
            .replace('>', "&gt;")
    };
    let filename = escape_attr(&attachment.filename);
    let mime_type = escape_attr(&attachment.mime_type);
    let text = escape_text(&response.text);
    UserContent::text(format!(
        "<{} name=\"{}\" mime=\"{}\">\n{}\n</{}>",
        tag, filename, mime_type, text, tag
    ))

translated: bool,
) -> Result<TranscriptionResponse> {
let status = response.status();
let body: serde_json::Value = response
Minor robustness nit: on non-2xx responses some providers return non-JSON bodies (HTML/proxy text), and response.json() will fail and hide the underlying status/body. Consider reading response.text() first, then serde_json::from_str opportunistically, and include a truncated raw body in the error message when JSON parsing fails.

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 9

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.gitignore:
- Around line 13-15: The .gitignore currently lists ".Cargo.lock" (a
dot-prefixed filename) which won't match the actual Rust lockfile "Cargo.lock";
update the ignore entry to "Cargo.lock" (or add both "Cargo.lock" and
".Cargo.lock" if you want to cover both) so the standard Rust lockfile is
properly ignored.

In `@src/agent/channel_attachments.rs`:
- Around line 272-275: Escape the values injected into the XML wrapper to
prevent tag/attribute injection: when constructing the UserContent::text in the
block that formats "<{} name=\"{}\" mime=\"{}\">...\n</{}>" (the code
referencing tag, attachment.filename, attachment.mime_type, response.text), run
attachment.filename and attachment.mime_type through an attribute-escaping
helper and run response.text through an element/inner-text escaping helper (or
add a small local xml_escape function if none exists) before formatting; ensure
the same escaped values are used for both the start and end tag context where
applicable to avoid breaking the wrapper.
- Around line 225-235: The current logic silently defaults provider_id to
"anthropic" when routing.stt_provider is None, which masks missing
configuration; change the handling in the block that computes provider_id and
model_name (using routing.stt_provider, voice_model, provider_id, and
model_name) to: if voice_model contains a '/' use the split provider and model
as now; else if routing.stt_provider is Some use that provider and voice_model
as model; otherwise return/raise an error (or propagate a Config/InvalidInput
error) indicating that stt_provider is unset and voice must be set as
"provider/model". Ensure the error message clearly references the missing
stt_provider and the expected "provider/model" format so operators can fix the
config.
- Around line 248-265: The provider list wrongly treats Gemini as
Whisper-compatible; update supports_whisper_transcription to exclude Gemini (do
not rely on googleapis.com or other URL checks that include Gemini) and either
remove Gemini from the branch that builds a TranscriptionRequest or add a
separate Gemini-specific audio path; specifically, change
supports_whisper_transcription(...) so it returns false for provider identifiers
or configs that indicate Gemini, and ensure transcribe_audio(...) is only called
for true results (leave TranscriptionRequest creation and the match on
transcribe_audio(...) unchanged for non-Gemini providers).

In `@src/api/config.rs`:
- Around line 172-174: The patch struct fields voice_language, voice_translate,
and stt_provider must be made tri-state so we can distinguish omitted vs
explicit null: change their types from Option<String>/Option<bool> to
Option<Option<String>> and Option<Option<bool>> in the PATCH struct (the struct
declared around the shown fields, e.g., the config patch type), then update
update_routing_table() to handle three cases: None => do nothing (field
omitted), Some(Some(value)) => write the TOML key with value, and Some(None) =>
delete the corresponding TOML key (remove it so routing falls back to
inherited/default). Apply the same tri-state change and handling for the other
occurrence referenced (around lines 595-603).

In `@src/api/models.rs`:
- Around line 108-110: The current check (supports_voice_transcription using
WHISPER_CAPABLE_PROVIDERS) treats provider support as model support and exposes
whole provider catalogs; replace it with a model-level allowlist or capability
flag. Add a new predicate (e.g., supports_voice_transcription_model(model_id:
&str) -> bool) that checks either a WHISPER_COMPATIBLE_MODELS set or a per-model
metadata flag, update all usages of supports_voice_transcription and
WHISPER_CAPABLE_PROVIDERS to call supports_voice_transcription_model (including
the other occurrences mentioned), and ensure the models endpoint filters by
model id/capability rather than by provider so only actual Whisper-compatible
models are returned for voice_transcription.

In `@src/config/load.rs`:
- Around line 790-800: The TOML load path (from_toml()/load_from_path()) doesn't
apply the SPACEBOT_VOICE_LANGUAGE, SPACEBOT_VOICE_TRANSLATE, and
SPACEBOT_STT_PROVIDER environment overrides so TOML values can shadow env vars
and SPACEBOT_VOICE_TRANSLATE cannot force false; after routing is built in
from_toml() (i.e., after routing resolution) apply the same env-override logic
used in load_from_env(): if SPACEBOT_VOICE_LANGUAGE is set, assign
routing.voice_language = Some(value); if SPACEBOT_STT_PROVIDER is set, assign
routing.stt_provider = Some(value); and for SPACEBOT_VOICE_TRANSLATE explicitly
parse the env var for "true"/"false" (case-insensitive) and set
routing.voice_translate accordingly so env can force both true and false. Ensure
you use the same symbol names (routing.voice_language, routing.voice_translate,
routing.stt_provider) and add this override block in from_toml() after routing
is finalized.

In `@src/llm/routing.rs`:
- Around line 36-41: The stt_provider override (stt_provider) currently can
mismatch the model name extracted from voice (e.g., voice = "openai/whisper-1"
with stt_provider = "groq"), so update the code to either validate and reject
mismatched provider/model pairs or to resolve a provider-specific STT model when
an override is provided: add a resolver function (e.g.,
resolve_stt_model(provider: &str, voice: &str) -> Result<String, Error>) and
call it where channel attachments are built (the code that extracts the model
from voice in channel_attachments), or add a validator (e.g.,
validate_stt_provider_model_pair(stt_provider: Option<String>, voice: &str) ->
Result<(), Error>) in the routing/struct initialization so that passing
stt_provider will produce a matching model name for that provider instead of
blindly stripping voice.

In `@src/llm/transcription.rs`:
- Around line 76-88: The build_whisper_endpoint function is appending an extra
"/openai" for Groq hosts, producing duplicate segments; update
build_whisper_endpoint to append only "/v1/{audio/...}" when base contains
"groq.com" (i.e., remove the "/openai" from the Groq branch) so the path becomes
"{base}/v1/{path}", and update the corresponding unit tests that assert Groq
URLs (tests referencing the Groq endpoint expectations) to expect no duplicate
"/openai" in the resulting URL.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9ec65f3f-c2f8-41b9-ba61-b1023371d101

📥 Commits

Reviewing files that changed from the base of the PR and between 81c7855 and 3f76e29.

⛔ Files ignored due to path filters (1)
  • docs/content/docs/(features)/meta.json is excluded by !**/*.json
📒 Files selected for processing (13)
  • .gitignore
  • docs/content/docs/(configuration)/config.mdx
  • docs/content/docs/(core)/routing.mdx
  • docs/content/docs/(features)/voice-transcription.mdx
  • src/agent/channel_attachments.rs
  • src/api/config.rs
  • src/api/models.rs
  • src/config/load.rs
  • src/config/providers.rs
  • src/config/toml_schema.rs
  • src/llm.rs
  • src/llm/routing.rs
  • src/llm/transcription.rs

Comment on lines +13 to +15
# Rust
/target
.Cargo.lock

⚠️ Potential issue | 🟡 Minor

.Cargo.lock won't match Cargo.lock.

Cargo.lock is not a dotfile, so this pattern currently has no effect on the standard Rust lockfile.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gitignore around lines 13 - 15, The .gitignore currently lists
".Cargo.lock" (a dot-prefixed filename) which won't match the actual Rust
lockfile "Cargo.lock"; update the ignore entry to "Cargo.lock" (or add both
"Cargo.lock" and ".Cargo.lock" if you want to cover both) so the standard Rust
lockfile is properly ignored.

Comment on lines +225 to +235
let provider_id = routing.stt_provider.as_deref().unwrap_or_else(|| {
voice_model
.split_once('/')
.map(|(p, _)| p)
.unwrap_or("anthropic")
});

let model_name = voice_model
.split_once('/')
.map(|(_, m)| m)
.unwrap_or(voice_model);

⚠️ Potential issue | 🟠 Major

Reject providerless voice models when stt_provider is unset.

Falling back to "anthropic" here turns voice = "whisper-1" into a misleading "not configured / not supported" path instead of telling the operator the config is incomplete. This should error unless either stt_provider is set or voice is already provider/model.

💡 Suggested fix
-    let provider_id = routing.stt_provider.as_deref().unwrap_or_else(|| {
-        voice_model
-            .split_once('/')
-            .map(|(p, _)| p)
-            .unwrap_or("anthropic")
-    });
-
-    let model_name = voice_model
-        .split_once('/')
-        .map(|(_, m)| m)
-        .unwrap_or(voice_model);
+    let (provider_id, model_name) = match routing.stt_provider.as_deref() {
+        Some(provider_id) => (provider_id, voice_model),
+        None => match voice_model.split_once('/') {
+            Some((provider_id, model_name)) => (provider_id, model_name),
+            None => {
+                return UserContent::text(
+                    "[Audio transcription failed: `voice` must be `provider/model` when `stt_provider` is unset]",
+                );
+            }
+        },
+    };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/channel_attachments.rs` around lines 225 - 235, The current logic
silently defaults provider_id to "anthropic" when routing.stt_provider is None,
which masks missing configuration; change the handling in the block that
computes provider_id and model_name (using routing.stt_provider, voice_model,
provider_id, and model_name) to: if voice_model contains a '/' use the split
provider and model as now; else if routing.stt_provider is Some use that
provider and voice_model as model; otherwise return/raise an error (or propagate
a Config/InvalidInput error) indicating that stt_provider is unset and voice
must be set as "provider/model". Ensure the error message clearly references the
missing stt_provider and the expected "provider/model" format so operators can
fix the config.

Comment on lines +248 to +265
     if !supports_whisper_transcription(&provider) {
         return UserContent::text(format!(
-            "[Audio transcription failed for {}: provider '{}' does not support input_audio on this endpoint]",
-            attachment.filename, provider_id
+            "[Audio transcription not supported by provider '{}'. \
+            Configure a Whisper-compatible STT provider (openai, groq, gemini).]",
+            provider_id
         ));
     }

-    let format = audio_format_for_attachment(attachment);
-    use base64::Engine as _;
-    let base64_audio = base64::engine::general_purpose::STANDARD.encode(&bytes);
-
-    let endpoint = format!(
-        "{}/v1/chat/completions",
-        provider.base_url.trim_end_matches('/')
-    );
-    let body = serde_json::json!({
-        "model": model_name,
-        "messages": [{
-            "role": "user",
-            "content": [
-                {
-                    "type": "text",
-                    "text": "Transcribe this audio verbatim. Return only the transcription text."
-                },
-                {
-                    "type": "input_audio",
-                    "input_audio": {
-                        "data": base64_audio,
-                        "format": format,
-                    }
-                }
-            ]
-        }],
-        "temperature": 0
-    });
-
-    let response = match deps
-        .llm_manager
-        .http_client()
-        .post(&endpoint)
-        .header("authorization", format!("Bearer {}", provider.api_key))
-        .header("content-type", "application/json")
-        .json(&body)
-        .send()
-        .await
-    {
-        Ok(response) => response,
-        Err(error) => {
-            tracing::warn!(%error, model = %voice_model, "voice transcription request failed");
-            return UserContent::text(format!(
-                "[Audio transcription failed for {}]",
-                attachment.filename
-            ));
-        }
+    let request = TranscriptionRequest {
+        audio_bytes: &bytes,
+        filename: &attachment.filename,
+        mime_type: &attachment.mime_type,
+        model: model_name,
+        language: routing.voice_language.as_deref(),
+        translate: routing.voice_translate,
+    };

-    let status = response.status();
-    let response_body = match response.json::<serde_json::Value>().await {
-        Ok(body) => body,
+    match transcribe_audio(deps.llm_manager.http_client(), &provider, request).await {

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Does the Gemini OpenAI-compatible API support POST /v1/audio/translations, or only POST /v1/audio/transcriptions?

💡 Result:

Neither. Google’s Gemini OpenAI-compatible REST base URL (.../v1beta/openai/) does not provide OpenAI’s Whisper-style endpoints like POST /v1/audio/transcriptions or POST /v1/audio/translations. Instead, audio transcription/translation is done by sending audio as input_audio to POST /v1/chat/completions and prompting the model to transcribe/translate. [1]

A third-party compatibility matrix likewise marks Gemini as unsupported for both /audio/transcriptions and /audio/translations. [2]

Sources:
[1] Google AI for Developers — “OpenAI compatibility” (audio example uses /openai/chat/completions) (ai.google.dev)
[2] TrueFoundry AI Gateway docs — supported APIs table (truefoundry.com)


🏁 Script executed:

#!/bin/bash
# Find the definition of supports_whisper_transcription function
rg -n "fn supports_whisper_transcription" --type rs

Repository: spacedriveapp/spacebot

Length of output: 92


🏁 Script executed:

#!/bin/bash
# Search for provider support checks related to transcription
rg -n "supports_whisper_transcription|gemini.*transcrib|transcrib.*gemini" --type rs -i

Repository: spacedriveapp/spacebot

Length of output: 92


🏁 Script executed:

#!/bin/bash
# Look for the TranscriptionRequest and how translation is handled
rg -n "voice_translate|TranscriptionRequest" --type rs -A 5

Repository: spacedriveapp/spacebot

Length of output: 92


🏁 Script executed:

#!/bin/bash
# Find the definition of supports_whisper_transcription function
rg -n "fn supports_whisper_transcription" src/

Repository: spacedriveapp/spacebot

Length of output: 169


🏁 Script executed:

#!/bin/bash
# Search for provider support checks
rg -n "supports_whisper_transcription|voice_translate" src/

Repository: spacedriveapp/spacebot

Length of output: 2394


🏁 Script executed:

#!/bin/bash
# Look at the channel_attachments.rs file to understand context
wc -l src/agent/channel_attachments.rs

Repository: spacedriveapp/spacebot

Length of output: 102


🏁 Script executed:

#!/bin/bash
# Read the supports_whisper_transcription implementation
sed -n '152,170p' src/llm/transcription.rs

Repository: spacedriveapp/spacebot

Length of output: 633


🏁 Script executed:

#!/bin/bash
# Look at the Gemini test to understand what's expected
sed -n '299,310p' src/llm/transcription.rs

Repository: spacedriveapp/spacebot

Length of output: 546


🏁 Script executed:

#!/bin/bash
# Search for transcribe_audio function definition
rg -n "fn transcribe_audio" src/ -A 20

Repository: spacedriveapp/spacebot

Length of output: 2857


🏁 Script executed:

#!/bin/bash
# Look at the build_whisper_endpoint function
rg -n "fn build_whisper_endpoint" src/ -A 15

Repository: spacedriveapp/spacebot

Length of output: 942


🏁 Script executed:

#!/bin/bash
# Get more context on TranscriptionRequest and the full transcribe function
sed -n '39,90p' src/llm/transcription.rs

Repository: spacedriveapp/spacebot

Length of output: 1680


🏁 Script executed:

#!/bin/bash
# Check if there are any other tests or configurations that show Gemini transcription is actually supported
rg -n "gemini.*transcrib|Gemini.*audio" src/ -i -B 2 -A 2

Repository: spacedriveapp/spacebot

Length of output: 472


🏁 Script executed:

#!/bin/bash
# Look at the full test file to understand the intent
sed -n '273,350p' src/llm/transcription.rs

Repository: spacedriveapp/spacebot

Length of output: 2969


Gemini should not be listed as supporting Whisper-compatible audio endpoints.

The code incorrectly includes Gemini in supports_whisper_transcription() via the URL check for googleapis.com. Gemini's OpenAI-compatible API does not support the Whisper endpoints (POST /v1/audio/transcriptions or POST /v1/audio/translations). Instead, Gemini performs audio transcription via POST /v1/chat/completions with audio as input_audio. Remove Gemini from the supported providers list, or implement proper Gemini-specific audio handling.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/channel_attachments.rs` around lines 248 - 265, The provider list
wrongly treats Gemini as Whisper-compatible; update
supports_whisper_transcription to exclude Gemini (do not rely on googleapis.com
or other URL checks that include Gemini) and either remove Gemini from the
branch that builds a TranscriptionRequest or add a separate Gemini-specific
audio path; specifically, change supports_whisper_transcription(...) so it
returns false for provider identifiers or configs that indicate Gemini, and
ensure transcribe_audio(...) is only called for true results (leave
TranscriptionRequest creation and the match on transcribe_audio(...) unchanged
for non-Gemini providers).

Comment on lines +272 to +275
UserContent::text(format!(
"<{} name=\"{}\" mime=\"{}\">\n{}\n</{}>",
tag, attachment.filename, attachment.mime_type, response.text, tag
))

⚠️ Potential issue | 🟠 Major

Escape transcript text before injecting XML tags.

response.text, attachment.filename, and attachment.mime_type are all unescaped here. A transcript containing </voice_transcript> or a filename containing quotes will break the wrapper and can inject extra pseudo-tags into history.

🔒 Suggested fix
+            let escaped_filename = escape_xml(&attachment.filename);
+            let escaped_mime_type = escape_xml(&attachment.mime_type);
+            let escaped_text = escape_xml(&response.text);
             UserContent::text(format!(
                 "<{} name=\"{}\" mime=\"{}\">\n{}\n</{}>",
-                tag, attachment.filename, attachment.mime_type, response.text, tag
+                tag, escaped_filename, escaped_mime_type, escaped_text, tag
             ))

Use any existing XML/HTML escaping helper in the codebase, or add a tiny local helper if there isn't one yet.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/channel_attachments.rs` around lines 272 - 275, Escape the values
injected into the XML wrapper to prevent tag/attribute injection: when
constructing the UserContent::text in the block that formats "<{} name=\"{}\"
mime=\"{}\">...\n</{}>" (the code referencing tag, attachment.filename,
attachment.mime_type, response.text), run attachment.filename and
attachment.mime_type through an attribute-escaping helper and run response.text
through an element/inner-text escaping helper (or add a small local xml_escape
function if none exists) before formatting; ensure the same escaped values are
used for both the start and end tag context where applicable to avoid breaking
the wrapper.
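
A small stdlib-only escaper along the lines the fix assumes (hypothetical helper; prefer any escaping utility the codebase already provides):

```rust
/// Escapes the five XML special characters, making a string safe to embed
/// in both element text and double-quoted attribute values.
fn escape_xml(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            other => out.push(other),
        }
    }
    out
}
```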

Comment on lines +172 to +174
voice_language: Option<String>,
voice_translate: Option<bool>,
stt_provider: Option<String>,

⚠️ Potential issue | 🟠 Major

PATCH can't unset voice_language or stt_provider.

GET /api/config exposes these as nullable fields, but this PATCH shape can't round-trip null: Option<String> makes omitted and explicit null identical, and update_routing_table() only writes on Some(...). After a client sets either field once, it has no way to remove the key and fall back to inherited/default routing. Please use a tri-state patch shape and delete the TOML key on explicit null.

Also applies to: 595-603

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/api/config.rs` around lines 172 - 174, The patch struct fields
voice_language, voice_translate, and stt_provider must be made tri-state so we
can distinguish omitted vs explicit null: change their types from
Option<String>/Option<bool> to Option<Option<String>> and Option<Option<bool>>
in the PATCH struct (the struct declared around the shown fields, e.g., the
config patch type), then update update_routing_table() to handle three cases:
None => do nothing (field omitted), Some(Some(value)) => write the TOML key with
value, and Some(None) => delete the corresponding TOML key (remove it so routing
falls back to inherited/default). Apply the same tri-state change and handling
for the other occurrence referenced (around lines 595-603).
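
The three cases can also be modeled as an explicit tri-state enum (sketch; names are illustrative, and the serde wiring is a separate concern — a plain `Option<Option<T>>` needs an adapter such as serde_with's `double_option`, because serde otherwise collapses an explicit JSON `null` into the outer `None`):

```rust
/// Tri-state patch value for a nullable config field.
#[derive(Clone, Debug, PartialEq)]
enum Patch<T> {
    Missing, // field omitted: keep the existing value
    Clear,   // explicit null: delete the TOML key
    Set(T),  // explicit value: write the TOML key
}

/// Applies one patch case to an existing optional field value.
fn apply_patch(existing: Option<String>, patch: Patch<String>) -> Option<String> {
    match patch {
        Patch::Missing => existing,
        Patch::Clear => None,
        Patch::Set(value) => Some(value),
    }
}
```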

Comment on lines +108 to +110
/// Returns true if the provider supports Whisper-compatible voice transcription.
fn supports_voice_transcription(provider: &str) -> bool {
WHISPER_CAPABLE_PROVIDERS.contains(&provider)

⚠️ Potential issue | 🟠 Major

Don't expose whole providers as voice_transcription models.

This turns provider support into model support, so /api/models?capability=voice_transcription now returns the whole OpenAI/Groq/Gemini catalog even though the transcription path still sends the chosen model id straight to /audio/transcriptions. Normal chat models like openai/gpt-4.1 or groq/llama-3.3-70b-versatile become selectable here and then fail at runtime. If this endpoint is meant to drive voice-model selection, it needs a model-level allowlist/signal rather than a provider-level check.

Also applies to: 360-365, 407-410

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/api/models.rs` around lines 108 - 110, The current check
(supports_voice_transcription using WHISPER_CAPABLE_PROVIDERS) treats provider
support as model support and exposes whole provider catalogs; replace it with a
model-level allowlist or capability flag. Add a new predicate (e.g.,
supports_voice_transcription_model(model_id: &str) -> bool) that checks either a
WHISPER_COMPATIBLE_MODELS set or a per-model metadata flag, update all usages of
supports_voice_transcription and WHISPER_CAPABLE_PROVIDERS to call
supports_voice_transcription_model (including the other occurrences mentioned),
and ensure the models endpoint filters by model id/capability rather than by
provider so only actual Whisper-compatible models are returned for
voice_transcription.
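
One way to express the model-level check (sketch; the model ids listed are examples of Whisper-compatible entries, not an exhaustive or verified catalog):

```rust
/// Models known to work against the /v1/audio/* endpoints
/// (illustrative allowlist; extend per provider docs).
const WHISPER_COMPATIBLE_MODELS: &[&str] = &[
    "openai/whisper-1",
    "groq/whisper-large-v3",
    "groq/whisper-large-v3-turbo",
];

/// Model-level capability check, replacing the provider-level one so
/// ordinary chat models are never offered for voice transcription.
fn supports_voice_transcription_model(model_id: &str) -> bool {
    WHISPER_COMPATIBLE_MODELS.contains(&model_id)
}
```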

Comment on lines +790 to +800
if let Ok(voice_language) = std::env::var("SPACEBOT_VOICE_LANGUAGE") {
routing.voice_language = Some(voice_language);
}
if let Ok(voice_translate) = std::env::var("SPACEBOT_VOICE_TRANSLATE") {
if voice_translate.eq_ignore_ascii_case("true") {
routing.voice_translate = true;
}
}
if let Ok(stt_provider) = std::env::var("SPACEBOT_STT_PROVIDER") {
routing.stt_provider = Some(stt_provider);
}

⚠️ Potential issue | 🟠 Major

Apply these voice/STT env overrides in the TOML load path too.

These vars are only handled in load_from_env(). As soon as config.toml exists, load_from_path() goes through from_toml() and SPACEBOT_VOICE_LANGUAGE, SPACEBOT_VOICE_TRANSLATE, and SPACEBOT_STT_PROVIDER stop overriding config. SPACEBOT_VOICE_TRANSLATE is also one-way here, so env cannot force false over a TOML true.

💡 Suggested direction
+fn apply_voice_env_overrides(routing: &mut RoutingConfig) {
+    if let Ok(voice_language) = std::env::var("SPACEBOT_VOICE_LANGUAGE") {
+        routing.voice_language = (!voice_language.trim().is_empty()).then_some(voice_language);
+    }
+    if let Ok(voice_translate) = std::env::var("SPACEBOT_VOICE_TRANSLATE") {
+        match voice_translate.trim().to_ascii_lowercase().as_str() {
+            "true" => routing.voice_translate = true,
+            "false" => routing.voice_translate = false,
+            _ => tracing::warn!(value = %voice_translate, "invalid SPACEBOT_VOICE_TRANSLATE"),
+        }
+    }
+    if let Ok(stt_provider) = std::env::var("SPACEBOT_STT_PROVIDER") {
+        routing.stt_provider = (!stt_provider.trim().is_empty()).then_some(stt_provider);
+    }
+}
...
-        if let Ok(voice_language) = std::env::var("SPACEBOT_VOICE_LANGUAGE") {
-            routing.voice_language = Some(voice_language);
-        }
-        if let Ok(voice_translate) = std::env::var("SPACEBOT_VOICE_TRANSLATE") {
-            if voice_translate.eq_ignore_ascii_case("true") {
-                routing.voice_translate = true;
-            }
-        }
-        if let Ok(stt_provider) = std::env::var("SPACEBOT_STT_PROVIDER") {
-            routing.stt_provider = Some(stt_provider);
-        }
+        apply_voice_env_overrides(&mut routing);

Apply the same helper after routing resolution in from_toml() as well, otherwise per-agent TOML values will still win over the env var.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config/load.rs` around lines 790 - 800, The TOML load path
(from_toml()/load_from_path()) doesn't apply the SPACEBOT_VOICE_LANGUAGE,
SPACEBOT_VOICE_TRANSLATE, and SPACEBOT_STT_PROVIDER environment overrides so
TOML values can shadow env vars and SPACEBOT_VOICE_TRANSLATE cannot force false;
after routing is built in from_toml() (i.e., after routing resolution) apply the
same env-override logic used in load_from_env(): if SPACEBOT_VOICE_LANGUAGE is
set, assign routing.voice_language = Some(value); if SPACEBOT_STT_PROVIDER is
set, assign routing.stt_provider = Some(value); and for SPACEBOT_VOICE_TRANSLATE
explicitly parse the env var for "true"/"false" (case-insensitive) and set
routing.voice_translate accordingly so env can force both true and false. Ensure
you use the same symbol names (routing.voice_language, routing.voice_translate,
routing.stt_provider) and add this override block in from_toml() after routing
is finalized.

Comment on lines +36 to +41
/// Language hint for voice transcription accuracy (e.g., "en", "es").
pub voice_language: Option<String>,
/// If true, use translations endpoint (translates to English).
pub voice_translate: bool,
/// Optional STT provider override (defaults to voice model provider).
pub stt_provider: Option<String>,

⚠️ Potential issue | 🟠 Major

stt_provider needs a matching model contract.

The attachment path in src/agent/channel_attachments.rs (Lines 225-235) takes the provider from stt_provider but still strips the model name out of voice. That means voice = "openai/whisper-1" plus stt_provider = "groq" will send whisper-1 to Groq, and the reverse pairing breaks the same way. Either reject mismatched pairs or resolve a provider-specific STT model when the override is used.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/routing.rs` around lines 36 - 41, The stt_provider override
(stt_provider) currently can mismatch the model name extracted from voice (e.g.,
voice = "openai/whisper-1" with stt_provider = "groq"), so update the code to
either validate and reject mismatched provider/model pairs or to resolve a
provider-specific STT model when an override is provided: add a resolver
function (e.g., resolve_stt_model(provider: &str, voice: &str) -> Result<String,
Error>) and call it where channel attachments are built (the code that extracts
the model from voice in channel_attachments), or add a validator (e.g.,
validate_stt_provider_model_pair(stt_provider: Option<String>, voice: &str) ->
Result<(), Error>) in the routing/struct initialization so that passing
stt_provider will produce a matching model name for that provider instead of
blindly stripping voice.
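
A sketch of the validating resolver the prompt suggests, assuming `voice` keeps the "provider/model" format described in the PR body (function name and error type are illustrative):

```rust
/// Resolves the STT model name for a provider override, rejecting
/// mismatched pairs like voice = "openai/whisper-1" with
/// stt_provider = "groq".
fn resolve_stt_model(stt_provider: &str, voice: &str) -> Result<String, String> {
    match voice.split_once('/') {
        Some((provider, model)) if provider.eq_ignore_ascii_case(stt_provider) => {
            Ok(model.to_string())
        }
        Some((provider, _)) => Err(format!(
            "stt_provider '{stt_provider}' does not match voice model provider '{provider}'"
        )),
        None => Err(format!("voice '{voice}' is not in provider/model form")),
    }
}
```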

Comment on lines +76 to +88
fn build_whisper_endpoint(base_url: &str, translate: bool) -> String {
let base = base_url.trim_end_matches('/');
let path = if translate {
"audio/translations"
} else {
"audio/transcriptions"
};

if base.contains("groq.com") {
format!("{}/openai/v1/{}", base, path)
} else {
format!("{}/v1/{}", base, path)
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

For Groq's OpenAI-compatible API, if the base URL is `https://api.groq.com/openai`, what is the full audio transcription endpoint path? Should client code append `/v1/audio/transcriptions` or `/openai/v1/audio/transcriptions`?

💡 Result:

Groq’s OpenAI-compatible audio transcription endpoint is:

  • Full URL: https://api.groq.com/openai/v1/audio/transcriptions [1][2]

So if your base URL is https://api.groq.com/openai, your client should append:

  • /v1/audio/transcriptions (resulting in https://api.groq.com/openai/v1/audio/transcriptions) [1][2]

Do not append /openai/v1/audio/transcriptions (that would duplicate the /openai segment). [1]

Sources:
[1] GroqDocs – OpenAI Compatibility (base_url uses /openai/v1)
[2] GroqDocs – Speech to Text (transcriptions endpoint shown as /openai/v1/audio/transcriptions)


Remove the duplicate /openai segment in the Groq endpoint builder.

Since src/config/providers.rs already sets Groq's base URL to https://api.groq.com/openai, appending /openai/v1/audio/transcriptions creates an invalid URL: https://api.groq.com/openai/openai/v1/audio/.... The correct endpoint requires only /v1/audio/transcriptions to be appended. Update lines 76-88 and the corresponding tests on lines 192-207 to remove the /openai from the path construction when handling Groq.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/transcription.rs` around lines 76 - 88, The build_whisper_endpoint
function is appending an extra "/openai" for Groq hosts, producing duplicate
segments; update build_whisper_endpoint to append only "/v1/{audio/...}" when
base contains "groq.com" (i.e., remove the "/openai" from the Groq branch) so
the path becomes "{base}/v1/{path}", and update the corresponding unit tests
that assert Groq URLs (tests referencing the Groq endpoint expectations) to
expect no duplicate "/openai" in the resulting URL.
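
With the duplicate segment removed, the builder collapses to a single branch (sketch; it assumes every configured base URL already carries its provider prefix, as the review says src/config/providers.rs does for Groq):

```rust
/// Builds the Whisper-compatible endpoint from a provider base URL.
fn build_whisper_endpoint(base_url: &str, translate: bool) -> String {
    let base = base_url.trim_end_matches('/');
    let path = if translate {
        "audio/translations"
    } else {
        "audio/transcriptions"
    };
    // No Groq special case: the configured base URL already ends in
    // /openai, so "{base}/v1/{path}" is correct for every provider.
    format!("{base}/v1/{path}")
}
```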
