Caspian Office

Speech to text

Turn speech into text in your browser — privately on-device with Whisper (record or upload audio), or instant live dictation with your browser’s built-in recogniser.

Private · runs in your browser | Offline · after first load | Free · no signup

What is Speech to text?

A tool that turns speech into text in your browser. Choose the on-device Whisper engine to transcribe a recording or an uploaded audio file privately — the audio never leaves your device — or the browser's built-in live dictation for instant speech typing as you talk. It's useful for transcribing voice notes, interviews and meetings, or dictating text hands-free.

How to use Speech to text

1. Pick an engine: choose On-device AI (Whisper) to transcribe recorded or uploaded audio privately and offline, or Browser live dictation for instant mic-only typing.
2. Set the language and output: for Whisper, pick a model, the language (or Auto-detect), whether to transcribe or translate to English, and optionally turn on timestamps. For live dictation, just choose your spoken language.
3. Add your audio: with Whisper, press Record to capture from your mic or Choose audio file to upload a recording. With live dictation, press Start dictation and begin speaking.
4. Transcribe: with Whisper, press Transcribe to run the model over the audio. With live dictation there is nothing more to press; the text appears as you talk.
5. Copy or save: use the header actions to copy the transcript or download it as a .txt file.
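
For readers curious how browser live dictation can work under the hood, here is a minimal sketch built on the Web Speech API's SpeechRecognition object. It is an illustration, not Caspian Office's actual code: the names startDictation, RecognitionLike and ResultEventLike are hypothetical, and the recogniser constructor is passed in so the sketch also runs outside a browser.

```typescript
// Structural subset of the Web Speech API's SpeechRecognition object.
// (Hypothetical type names; only the members used below are declared.)
interface ResultEventLike {
  resultIndex: number;
  results: ArrayLike<ArrayLike<{ transcript: string }> & { isFinal: boolean }>;
}

interface RecognitionLike {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
  onresult: ((event: ResultEventLike) => void) | null;
  start(): void;
  stop(): void;
}

// Start dictation in the given language and report each finalised phrase.
// Interim (still changing) results are skipped in this sketch.
function startDictation(
  Recognizer: new () => RecognitionLike,
  lang: string,
  onFinalText: (text: string) => void,
): RecognitionLike {
  const rec = new Recognizer();
  rec.lang = lang;            // e.g. "en-GB"
  rec.continuous = true;      // keep listening across pauses
  rec.interimResults = true;  // the browser also emits partial guesses
  rec.onresult = (event) => {
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      const best = result ? result[0] : undefined;
      if (result && result.isFinal && best) {
        onFinalText(best.transcript.trim());
      }
    }
  };
  rec.start();
  return rec;
}
```

In a browser, the constructor would typically be window.SpeechRecognition, or window.webkitSpeechRecognition in Chromium-based browsers. Calling stop() on the returned object ends the session.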

Frequently asked questions

Is my audio uploaded to a server?

With the Whisper engine, no — your audio is transcribed on your device and never uploaded. Browser live dictation uses your browser's built-in recogniser, which may send audio to your browser maker's service, so it's less private; this is disclosed in the tool.

Does it work offline?

Whisper works offline once the model has been fetched. The model weights download once on first use and are then cached for later transcriptions. Live dictation needs an internet connection.
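
The "download once, then reuse" behaviour described above can be sketched as follows. This is an assumption-laden illustration rather than the tool's real implementation: WeightStore, Downloader, loadWeights and MemoryStore are hypothetical names, and a real browser app would more likely back the store with the Cache API or IndexedDB.

```typescript
// Minimal storage interface (hypothetical). Kept abstract so the sketch
// runs anywhere; a browser build could wrap the Cache API or IndexedDB.
interface WeightStore {
  get(key: string): Promise<Uint8Array | undefined>;
  set(key: string, bytes: Uint8Array): Promise<void>;
}

// Downloader signature (hypothetical); in a browser this would call fetch().
type Downloader = (url: string) => Promise<Uint8Array>;

// Return the model weights, downloading only when no cached copy exists.
async function loadWeights(
  url: string,
  store: WeightStore,
  download: Downloader,
): Promise<Uint8Array> {
  const cached = await store.get(url);
  if (cached !== undefined) {
    return cached; // later uses: no network needed
  }
  const bytes = await download(url); // first use: needs a connection
  await store.set(url, bytes);
  return bytes;
}

// In-memory store, for demonstration only (it does not survive a reload).
class MemoryStore implements WeightStore {
  private items = new Map<string, Uint8Array>();
  async get(key: string): Promise<Uint8Array | undefined> {
    return this.items.get(key);
  }
  async set(key: string, bytes: Uint8Array): Promise<void> {
    this.items.set(key, bytes);
  }
}
```

With a persistent store, the second and later calls never reach the downloader, which is why transcription can keep working without a connection after the first fetch.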

Can I transcribe an audio file I already have?

Yes, with the Whisper engine — choose an audio file to upload it. Live dictation is mic-only and can't transcribe an uploaded file.

Which Whisper model should I choose?

Base is the recommended default and is more accurate than Tiny; Tiny is faster and a smaller download, but less accurate. The larger the model, the longer transcription takes, especially on long recordings.

Can it transcribe other languages or translate?

Yes. Set the language (or leave it on Auto-detect), and choose Translate to English if you want the output in English instead of the spoken language.

Why does it say it can't run from a downloaded copy?

Speech to text has to load the AI model and the recogniser, which the offline single-file copy can't do. Use the online app at caspianoffice.io or the installed app instead.

Tips

On first use, let the Whisper model finish downloading; after that, transcription works offline.
Turn on timestamps in Whisper when you need to locate spots in a longer recording.
Use live dictation for quick notes while talking, and Whisper for accurate transcripts of recordings.
Record in a quiet room and speak clearly for the most accurate results.
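
To show why timestamps help in a longer recording, here is a small sketch that turns timestamped segments into plain text suitable for a .txt file. The Segment shape and the function names formatClock and toTranscript are hypothetical; the tool's real output format may differ.

```typescript
// Hypothetical segment shape: start/end in seconds plus the recognised text.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Format a time in seconds as MM:SS, or H:MM:SS from one hour upwards.
function formatClock(seconds: number): string {
  const total = Math.max(0, Math.floor(seconds));
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const s = total % 60;
  const pad = (n: number): string => String(n).padStart(2, "0");
  return h > 0 ? `${h}:${pad(m)}:${pad(s)}` : `${pad(m)}:${pad(s)}`;
}

// Join segments into one line each, optionally prefixed with the start time.
function toTranscript(segments: Segment[], withTimestamps: boolean): string {
  return segments
    .map((seg) => {
      const text = seg.text.trim();
      return withTimestamps ? `[${formatClock(seg.start)}] ${text}` : text;
    })
    .join("\n");
}
```

Each line then starts with the moment the phrase begins, so a search for a word in the text also tells you where to seek in the audio.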
