
Upload Audio for Speech Recognition

Drag an audio file here or click to select
(max. 100 MB, supported: .mp3, .wav, .ogg, etc.)

About Our Speech Recognition Service

How It Works

On this page, you can upload an audio file (MP3, WAV, OGG, M4A, etc.) and get a text transcription. Drag your file into the upload area above, or click the area to select a file from your computer. The maximum file size is 100 MB.
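To check a file before uploading, here is a minimal Python sketch of the two limits stated above. The function name `check_upload` and the extension set are hypothetical. The page lists formats followed by "etc.", so the real list of accepted formats is probably longer. The sketch also assumes that 100 MB means 100 × 1024 × 1024 bytes.

```python
import os

# Hypothetical whitelist: the page names MP3, WAV, OGG and M4A "etc.",
# so the service likely accepts more formats than these.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".ogg", ".m4a"}
MAX_SIZE_BYTES = 100 * 1024 * 1024  # 100 MB, assuming binary megabytes


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()  # ".MP3" -> ".mp3"
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes (limit {MAX_SIZE_BYTES})")
    return problems


print(check_upload("interview.MP3", 12_000_000))  # → []
print(check_upload("notes.pdf", 200 * 1024 * 1024))
```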

Why Converting.cloud

  • We use OpenAI Whisper, a state-of-the-art speech recognition model with up to 98% accuracy. It stays robust in noisy environments, although actual accuracy depends on audio quality, language, and the model size you choose.
  • Support for over 30 languages: English, Russian, Spanish, French, German, Chinese, Japanese, Italian, Portuguese, Arabic, and many more.
  • Automatic language detection: if you're unsure which language your recording is in, choose "Auto" and the system will detect it for you.

Available Models

We offer several Whisper models with different capabilities:

Model  | Accuracy  | Processing Speed           | Best For
tiny   | Basic     | ~12x faster than real time | Quick drafts, clear speech
base   | Good      | ~8x faster than real time  | Simple audio, minimal background noise
small  | Very good | ~4x faster than real time  | Standard transcription needs (recommended)
medium | Excellent | ~2x faster than real time  | Complex audio, multiple speakers
large  | Superior  | ~1x (about real time)      | Challenging audio, accents, background noise

Language Selection

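  • Auto mode analyzes the first 30 seconds of audio to identify the language. This works well for most recordings, but detection can go wrong if those first 30 seconds are mostly silence, music, or speech in a different language from the rest of the file.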
  • Choosing a language yourself bypasses detection and speeds up transcription slightly.
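Auto mode's 30-second window can be illustrated with a small sketch. Whisper works on 16 kHz audio, so 30 seconds corresponds to 480,000 samples. The open-source openai-whisper package trims or zero-pads its input to that length (its `pad_or_trim` helper) before language detection. The function below is a simplified, hypothetical re-creation of that idea on a plain list of samples. It is not the library's code, and the page does not say which Whisper implementation the service runs.

```python
SAMPLE_RATE = 16_000  # Whisper expects 16 kHz mono audio
WINDOW_SECONDS = 30   # length of the segment inspected for language detection
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS  # 480,000 samples


def detection_window(samples: list[float]) -> list[float]:
    """Trim or zero-pad 16 kHz samples to exactly the first 30 seconds.

    Hypothetical helper: only this window would influence the detected language.
    """
    if len(samples) >= WINDOW_SAMPLES:
        return samples[:WINDOW_SAMPLES]  # a long recording: keep only the start
    # A short clip: pad with silence (zeros) up to 30 seconds.
    return samples + [0.0] * (WINDOW_SAMPLES - len(samples))


one_minute = [0.1] * (SAMPLE_RATE * 60)
print(len(detection_window(one_minute)))  # → 480000
```

Everything after the first 30 seconds has no influence on the detected language, which is why a long musical intro can throw Auto mode off and why choosing the language manually is the safer option for such recordings.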

Processing Time

The processing time depends on the model size and audio duration:

  • tiny model: ~5 seconds for 1 minute of audio
  • base model: ~8 seconds for 1 minute of audio
  • small model: ~15 seconds for 1 minute of audio
  • medium model: ~30 seconds for 1 minute of audio
  • large model: ~60 seconds for 1 minute of audio

For example, a 10-minute recording processed with the small model takes approximately 10 × 15 seconds = 150 seconds, or 2.5 minutes.
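The per-minute figures above scale linearly with audio length, which is all the worked example does. Here is a minimal sketch of that arithmetic, assuming linear scaling and ignoring any time spent waiting in the queue. The dictionary and function names are illustrative, not part of the service.

```python
# Seconds of processing per minute of audio, taken from the Processing Time list.
SECONDS_PER_AUDIO_MINUTE = {
    "tiny": 5,
    "base": 8,
    "small": 15,
    "medium": 30,
    "large": 60,
}


def estimate_processing_seconds(model: str, audio_seconds: float) -> float:
    """Rough estimate: scale the per-minute figure linearly with audio length."""
    return SECONDS_PER_AUDIO_MINUTE[model] * audio_seconds / 60


def speed_factor(model: str) -> float:
    """How many times faster than real time the model runs."""
    return 60 / SECONDS_PER_AUDIO_MINUTE[model]


print(estimate_processing_seconds("small", 10 * 60) / 60)  # → 2.5 (minutes)
print(speed_factor("tiny"))  # → 12.0
```

Computing `speed_factor` for each model also shows how this list lines up with the model table. For base, 60 / 8 = 7.5, which the table rounds to ~8x. The other models match exactly (12x, 4x, 2x, 1x).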

Processing Queue

All files are processed through a queue, prioritized as follows:

  1. Files are first prioritized by model size (tiny → base → small → medium → large)
  2. Within each model group, shorter audio files are processed before longer ones

Choosing a smaller model therefore gets your file processed sooner in two ways: the model itself runs faster, and your file is placed higher in the queue.
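The two prioritization rules amount to ordering jobs by a composite key: model rank first, then audio duration. The sketch below models that ordering with Python's `heapq`. The class and method names are hypothetical. The arrival-order tie-breaker for jobs with identical keys is an assumption, because the page does not say how ties are resolved.

```python
import heapq
from itertools import count

# Smaller models are served first, matching the order on this page.
MODEL_RANK = {"tiny": 0, "base": 1, "small": 2, "medium": 3, "large": 4}


class TranscriptionQueue:
    """Hypothetical queue ordered by (model rank, audio duration, arrival order)."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, float, int, str]] = []
        self._arrival = count()  # assumed tie-breaker: equal keys stay first-in, first-out

    def submit(self, job_id: str, model: str, duration_s: float) -> None:
        key = (MODEL_RANK[model], duration_s, next(self._arrival), job_id)
        heapq.heappush(self._heap, key)

    def next_job(self) -> str:
        # Pop the smallest key: lowest model rank, then shortest audio.
        return heapq.heappop(self._heap)[-1]

    def __len__(self) -> int:
        return len(self._heap)


q = TranscriptionQueue()
q.submit("lecture", "large", 3600)
q.submit("memo", "small", 120)
q.submit("call", "small", 45)
q.submit("note", "tiny", 600)
print([q.next_job() for _ in range(len(q))])  # → ['note', 'call', 'memo', 'lecture']
```

A strict priority order like this sketch has one side effect: under a steady stream of small-model jobs, large-model jobs keep waiting. The page does not say whether the real service adds any aging or fairness rule to prevent that.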

How to Use the Service

Once your file is uploaded, you'll see its name, size, and duration. Select a Whisper model size (tiny to large); larger models are more accurate but take longer to process. Click Transcribe. When processing finishes, a download link for your TXT file appears in the last column of the table.

All uploaded files are temporarily stored on our servers and deleted after 24 hours.
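To know when you need to download your results, the 24-hour window is simple date arithmetic. Here is a minimal sketch, assuming the 24 hours are counted from the upload time. The page does not say whether the clock starts at upload or at transcription. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(hours=24)  # storage period stated on this page


def deletion_time(uploaded_at: datetime) -> datetime:
    """When a file uploaded at `uploaded_at` is due for removal (assumes clock starts at upload)."""
    return uploaded_at + RETENTION


def is_expired(uploaded_at: datetime, now: datetime) -> bool:
    """True once the 24-hour retention period has fully elapsed."""
    return now >= deletion_time(uploaded_at)


uploaded = datetime(2024, 5, 1, 9, 30, tzinfo=timezone.utc)
print(deletion_time(uploaded).isoformat())  # → 2024-05-02T09:30:00+00:00
```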