WhatsApp Voice to Text: Turn Voice Notes into Readable Text

The fastest way to do WhatsApp voice to text conversion at scale is to export the entire chat as a ZIP with "Including Media" and process it in one batch. Upload the ZIP file to chattopdf.app and choose the $49 Premium+Voice tier at checkout. Every voice note in the conversation becomes an inline transcript in the PDF — attributed to the sender, placed at the correct timestamp, readable alongside the surrounding text messages.

WhatsApp voice note bubble with arrow to inline text transcript in a PDF, showing sender name and timestamp

How to convert WhatsApp voice notes to text

The entire workflow runs in five steps, from opening WhatsApp to receiving a PDF with every voice note transcribed.

  1. Export the chat with Including Media

    Open WhatsApp and navigate to the chat that contains the voice notes. On iPhone: tap the contact or group name at the top of the chat → scroll to the bottom of the info screen → tap Export Chat → pick Including Media. On Android: tap the three-dot menu in the top-right corner → More → Export Chat → Include Media. The "Including Media" option is the critical one — it puts the .opus voice note files inside the ZIP alongside the text. Without it, the ZIP contains only the _chat.txt file and there is nothing to transcribe. WhatsApp's official export chat FAQ explains the export sheet in detail.

    WhatsApp export settings screen highlighting the Including Media option required to include voice note audio files in the ZIP

  2. Save the ZIP to your device

    After selecting Including Media, WhatsApp generates a ZIP file and opens the share sheet. On iPhone, tap Save to Files and choose a folder (On My iPhone → Downloads is reliable). On Android, share the ZIP to your file manager, Google Drive, or email it to yourself. The ZIP is now a real file on your device, ready to upload.

  3. Upload the ZIP to chattopdf.app

    Open chattopdf.app in a browser on your phone or desktop. Drag and drop the ZIP onto the upload area, or tap the area to open a file picker and select the ZIP you just saved. The uploader validates the file and shows a preview: total messages, number of voice notes detected, and an estimated output size. This preview is free; no payment is taken yet.

    Upload preview panel showing WhatsApp voice notes detected count and Premium+Voice price before payment

  4. Select the Premium+Voice tier ($49 per chat)

    The pricing step shows five tiers. Pick Premium+Voice at $49 per chat. This tier enables Deepgram Nova-3 transcription on every voice note in the ZIP. The other tiers ($7, $14, $29) either omit voice notes entirely or include them only as audio-file placeholders rather than transcribed text. The $49 tier is the one that converts voice to readable text. Pay by card: one-time, no subscription, no recurring charge.

  5. Download the PDF with inline transcripts

    After payment, chattopdf processes the chat and emails the PDF to the address you provided. Typical processing time for a chat with up to a few dozen voice notes is under two minutes. The PDF arrives with every voice note transcribed at its correct position in the conversation — sender name, timestamp, and transcript text all preserved. Open it, search it, print it, or file it.

Five-step workflow: WhatsApp export with media, ZIP upload, voice note preview, Premium+Voice tier, PDF download
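
Before uploading, it can be worth confirming that the export really contains audio. The sketch below is my own illustration, not part of chattopdf: it opens a ZIP with Python's standard zipfile module and counts the .opus files next to the chat text. The function name summarize_export is hypothetical, and it assumes the chat text is stored as a .txt file (such as _chat.txt) inside the archive.

```python
import io
import zipfile


def summarize_export(zip_source):
    """Report whether an export ZIP holds chat text and how many voice notes.

    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        names = [info.filename for info in archive.infolist() if not info.is_dir()]
    # Assumption: the chat text is a .txt file and voice notes end in .opus.
    has_chat_text = any(name.lower().endswith(".txt") for name in names)
    voice_notes = [name for name in names if name.lower().endswith(".opus")]
    return {"has_chat_text": has_chat_text, "voice_notes": len(voice_notes)}


if __name__ == "__main__":
    # Build a tiny in-memory ZIP so the example runs without a real export.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("_chat.txt", "placeholder chat text")
        archive.writestr("PTT-20260521-WA0001.opus", b"placeholder audio")
    buffer.seek(0)
    print(summarize_export(buffer))
```

If voice_notes comes back as 0, the chat was most likely exported without media and needs to be exported again.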

What the output looks like

The key thing about the chattopdf output is that the voice note transcripts are inline in the conversation — not in a separate appendix, not in a separate file, not numbered separately. They appear exactly where the voice notes appeared in the chat, in chronological order, alongside the text messages.

Here is what a section of the resulting PDF looks like with a mixed text-and-voice conversation:


Emma · 09:14
Hey, are you joining the call at 10?

Luca · 09:15 · [voice note — 0:12 — transcript]
"Yeah, I'll be there. Just finishing up the slides. Five more minutes."

Emma · 09:16
Perfect, no rush.

Luca · 09:22 · [voice note — 0:28 — transcript]
"Actually, can we push it to 10:15? I need to send something to the client first and I want to attach the updated version."

Emma · 09:23
Sure, I'll let the others know.


WhatsApp voice to text PDF sample with three voice notes transcribed inline showing sender name and timestamp

Each transcript line carries the sender name and the timestamp from the original chat. If a voice note is in a language other than English — Spanish, Portuguese, Hindi, Arabic, French, and others — it is transcribed in that language. The transcript text is the spoken content as written text. You can select it, copy it, and search the PDF for keywords that appeared in any voice note.

The voice notes that were sound-only in WhatsApp become readable, searchable, archivable text. The surrounding context — the text messages before and after each voice note — remains intact. The full conversation reads as a single document.
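
To make the inline layout concrete, here is a minimal sketch of how text messages and voice note transcripts could be rendered into one chronological document in the style of the sample above. It is an illustration under my own assumptions, not chattopdf's actual code: the Message class, its field names, and both functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

DASH = "\u2014"  # the dash character used inside the voice note label in the sample


@dataclass
class Message:
    sender: str
    time: str                            # display time, e.g. "09:15"
    text: str = ""                       # body of a plain text message
    voice_seconds: Optional[int] = None  # set only for voice notes
    transcript: str = ""                 # transcript of the voice note, if any


def render_message(msg: Message) -> str:
    """Render one message; voice notes get a duration label and quoted transcript."""
    if msg.voice_seconds is None:
        return f"{msg.sender} · {msg.time}\n{msg.text}"
    minutes, seconds = divmod(msg.voice_seconds, 60)
    label = f"[voice note {DASH} {minutes}:{seconds:02d} {DASH} transcript]"
    return f'{msg.sender} · {msg.time} · {label}\n"{msg.transcript}"'


def render_chat(messages: List[Message]) -> str:
    """Join messages in the order given (assumed to be chronological already)."""
    return "\n\n".join(render_message(m) for m in messages)


if __name__ == "__main__":
    chat = [
        Message("Emma", "09:14", text="Hey, are you joining the call at 10?"),
        Message("Luca", "09:15", voice_seconds=12,
                transcript="Yeah, I'll be there. Just finishing up the slides."),
    ]
    print(render_chat(chat))
```

The point of the design is that a voice note is just another message in the same list, so it keeps its sender, its timestamp, and its place in the conversation.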

When you would want this

Legal and evidence use cases. If a WhatsApp conversation is relevant to a dispute, a complaint, or a court filing, having the voice notes transcribed as part of the same document that shows the text messages is useful. A judge, solicitor, or HR officer can read the entire exchange — including what was said verbally — without needing to play audio. For more on formatting exported chats for legal purposes, the WhatsApp to PDF guide covers the formal styling options.

Business records and compliance. Sales teams, support teams, and freelancers who conduct substantive conversations over WhatsApp — approvals, agreements, instructions, complaints — often need a readable record of those conversations. Voice notes in a business context frequently contain the actual decision or the actual instruction. Having those as readable text alongside the surrounding messages means the record is complete and searchable, not partially mute.

Personal and family archives. Long family group chats accumulate years of voice notes from relatives — updates, stories, birthday wishes, directions, check-ins. Many of those voice notes are from people whose voices you want to remember. Transcribing them into a PDF creates a readable archive that can be searched, printed, and kept alongside photos and written messages. The voice notes do not stay as audio forever — device storage gets cleared, WhatsApp accounts get deleted, phones change hands. A PDF with the transcripts is a more durable format.

Accessibility. Voice notes are by nature inaccessible to people who are deaf or hard of hearing, or to people who cannot play audio in the environment they are in (commuting, an open-plan office, a meeting). A transcript makes the content of a voice note available regardless of hearing ability or audio environment.

In all of these cases, the value of the chattopdf approach compared to a standalone transcription app is that the transcript lands in context — at the right position in the conversation, attributed to the right speaker, readable alongside the surrounding text. You do not have to manually correlate a list of transcripts against a separate exported chat.

Four use cases for WhatsApp voice to text conversion: legal evidence, business records, personal archive, and accessibility

Accuracy and languages

The voice transcription uses Deepgram Nova-3, which in my own testing handles clear recordings in supported languages well. In quiet conditions — indoors, phone held near the mouth, no background noise — the transcripts read cleanly with very few errors. In noisy conditions — a moving car, a café, wind — accuracy degrades noticeably and the occasional word or phrase needs a second read. The transcribe WhatsApp audio guide covers the accuracy picture in detail, including how background noise affects results and what the word error rate looks like across recording conditions.
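
If you want to put a number on accuracy for your own recordings, word error rate is the standard measure: the word-level edit distance between a transcript you trust and the machine transcript, divided by the number of words in the trusted one. The sketch below is a plain-Python version of that calculation. The function name is my own, and it compares lowercased, whitespace-separated words without stripping punctuation, which is a simplifying assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # reference word missing from hypothesis
                current[j - 1] + 1,      # extra word in hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    truth = "can we push it to ten fifteen"
    machine = "can we push it to ten fifty"
    print(round(word_error_rate(truth, machine), 3))
```

Typing out what was actually said in a handful of voice notes and comparing each against its transcript gives a rough figure for your own recording conditions.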

WhatsApp voice to text language support: 17 high-accuracy languages including English, Spanish, Arabic and Hindi

The 17 languages where Deepgram Nova-3 performs at high accuracy include English, Spanish, Portuguese, French, German, Italian, Arabic, Hindi, Indonesian, Turkish, Russian, Dutch, Japanese, Korean, Mandarin, Vietnamese, and Thai. If a voice note is in any of these languages, the transcript is typically clean enough to be useful without editing. For languages beyond this set, the engine falls back to a broader detection model — accuracy varies and some languages may not transcribe usefully. If your chat contains voice notes in a language that is not on that list, I would suggest testing with a single short export before relying on the transcription for anything important.
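
As a quick aid for that decision, the list above can be turned into a lookup. This is only a sketch: the two-letter codes are my own mapping of the 17 named languages to ISO 639-1 codes (with "zh" standing in for Mandarin), and the function name and return labels are hypothetical rather than anything the service exposes.

```python
# Assumed ISO 639-1 codes for the 17 languages named in the article.
HIGH_ACCURACY_LANGUAGES = {
    "en": "English", "es": "Spanish", "pt": "Portuguese", "fr": "French",
    "de": "German", "it": "Italian", "ar": "Arabic", "hi": "Hindi",
    "id": "Indonesian", "tr": "Turkish", "ru": "Russian", "nl": "Dutch",
    "ja": "Japanese", "ko": "Korean", "zh": "Mandarin", "vi": "Vietnamese",
    "th": "Thai",
}


def accuracy_tier(language_code: str) -> str:
    """Return "high-accuracy" for listed languages, otherwise "test-first".

    Region suffixes such as "pt-BR" or "en_GB" are reduced to the base code.
    """
    base = language_code.strip().lower().replace("_", "-").split("-")[0]
    return "high-accuracy" if base in HIGH_ACCURACY_LANGUAGES else "test-first"


if __name__ == "__main__":
    for code in ("pt-BR", "hi", "sw"):
        print(code, accuracy_tier(code))
```

Anything that comes back as "test-first" is a candidate for the single short trial export suggested above.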

WhatsApp voice notes are encoded as Opus audio at a 16 kHz sample rate. A 16 kHz sample rate captures frequencies up to 8 kHz, a band that covers what matters for speech intelligibility, so the transcription engine receives the information it needs for speech recognition. The audio format itself is not the limiting factor. The WhatsApp audio format guide explains the Opus codec in more detail if you are curious about the technical side.
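
If you want to inspect a voice note yourself, the identification header of an Ogg Opus file can be read with the standard library. The sketch below looks for the OpusHead packet defined in RFC 7845 and unpacks the channel count and the input sample rate field. Two caveats: the function name is my own, and that field records the sample rate of the original input as written by the encoder, so treat it as informational rather than as a guarantee about any particular file.

```python
import struct


def read_opus_head(data: bytes) -> dict:
    """Extract basic fields from the OpusHead packet at the start of a file.

    Pass the first few kilobytes of the .opus file; the header sits in the
    first Ogg page.
    """
    if not data.startswith(b"OggS"):
        raise ValueError("not an Ogg container")
    start = data.find(b"OpusHead")
    if start == -1 or len(data) < start + 19:
        raise ValueError("no complete OpusHead packet found")
    # After the 8-byte magic: version (1 byte), channel count (1 byte),
    # pre-skip (2 bytes), input sample rate (4 bytes), all little endian.
    version, channels, pre_skip, input_rate = struct.unpack_from("<BBHI", data, start + 8)
    return {
        "version": version,
        "channels": channels,
        "pre_skip": pre_skip,
        "input_sample_rate": input_rate,
    }


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as handle:
            print(read_opus_head(handle.read(4096)))
```

Run it with the path to one of the .opus files from the export ZIP as the only argument.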

For language detection to work correctly, the ZIP must include the voice note files — which is why the "Including Media" export option matters. Without the audio files in the ZIP, there is nothing to transcribe regardless of tier.

Key takeaways

  • To get WhatsApp voice notes as readable text, export the chat with Including Media — this puts the .opus audio files inside the ZIP, which is required for transcription.
  • Upload the ZIP to chattopdf.app and choose the $49 Premium+Voice per chat tier. The other tiers do not transcribe voice notes to text.
  • The output is a PDF where every voice note appears as an inline transcript at its position in the conversation, with sender name and timestamp — not as a separate list or file.
  • 17 languages at high accuracy, 30+ auto-detected. Background noise is the main accuracy factor, not the language or the Opus audio format.
  • Free alternatives exist (WhatsApp's in-app transcription, Otter.ai, Google Recorder) but require processing voice notes one at a time; ChatToPDF batches the whole chat in a single pass.
  • Voice calls are not in the chat export and cannot be transcribed. Only voice note messages (the audio clips sent in the chat) are included.
  • The transcribe WhatsApp audio pillar covers the full technical detail: accuracy by noise level, the Deepgram Nova-3 pipeline, and what the word error rate (WER) looks like across recording conditions.

FAQ

Is there a free way to turn WhatsApp voice notes into text?

Yes — several free options exist. WhatsApp itself has a built-in live transcription feature on some Android versions and iPhones running iOS 17 or later: hold the microphone icon in the voice note player and the app shows a transcript. This works in-app for individual messages and is free. Standalone transcription apps like Otter.ai (limited free tier), Google's Recorder app (Pixel-only, free), and Apple's Voice Memos transcription (US English only, iOS) can also transcribe audio files. The limitation with all of these approaches is that you have to transcribe one voice note at a time, manually, and then correlate the output with the surrounding chat text yourself. ChatToPDF charges $49 for the Premium+Voice tier because it processes the entire chat — potentially dozens of voice notes — in a single batch and delivers the transcripts inline at the correct position in the conversation, attributed to the correct sender, as a single searchable PDF. Whether that is worth it depends on how many voice notes you have and what you need the output for.

Does this work on both iPhone and Android exports?

Yes. The export workflow is slightly different on each platform — on iPhone you reach the Export Chat option via the contact info screen (tap the name at the top of the chat); on Android it is under the three-dot menu → More → Export Chat. Both platforms produce a ZIP file containing a _chat.txt and the voice note .opus files when you export with Including Media. ChatToPDF accepts ZIPs from both platforms and processes them identically. The underlying voice note format is the same on both (Opus codec, .opus extension), so there is no quality or accuracy difference based on which phone the export came from.
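
For readers who want to see what "processed identically" can look like, here is a rough sketch that reads one line of the exported chat text in either style and pulls out the timestamp, the sender, and any attached .opus file name. The two line shapes are assumptions based on commonly seen exports (a bracketed timestamp on iPhone, a "timestamp - sender" prefix on Android); date formats vary by locale, and none of this is chattopdf's actual parser.

```python
import re

# Assumed iPhone shape:  [21/05/2026, 09:15:02] Luca: <attached: NAME.opus>
IOS_LINE = re.compile(r"^\u200e?\[(?P<stamp>[^\]]+)\] (?P<sender>[^:]+): (?P<body>.*)$")
# Assumed Android shape: 21/05/2026, 09:15 - Luca: NAME.opus (file attached)
ANDROID_LINE = re.compile(r"^(?P<stamp>\d.*?) - (?P<sender>[^:]+): (?P<body>.*)$")
OPUS_NAME = re.compile(r"[\w.-]+\.opus", re.IGNORECASE)


def parse_line(line: str):
    """Return stamp, sender and voice note file name, or None for other lines.

    Lines that do not start a new message (for example the continuation of a
    multi-line message) return None.
    """
    match = IOS_LINE.match(line) or ANDROID_LINE.match(line)
    if match is None:
        return None
    attachment = OPUS_NAME.search(match.group("body"))
    return {
        "stamp": match.group("stamp"),
        "sender": match.group("sender"),
        "voice_note": attachment.group(0) if attachment else None,
    }


if __name__ == "__main__":
    samples = [
        "[21/05/2026, 09:15:02] Luca: <attached: 00000012-AUDIO-2026-05-21-09-15-02.opus>",
        "21/05/2026, 09:15 - Luca: PTT-20260521-WA0003.opus (file attached)",
        "21/05/2026, 09:14 - Emma: Hey, are you joining the call at 10?",
    ]
    for sample in samples:
        print(parse_line(sample))
```

Once both styles are reduced to the same three fields, everything downstream (matching audio files to messages, transcribing, rendering) no longer needs to know which phone produced the export.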

What languages are supported for voice note transcription?

17 languages at high accuracy: English, Spanish, Portuguese, French, German, Italian, Arabic, Hindi, Indonesian, Turkish, Russian, Dutch, Japanese, Korean, Mandarin, Vietnamese, and Thai. Beyond these 17, Deepgram Nova-3 auto-detects and attempts transcription for 30+ additional languages, though accuracy varies and some may not produce useful output. Mixed-language voice notes — Hinglish (Hindi + English), code-switching in Arabic dialects, Portuguese–Spanish mixing — are handled in a single pass by Nova-3. The transcript is in the language(s) spoken; there is no translation step. For the full language table with notes on dialect handling, see the transcribe WhatsApp audio pillar.

Do WhatsApp voice calls get transcribed too?

No. WhatsApp voice calls are not recorded or stored in the chat export. The ZIP file that WhatsApp generates contains only messages that appeared in the chat window — voice notes (the short recorded audio clips you send as messages), photos, documents, and the text conversation. A phone call or video call made through WhatsApp does not appear in the chat export because it was a live call, not a stored message. Only voice notes — the microphone icon messages in the conversation — are transcribed.

How long does the transcription take?

For a typical chat with up to a few dozen voice notes, the full process (upload, transcription, PDF generation, email delivery) usually completes in under two minutes after payment. Larger chats with many voice notes or unusually long individual recordings (several minutes each) take longer; in my experience a chat with 50+ voice notes totalling an hour of audio can take five to ten minutes. The Premium+Voice tier covers up to 8 hours of audio per chat. You receive the PDF by email once it is ready, so you do not have to stay on the page.

Paul, founder of ChatToPDF

I'm Paul. I built ChatToPDF after watching a friend try to print a 4-year-old WhatsApp chat across forty-something one-page PDFs. I write here about exporting WhatsApp chats, converting them to PDF, transcribing voice notes, and the messy edge cases nobody else writes about (40,000-message export limits, broken emojis, RTL Arabic, Samsung Secure Folder).

Published 2026-05-21