Export Indian Language Text to PDF, Word, and Plain Text
How document export works for Indian language text from TranslitHub — font embedding, formatting preservation, PDF generation, Word compatibility, and batch export workflows.
The most frustrating moment when working with Indian language text is finishing a document and then watching it break the moment you try to share it. The Devanagari renders as boxes. The Tamil becomes question marks. The carefully laid-out Gujarati letter collapses into unreadable gibberish when the recipient opens it on a machine without the right font installed.
Document export from TranslitHub is specifically designed to avoid these problems. This guide explains how each export format handles Indian scripts, what gets preserved and what doesn't, and how to choose the right format for different use cases.
Why Indian Language Export Is Harder Than Latin Script
With English text, you export a Word document and it looks the same almost everywhere, because fonts such as Times New Roman and Arial (or close substitutes for them) have shipped with mainstream operating systems since the early 1990s. With Indian language text, the situation is more complicated:
- Script complexity: Indic scripts use combining characters: vowel diacritics (matras) that attach to consonants, half-forms, and conjuncts (two or more consonants merged into a single glyph). Rendering these correctly requires a font that explicitly includes all glyph variants, plus an OpenType shaping engine to select the right variant in context.
- Font availability: Not every computer has Mangal, Latha, Noto Sans Devanagari, or the other fonts needed for Indian scripts. A document that looks fine on the computer it was created on can be unreadable on another machine.
- Encoding history: Indian language content has a complicated history of non-Unicode encodings (ISCII, and ASCII-based legacy fonts such as Kruti Dev and Shivaji) where different fonts produced different characters from the same code points. Unicode fixed this, but legacy documents still exist, and if your tool exports in a legacy encoding, recipients may see garbage.
TranslitHub exports exclusively in Unicode (UTF-8/UTF-16), and the PDF export embeds fonts. For PDF output, this combination addresses all three problems.
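To make the script-complexity point concrete, here is a minimal Python sketch (standard library only) that lists the code points behind the conjunct क्ष. The helper name `describe` is only illustrative. A single glyph on screen is stored as three code points, and it is the font and shaping engine that merge them.

```python
import unicodedata

def describe(text: str) -> list[str]:
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

# The conjunct क्ष is stored as KA + VIRAMA + SSA.
conjunct = "\u0915\u094d\u0937"

for line in describe(conjunct):
    print(line)
# U+0915 DEVANAGARI LETTER KA
# U+094D DEVANAGARI SIGN VIRAMA
# U+0937 DEVANAGARI LETTER SSA
```

If the recipient's font or renderer cannot perform that merge, the same three code points show up as separate pieces or as boxes, even though the underlying text is intact.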
Exporting to PDF
PDF is the safest format for Indian language documents that need to look consistent everywhere. When TranslitHub exports a PDF, it:
- Renders the document at screen quality using the web font loaded in the editor
- Embeds a subset of that font (only the glyphs actually used in your document) into the PDF file
- Produces a file that renders identically on macOS, Windows, Linux, iOS, Android, and any standard PDF viewer — regardless of what fonts are installed
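As a rough illustration of why a subset font stays small, the sketch below counts the distinct code points in a short text. This is not TranslitHub's subsetting code, and `used_code_points` is a hypothetical helper. A real subsetter works on glyphs after shaping (so conjunct glyphs are included too), but the set of code points used is the natural starting point.

```python
def used_code_points(text: str) -> set[int]:
    """Collect the distinct non-whitespace code points that appear in text."""
    return {ord(ch) for ch in text if not ch.isspace()}

# "नमस्ते नमस्ते": the same six-code-point word written twice.
word = "\u0928\u092e\u0938\u094d\u0924\u0947"
doc = word + " " + word

cps = used_code_points(doc)
print(len(doc), "characters,", len(cps), "distinct code points")
# 13 characters, 6 distinct code points
```

A long document in one script typically uses only a small fraction of the glyphs in a full font, which is why embedding a subset adds little to the file size.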
What Gets Preserved in PDF
| Element | Preserved |
|---|---|
| Bold, italic, underline | Yes |
| Headings (H1, H2, H3) | Yes — with size and weight differences |
| Bullet and numbered lists | Yes |
| Text alignment (including justified) | Yes |
| Font size | Yes |
| Script rendering (conjuncts, matras) | Yes |
| Hyperlinks | Yes — clickable in PDF viewers |
| Page margins | Set to A4 standard (adjustable) |
PDF Page Sizes
The PDF export defaults to A4 (210 × 297 mm) with 25 mm margins, the standard for formal documents in India. You can change this to:
- Letter (US standard, for international recipients)
- Legal (for court filings and legal documents)
- A5 (half-A4, useful for booklets)
For official government correspondence, stick with A4. For affidavits and court documents, check the specific court's requirements — some require specific font sizes and margins.
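When a court or office specifies margins, it helps to know how much text area is left on the page. The following sketch does the arithmetic for the page sizes listed above. The dimensions are the standard ones for each size; the function names are illustrative, and the sketch assumes the same margin on all four sides.

```python
MM_PER_INCH = 25.4
PT_PER_INCH = 72  # PDF measures lengths in points, 72 per inch

# Standard page sizes as (width, height) in millimetres.
PAGE_SIZES_MM = {
    "A4": (210.0, 297.0),
    "Letter": (215.9, 279.4),
    "Legal": (215.9, 355.6),
    "A5": (148.0, 210.0),
}

def mm_to_pt(mm: float) -> float:
    """Convert millimetres to PDF points."""
    return mm / MM_PER_INCH * PT_PER_INCH

def text_area_mm(width: float, height: float, margin: float) -> tuple[float, float]:
    """Usable width and height when the same margin is applied on all sides."""
    return (width - 2 * margin, height - 2 * margin)

w, h = text_area_mm(*PAGE_SIZES_MM["A4"], margin=25.0)
print(w, h)  # 160.0 247.0
print(round(mm_to_pt(210.0), 2), round(mm_to_pt(297.0), 2))  # 595.28 841.89
```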
Practical Scenario: A Formal Letter in Marathi
A school principal needs to send a circular to parents in Marathi. She:
- Types the letter in the TranslitHub editor using Marathi (phonetic input)
- Uses justified alignment (standard for formal Marathi correspondence)
- Sets font size to 12pt with 1.5 line spacing for readability
- Exports to PDF
The resulting PDF can be emailed to parents, printed from any computer, or uploaded to the school's website — it looks identical in every context.
Exporting to DOCX (Word)
Word export creates a .docx file compatible with Microsoft Word, Google Docs, LibreOffice Writer, and WPS Office. It's the right format when you need to:
- Send a document for review or editing by someone else
- Continue working on the document in Word later
- Submit to a publisher or platform that requires .docx
- Collaborate with others who don't use TranslitHub
Font Selection for DOCX
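Because these DOCX files reference fonts by name rather than embedding them, the document depends on the recipient having a compatible font installed. The fonts below are Microsoft fonts distributed with Windows, so recipients on other platforms are the most likely to see a substitute font. TranslitHub DOCX exports use these fonts: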
| Script | Font Used in DOCX |
|---|---|
| Devanagari (Hindi, Marathi, Sanskrit) | Mangal |
| Bengali | Vrinda |
| Tamil | Latha |
| Telugu | Gautami |
| Kannada | Tunga |
| Malayalam | Kartika |
| Gujarati | Shruti |
| Gurmukhi (Punjabi) | Raavi |
| Odia | Kalinga |
If you need guaranteed visual consistency for a DOCX file, you have two options:
- Tell recipients to install the Noto family of fonts (free, excellent Indian language support)
- Export to PDF instead
What Gets Preserved in DOCX
| Element | Preserved |
|---|---|
| Bold, italic, underline | Yes |
| Headings | Yes — mapped to Word heading styles |
| Lists | Yes |
| Text alignment | Yes |
| Font size | Yes |
| Script rendering | Depends on recipient's font |
| Editable text | Yes — fully editable in Word |
Importing Back After Editing
If a colleague edits your DOCX in Microsoft Word and sends it back, you can open it in TranslitHub's editor (use File → Import). The Unicode text comes back correctly. Formatting may differ slightly depending on what was changed in Word, but the actual Indian language characters are preserved.
Exporting to Plain Text (TXT)
TXT export strips all formatting and gives you raw Unicode text in UTF-8 encoding. This is appropriate when:
- Submitting content to a CMS that handles its own formatting
- Providing data to a developer or database
- Copying into apps that don't support rich text
- Using the text in Python/JavaScript scripts
UTF-8 vs UTF-16: Which One?
TranslitHub's default TXT export is UTF-8, which is the right choice for nearly everything — web content, databases, APIs, email, SMS. UTF-16 is available for legacy systems that specifically require it (some older Windows applications), but if you don't know what your recipient needs, use UTF-8.
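The size difference between the two encodings is easy to check. This short Python sketch encodes the word नमस्ते both ways. Devanagari code points take three bytes each in UTF-8 and two in UTF-16, and Python's plain `utf-16` codec also prepends a two-byte byte order mark.

```python
word = "\u0928\u092e\u0938\u094d\u0924\u0947"  # नमस्ते, six code points

utf8 = word.encode("utf-8")
utf16_le = word.encode("utf-16-le")   # little-endian, no byte order mark
utf16_bom = word.encode("utf-16")     # native byte order, with byte order mark

print(len(word), len(utf8), len(utf16_le), len(utf16_bom))  # 6 18 12 14
```

UTF-16 is slightly smaller for pure Indic text, but UTF-8 stays compatible with ASCII and is what web servers, databases, and APIs generally expect, which is why it remains the safer default.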
Line Endings
Windows apps traditionally expect CRLF (\r\n) line endings. Linux and macOS use LF (\n) only. By default, the TXT export uses LF (Unix style), which most current programs read correctly on every platform. If an older Windows program displays everything on one line, use the "Windows line endings" option in the export dialog.
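If you already have a TXT file and only need to switch its line endings, a script can do it without re-exporting. A minimal sketch follows; the function names are illustrative. Both functions normalize first, so text that mixes styles does not end up with doubled breaks.

```python
def to_lf(text: str) -> str:
    """Convert CRLF and bare CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")

def to_crlf(text: str) -> str:
    """Convert any line endings to CRLF without doubling existing ones."""
    return to_lf(text).replace("\n", "\r\n")

mixed = "पहली पंक्ति\nदूसरी पंक्ति\r\nतीसरी पंक्ति"
print(repr(to_crlf(mixed)))
```

When writing the result to disk in Python, open the file with `newline=""` so that Python does not translate the line endings a second time.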
Batch Export
For content creators, publishers, or educators who work with multiple documents, batch export lets you export several documents at once.
How Batch Export Works
- Open multiple documents in the editor (tabs)
- Go to File → Export All
- Choose format (PDF, DOCX, or TXT)
- TranslitHub generates all files and downloads them as a ZIP archive
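For readers who script their own pipeline from TXT exports, the last step above can be approximated with Python's standard `zipfile` module. This is a sketch of the general idea, not TranslitHub's implementation, and `zip_documents` is a hypothetical helper.

```python
import io
import zipfile

def zip_documents(docs: dict[str, str]) -> bytes:
    """Pack {filename: text} into an in-memory ZIP, storing text as UTF-8."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, text in docs.items():
            archive.writestr(name, text.encode("utf-8"))
    return buffer.getvalue()

worksheets = {
    "worksheet-01.txt": "संज्ञा के भेद",
    "worksheet-02.txt": "सर्वनाम के भेद",
}
archive_bytes = zip_documents(worksheets)

# Read the archive back to confirm the text survived the round trip.
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    print(archive.namelist())
    print(archive.read("worksheet-01.txt").decode("utf-8"))
```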
Naming Convention
Files in the batch export are named according to the first line of each document (or the document title if you've set one). Special characters are removed from filenames to ensure compatibility across operating systems.
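The exact rules TranslitHub applies are not spelled out here, so the following is only a sketch of one reasonable approach: take the first non-empty line, strip the characters that Windows forbids in filenames, and fall back to a placeholder when nothing usable is left. The function name, the 60-character limit, and the "untitled" fallback are all assumptions.

```python
import re
import unicodedata

# Characters Windows rejects in filenames, plus ASCII control characters.
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

def filename_from_first_line(doc: str, ext: str, max_len: int = 60) -> str:
    """Build a cross-platform filename from the first non-empty line of doc."""
    first = next((ln.strip() for ln in doc.splitlines() if ln.strip()), "untitled")
    name = unicodedata.normalize("NFC", first)
    name = _UNSAFE.sub("", name)
    name = re.sub(r"\s+", " ", name).strip(" .")
    name = name[:max_len].rstrip(" .") or "untitled"
    return f"{name}.{ext}"

print(filename_from_first_line("Unit 3: Nouns?\nFill in the blanks", "pdf"))
# Unit 3 Nouns.pdf
```

Indic letters pass through unchanged in this sketch. Note that cutting at a fixed length can separate a consonant from its matra, so a production version would want to truncate at a word boundary instead.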
Use Case: Hindi Worksheets for a Teacher
A Hindi teacher prepares 30 grammar worksheets for students. She writes each worksheet in a separate TranslitHub document. At the end, she batch-exports all 30 as PDFs. The ZIP download contains 30 properly named, font-embedded PDFs ready to distribute or print.
Exporting for Specific Platforms
For WhatsApp / Telegram
Export as TXT, then open the file, select all, copy, and paste into the chat. Unicode Indian language text pastes correctly into WhatsApp Web, Telegram Web, and most chat apps. Don't use DOCX for messaging apps: they'll attach the file rather than display the text.
For Email (Gmail, Outlook)
Copy from the TranslitHub editor and paste directly into Gmail or Outlook. The Unicode text pastes as-is. If you need a formatted document as an attachment, attach the PDF export.
For WordPress
Use the "Copy as HTML" button in the editor, then paste into WordPress's HTML editor (Text view, not Visual). This preserves headings, bold/italic, and list structure alongside the Unicode Indian language text. Alternatively, use the TranslitHub widget to type directly in WordPress without copy-pasting.
For Google Docs
Copy from the editor and paste into an open Google Docs document. Formatting is usually preserved well. If the Indian language font looks different from what you intended, change the font in Google Docs to "Noto Sans [Language]"; Google Docs has excellent Noto font support.
Font Rendering Quality Check
Before finalizing a document for distribution, it's worth doing a quick rendering check:
- Export as PDF
- Open the PDF on a different device (phone, tablet, or ask a colleague)
- Check that complex characters like Hindi conjuncts (क्ष, त्र, ज्ञ), Tamil compound letters, or Malayalam conjuncts display correctly
- Check that matras are positioned correctly — they should attach to the right consonant, not float independently
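The visual check above cannot be automated away, but one cause of floating matras can be caught in the text itself: a combining mark with no consonant before it, for example after a stray space. The sketch below flags such marks using Unicode general categories. It is a text-level sanity check only and cannot detect font or shaping problems. `orphan_marks` is a hypothetical helper.

```python
import unicodedata

def orphan_marks(text: str) -> list[int]:
    """Indexes of combining marks not preceded by a letter or another mark."""
    orphans = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) in ("Mn", "Mc"):
            prev = text[i - 1] if i > 0 else ""
            # A mark needs a base: a letter (L*) or an earlier mark (M*).
            if not prev or unicodedata.category(prev)[0] not in ("L", "M"):
                orphans.append(i)
    return orphans

print(orphan_marks("\u0915\u093f"))    # कि, matra attached to KA: []
print(orphan_marks("\u0915 \u093f"))   # matra stranded after a space: [2]
```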
Common Export Problems and Fixes
| Problem | Cause | Fix |
|---|---|---|
| PDF text appears as boxes | PDF viewer doesn't support embedded fonts | Use Adobe Acrobat or a modern browser's built-in PDF viewer |
| DOCX text garbled in old Word | Word version predates Unicode shaping | Update Word, or use PDF instead |
| TXT file shows ? characters | File opened with wrong encoding | Open with UTF-8 encoding in Notepad++ or VS Code |
| Devanagari digits (०१२) appear instead of Western digits (012) | preserveNumbers was false | Re-export with the numbers toggle on |
| Extra or missing line breaks in TXT | Line ending mismatch | Re-export with the line-ending option that matches the target program (LF or Windows CRLF) |
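The "? characters" row is worth seeing in miniature, because the file itself is usually fine and only the way it is read is wrong. This Python sketch shows the same UTF-8 bytes decoded correctly, decoded with the wrong encoding, and forced into an encoding that has no Indic characters at all.

```python
text = "\u0939\u093f\u0902\u0926\u0940"  # हिंदी, five code points
data = text.encode("utf-8")              # what a UTF-8 TXT export contains

right = data.decode("utf-8")             # correct: matches the original text
wrong = data.decode("latin-1")           # wrong encoding: 15 unrelated characters
lossy = text.encode("ascii", errors="replace")  # no Indic letters in ASCII

print(right == text)  # True
print(len(wrong))     # 15
print(lossy)          # b'?????'
```

The first two cases are recoverable by reopening the file as UTF-8. The third is not: once a program has saved the text as question marks, the original characters are gone and the file has to be exported again.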
Related Tools
- Transliteration Editor — the editor where you write and format before exporting
- Bulk Transliteration Tool — batch-convert large documents
- OCR for Indian Scripts — extract text from scanned documents before editing