Extract plain text from Word DOCX files using mammoth.js. Preserves paragraph structure and line breaks. Shows word count, character count, and any conversion messages for unsupported Word features. Copy or download as .txt.
Click the drop area to choose a .docx Word document, or drag one onto it. mammoth.js reads the file entirely in your browser, so the document is never uploaded to any server. The tool accepts both legacy DOC and modern DOCX files, but mammoth.js is designed for DOCX, so a legacy DOC file should be converted to DOCX first.
The plain text output appears as soon as the conversion finishes. Word count, character count, and line count are shown. A messages panel lists any Word features that could not be converted (e.g. tracked changes, footnotes, text boxes).
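The three counts can be sketched as a small function. The counting rules below (words as runs of non-whitespace, characters as Unicode code points, lines split on LF or CRLF) and the name textStats are assumptions for illustration, not the tool's documented behaviour.

```javascript
// Count words, characters, and lines in extracted plain text.
// The rules here are illustrative assumptions, not the tool's exact logic.
function textStats(text) {
  const trimmed = text.trim();
  // Words: runs of non-whitespace characters.
  const words = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
  // Characters: Unicode code points, so an emoji counts once.
  const characters = Array.from(text).length;
  // Lines: split on LF or CRLF; empty text has zero lines.
  const lines = text === "" ? 0 : text.split(/\r?\n/).length;
  return { words, characters, lines };
}
```

Because paragraphs in the output are separated by blank lines, the line count under these rules includes those blank lines.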
Click Copy to clipboard to copy the text so you can paste it anywhere. Click Download TXT to save it as a .txt file. The filename matches the original DOCX filename, with a .txt extension.
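A minimal sketch of how the download name could be derived and the file saved in a browser. The function names toTxtName and downloadTxt and the "document" fallback are hypothetical; Blob, URL.createObjectURL, and the anchor download attribute are standard browser APIs.

```javascript
// Derive the .txt filename from the original name (hypothetical helper).
function toTxtName(originalName) {
  // Strip a trailing .docx or .doc (case-insensitive), then append .txt.
  const base = originalName.replace(/\.docx?$/i, "");
  return (base === "" ? "document" : base) + ".txt";
}

// Browser-only: offer the text as a UTF-8 .txt download.
function downloadTxt(text, originalName) {
  const blob = new Blob([text], { type: "text/plain;charset=utf-8" });
  const url = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = url;
  link.download = toTxtName(originalName);
  link.click();
  // Release the object URL once the download has been triggered.
  URL.revokeObjectURL(url);
}
```

Only the last extension is replaced, so a name such as notes.v2.docx keeps its inner dot.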
mammoth.js is an open-source JavaScript library (github.com/mwilliamson/mammoth.js) designed to convert DOCX files to HTML or plain text. It focuses on semantic content (headings, paragraphs, lists) rather than pixel-perfect layout reproduction. It runs entirely in the browser using JavaScript and does not require a server or any installed software.
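As a rough sketch, plain-text extraction with mammoth.js comes down to one call to extractRawText, which resolves to an object holding the text in value and any conversion notes in messages. The wrapper name extractPlainText is hypothetical, and the library object is passed in as a parameter so the sketch does not assume how the page loads it.

```javascript
// `mammothLib` is expected to expose extractRawText({ arrayBuffer }),
// as the browser build of mammoth.js does. Passing it in is a choice made
// for this sketch, not something the library requires.
async function extractPlainText(mammothLib, arrayBuffer) {
  const result = await mammothLib.extractRawText({ arrayBuffer: arrayBuffer });
  // result.value is the extracted text; result.messages lists warnings/errors.
  return { text: result.value, messages: result.messages };
}
```

In a page that has loaded the browser build, a file chosen by the user could be passed as extractPlainText(mammoth, await file.arrayBuffer()), where file is a File object from an input element or a drop event.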
Paragraph breaks and line breaks are preserved. Headings become text on their own lines. Lists retain their content but lose bullet/number formatting. Tables are converted row by row with cells separated. Bold, italic, underline, and other character formatting are stripped since plain text has no formatting. Images and drawings are removed.
Plain text (.txt) has no concept of fonts, colours, sizes, tables with visual borders, or multiple columns, so all visual formatting is necessarily lost. The output contains only the textual content in reading order. If you need to preserve formatting, use the Word to HTML converter instead, which outputs the content as structured HTML.
mammoth.js reports elements it cannot fully convert: tracked changes (showing accepted text only), footnotes and endnotes (content may be appended), comments (stripped), text boxes (content extracted), and SmartArt (converted to alt text if available). These messages help you know what manual cleanup may be needed.
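Each message returned by mammoth.js carries a type (such as "warning" or "error") and a message string. A messages panel could turn these into display lines along the following lines; the function name summariseMessages and the bracketed output format are assumptions for illustration.

```javascript
// Turn mammoth.js-style messages into display lines, dropping verbatim
// duplicates while keeping first-seen order. The output format is an
// illustrative assumption.
function summariseMessages(messages) {
  const lines = messages.map((m) => `[${m.type}] ${m.message}`);
  return Array.from(new Set(lines));
}
```

Removing duplicates keeps the panel short when the same unsupported feature appears many times in one document.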
No. mammoth.js cannot read encrypted or password-protected DOCX files. Remove the password protection in Microsoft Word (File > Info > Protect Document > Encrypt with Password > clear the password field) before opening the file in this tool. The same applies to IRM-protected documents in enterprise environments.
DOC is the legacy Microsoft Word binary format (pre-2007). DOCX is the modern Open XML format introduced in Word 2007, stored as a ZIP archive containing XML files. mammoth.js is designed for DOCX and does not parse the legacy binary format, so a DOC file is unlikely to convert correctly. For best results, open your DOC file in Word and use Save As to convert it to DOCX first.
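The two containers can be told apart by their first bytes: a ZIP archive starts with the bytes 50 4B 03 04 ("PK"), while the legacy binary format is an OLE2 compound file starting with D0 CF 11 E0 A1 B1 1A E1. The following is a minimal sketch of such a check; the function name sniffWordFormat and its return labels are hypothetical, and a "zip" result only means the file may be a DOCX, since other formats also use ZIP.

```javascript
// Classify a file by its leading bytes (hypothetical helper).
// `bytes` is a Uint8Array holding at least the first 8 bytes of the file.
function sniffWordFormat(bytes) {
  const startsWith = (sig) =>
    bytes.length >= sig.length && sig.every((b, i) => bytes[i] === b);
  // ZIP local file header: DOCX candidates.
  if (startsWith([0x50, 0x4b, 0x03, 0x04])) return "zip";
  // OLE2 compound file: legacy DOC candidates.
  if (startsWith([0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1])) return "ole2";
  return "unknown";
}
```

A check like this lets a page warn about a legacy file before attempting conversion, regardless of the file's extension.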