Scanned PDF OCR

Run OCR on scanned PDFs in the browser and keep the working copy private.

When a PDF is image-based, its pages have no text layer, so direct extraction returns little or nothing. PDF2X can render each page, preprocess the image locally, run OCR in-browser, and export the result as usable Markdown or text.

Ideal for scan-heavy research packets, photographed pages, legacy reports, and OCR-first intake flows.

Recommended OCR flow

Set the extraction mode, keep preprocessing on, then clean the text after recognition.

Start with Auto or OCR

Auto is the safer choice for PDFs that mix text-based and scanned pages. OCR mode is the better choice when every page is image-based.
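To make the Auto versus OCR distinction concrete, here is a minimal TypeScript sketch of one plausible per-page decision. It is not PDF2X's documented behavior: the function name `chooseModePerPage`, the `ExtractionMode` type, and the 20-character threshold are all assumptions made for illustration.

```typescript
// Hypothetical types and names; not part of PDF2X.
type ExtractionMode = "direct" | "ocr";

// Assumed heuristic: a page whose embedded text layer is nearly empty
// is treated as a scan and routed to OCR.
function chooseModePerPage(pageTexts: string[], minChars = 20): ExtractionMode[] {
  return pageTexts.map((text) =>
    // Count only non-whitespace characters from the text layer.
    text.replace(/\s+/g, "").length >= minChars ? "direct" : "ocr"
  );
}
```

A page with a real text layer would come back as "direct", while a page whose text layer holds only whitespace would come back as "ocr".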

Keep image preprocessing on

The local preprocessing pass can improve contrast and simplify the page before OCR runs.
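As an illustration of what a preprocessing pass of this kind can look like, the sketch below stretches contrast and then binarizes a grayscale page. The exact steps PDF2X applies are not described here, so the function name `preprocessGray`, the input layout (one 0 to 255 luminance value per pixel), and the fixed threshold of 128 are assumptions.

```typescript
// Hypothetical helper; not PDF2X's actual preprocessing code.
// `gray` holds one luminance value (0-255) per pixel.
function preprocessGray(gray: Uint8ClampedArray, threshold = 128): Uint8ClampedArray {
  // Find the darkest and brightest pixel values on the page.
  let min = 255;
  let max = 0;
  gray.forEach((v) => {
    if (v < min) min = v;
    if (v > max) max = v;
  });
  const range = max - min;

  const out = new Uint8ClampedArray(gray.length);
  gray.forEach((v, i) => {
    // Contrast stretch: map [min, max] onto [0, 255]; a flat image is left as-is.
    const stretched = range === 0 ? v : ((v - min) * 255) / range;
    // Binarize: everything at or above the threshold becomes white, the rest black.
    out[i] = stretched >= threshold ? 255 : 0;
  });
  return out;
}
```

A fixed threshold is the simplest choice. Unevenly lit scans usually benefit from an adaptive threshold instead, which this sketch does not attempt.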

Apply cleanup after OCR

Wrapped-line repair, header removal, and hyphenation fixes help OCR output read like continuous text rather than a series of page fragments.
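The three cleanup steps named above can be sketched in a few lines of TypeScript. This is an illustrative approximation, not PDF2X's implementation: `cleanOcrPages` is a hypothetical name, the header rule (a first line repeated on more than half of the pages) is an assumption, and the patterns only handle ASCII letters.

```typescript
// Hypothetical cleanup for OCR output; `pages` holds one string per page.
function cleanOcrPages(pages: string[]): string {
  const split = pages.map((p) => p.split("\n").map((l) => l.trim()));

  // Count how often each non-empty first line appears across pages.
  const firstLineCounts = new Map<string, number>();
  for (const lines of split) {
    const first = lines[0] ?? "";
    if (first !== "") {
      firstLineCounts.set(first, (firstLineCounts.get(first) ?? 0) + 1);
    }
  }

  const cleaned = split.map((lines) => {
    const count = firstLineCounts.get(lines[0] ?? "") ?? 0;
    // Header removal: drop a first line that repeats on most pages.
    const isHeader = count >= 2 && count > pages.length / 2;
    const body = (isHeader ? lines.slice(1) : lines).join("\n");
    return body
      // Hyphenation fix: rejoin a word split across a line break.
      .replace(/([A-Za-z])-\n([a-z])/g, "$1$2")
      // Wrapped-line repair: join a line that continues in lowercase.
      .replace(/([^\n.!?:])\n(?=[a-z])/g, "$1 ")
      .trim();
  });

  return cleaned.filter((s) => s !== "").join("\n\n");
}
```

The hyphenation rule also merges genuine hyphenated compounds that happen to break at a line end, so the result still needs a human check.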

Output choices

Pick the export based on what happens next.

Use Markdown when

You want a more readable working copy with headings, lists, and light structure preserved for review or prompt editing.

Use plain text when

You are feeding the result into chunking, indexing, or analysis workflows that prefer the simplest possible text export.
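For a sense of why plain text suits these workflows, here is a minimal paragraph-based chunker written against an exported text file. It is a sketch of a downstream step, not part of PDF2X; `chunkParagraphs` and the 1000-character default are hypothetical.

```typescript
// Hypothetical downstream chunker for exported plain text.
function chunkParagraphs(text: string, maxChars = 1000): string[] {
  // Paragraphs are separated by one or more blank lines.
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p !== "");

  const chunks: string[] = [];
  let current = "";
  for (const p of paragraphs) {
    const candidate = current === "" ? p : current + "\n\n" + p;
    if (current === "" || candidate.length <= maxChars) {
      // Keep packing paragraphs while the chunk stays within the limit.
      current = candidate;
    } else {
      chunks.push(current);
      current = p;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}
```

A single paragraph longer than `maxChars` is kept whole rather than split mid-sentence, so chunk sizes are a soft limit in this sketch.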

After OCR

Always review the preview before exporting. OCR can still introduce recognition mistakes, especially on low-quality scans.