PDF to CSV Converter

Frequently Asked Questions

What types of PDF tables can be extracted?

The extractor can identify and extract tables that are visually structured with lines, borders or consistent column alignment in the PDF. It works best with programmatically generated PDFs (from software such as Excel, Word, accounting systems or reporting tools), where the text and its position on the page are stored as data in the PDF rather than as an image. Scanned PDFs (images of tables) require OCR processing before table extraction; the tool flags these and applies OCR automatically where possible.

What does the output CSV contain?

The output CSV contains the extracted table data as comma-separated values, with each table row on a new line and each cell separated by a comma. Column headers from the first row of each table are preserved. Multiple tables from the same page are extracted in order of their position on the page (top to bottom). If the PDF contains multiple pages with tables, all tables are extracted and appended in page order.
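
If you post-process the output in a script, the layout described above can be reproduced with Python's standard csv module. This is a minimal sketch, not the converter's own code. The ExtractedTable record (page number, distance of the table's top edge from the top of the page, and rows of text) is a hypothetical stand-in for whatever the extractor produces internally. A standard CSV writer also wraps any cell that contains a comma, such as £1,500.00, in double quotes so that the comma is not mistaken for a delimiter.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class ExtractedTable:
    """Hypothetical container for one table found in the PDF."""
    page: int              # 1-based page number
    top: float             # distance of the table's top edge from the top of the page
    rows: list[list[str]]  # first row holds the column headers


def tables_to_csv(tables: list[ExtractedTable]) -> str:
    """Write all tables to one CSV string, in page order and top to bottom within a page."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")  # quotes cells containing commas
    for table in sorted(tables, key=lambda t: (t.page, t.top)):
        writer.writerows(table.rows)  # one CSV line per table row
    return buffer.getvalue()


tables = [
    ExtractedTable(page=2, top=100.0, rows=[["Region", "Sales"], ["North", "£1,500.00"]]),
    ExtractedTable(page=1, top=400.0, rows=[["Item", "Qty"], ["Widget", "3"]]),
    ExtractedTable(page=1, top=50.0, rows=[["Month", "Total"], ["Jan", "42"]]),
]
print(tables_to_csv(tables))
```

Sorting on the pair (page, top) is what yields page order first and top-to-bottom order within each page.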

Can I open the CSV directly in Excel?

Yes — the output CSV file opens natively in Microsoft Excel, Google Sheets, LibreOffice Calc and Numbers on macOS. In Excel, go to File → Open or double-click the CSV file. If columns are not automatically separated, use the Data → Text to Columns wizard and choose comma as the delimiter. Google Sheets handles CSV files automatically when imported via File → Import.

What happens if the PDF has merged cells?

Merged cells (cells that span multiple columns or rows) are expanded in the CSV output: the merged cell's value is repeated in every column or row it spanned in the original table. This is necessary because the CSV format has no way to represent merged cells. In complex tables with many merged cells, you may need to clean up the output in a spreadsheet application after extraction.
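
The expansion described above amounts to filling a grid: every slot covered by a merged cell receives that cell's text. A minimal Python sketch, assuming a hypothetical Cell record that gives each cell's top-left row and column plus its row and column span (an assumed representation, not the converter's own):

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical description of one (possibly merged) cell in the source table."""
    row: int
    col: int
    text: str
    rowspan: int = 1
    colspan: int = 1


def expand_merged_cells(cells: list[Cell], n_rows: int, n_cols: int) -> list[list[str]]:
    """Return a flat grid in which each merged cell's text fills every slot it spanned."""
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for cell in cells:
        for r in range(cell.row, cell.row + cell.rowspan):
            for c in range(cell.col, cell.col + cell.colspan):
                grid[r][c] = cell.text  # repeat the value across the span
    return grid


# "Region" spans two rows; "Q1" spans two columns.
cells = [
    Cell(0, 0, "Region", rowspan=2), Cell(0, 1, "Q1", colspan=2),
    Cell(1, 1, "Jan"), Cell(1, 2, "Feb"),
    Cell(2, 0, "North"), Cell(2, 1, "10"), Cell(2, 2, "12"),
]
for row in expand_merged_cells(cells, n_rows=3, n_cols=3):
    print(row)
# ['Region', 'Q1', 'Q1']
# ['Region', 'Jan', 'Feb']
# ['North', '10', '12']
```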

Can I extract tables from scanned PDF documents?

Yes — when a scanned PDF is uploaded, the tool automatically applies OCR (Optical Character Recognition) to detect text before attempting table extraction. OCR accuracy depends on the scan quality and resolution. High-quality scans (300 DPI or higher, good contrast, no skew) produce excellent results. Low-quality or heavily compressed scans may result in some text recognition errors that need manual correction in the CSV output.
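
Resolution in DPI is the number of image pixels per inch of the physical page, so you can check a scan before uploading by dividing the image's pixel width by the page width in inches (25.4 mm per inch). A short illustration in Python; the effective_dpi helper is hypothetical and not part of the tool:

```python
MM_PER_INCH = 25.4


def effective_dpi(pixel_width: int, page_width_mm: float) -> float:
    """Pixels per inch across the page width of a scanned image."""
    return pixel_width / (page_width_mm / MM_PER_INCH)


# An A4 page is 210 mm wide, so a 300 DPI scan needs about 2480 pixels across.
print(round(effective_dpi(2480, 210)))  # 300
print(round(effective_dpi(1240, 210)))  # 150
```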

What if the PDF has multiple tables per page?

All tables on each page are detected and extracted. Each table is labelled in the CSV with a comment row (or placed on a separate sheet if you choose multi-sheet output) indicating its source page and table number. This makes it easy to trace each block of data back to its location in the original document, which is especially useful for PDFs such as annual reports that contain many separate data tables.
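
If you need the tables as separate datasets after downloading, the label rows make it straightforward to split the file. The sketch below assumes a hypothetical label of the form "# Page 1, Table 2"; the exact wording of the comment rows is an assumption, so adjust the LABEL pattern to match what appears in your file. Because an unquoted label containing a comma is read as two CSV fields, the code rejoins the fields before matching.

```python
import csv
import io
import re
from typing import Optional

# Hypothetical label format, e.g. "# Page 1, Table 2"; adjust to match your file.
LABEL = re.compile(r"^#\s*Page\s+(\d+),\s*Table\s+(\d+)")


def split_tables(csv_text: str) -> dict[tuple[int, int], list[list[str]]]:
    """Group CSV rows under the (page, table) label row that precedes them."""
    tables: dict[tuple[int, int], list[list[str]]] = {}
    current: Optional[list[list[str]]] = None
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines between tables
        match = LABEL.match(",".join(row))  # rejoin in case the label was split at its comma
        if match:
            key = (int(match.group(1)), int(match.group(2)))
            current = tables.setdefault(key, [])
        elif current is not None:
            current.append(row)  # data row belongs to the most recent label
    return tables


sample = """# Page 1, Table 1
Month,Total
Jan,42
# Page 1, Table 2
Item,Qty
Widget,3
"""
for key, rows in split_tables(sample).items():
    print(key, rows)
```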

Does the conversion preserve number formatting?

Number values are extracted as plain text in CSV format — currency symbols (£, $, €), percentage signs and comma thousands separators are preserved as they appear in the PDF. You can then apply number formatting within your spreadsheet application. Dates are extracted in whatever format they appear in the PDF; you may need to reformat them using your spreadsheet's date functions if you need a specific date format.
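
Converting these text values into numbers in a script follows the same idea as applying number formatting in a spreadsheet: strip the symbols, then parse what remains. A minimal Python sketch, assuming a full stop as the decimal mark and commas as thousands separators (formats such as 1.234,50 would need different handling); parse_number is a hypothetical helper name:

```python
import re


def parse_number(text: str) -> float:
    """Convert text such as '£1,234.50', '$99' or '12.5%' into a float.

    Assumes '.' is the decimal mark and ',' the thousands separator.
    A percentage is returned as a fraction (12.5% -> 0.125).
    Raises ValueError if nothing numeric remains after cleaning.
    """
    cleaned = text.strip()
    is_percent = cleaned.endswith("%")
    cleaned = re.sub(r"[£$€,%\s]", "", cleaned)  # drop currency symbols, separators, spaces
    value = float(cleaned)
    return value / 100 if is_percent else value


for sample in ["£1,234.50", "$99", "€2,000", "12.5%"]:
    print(sample, "->", parse_number(sample))
```

Returning percentages as fractions matches how spreadsheets store them: a cell displaying 12.5% holds the value 0.125.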

What is the difference between PDF to CSV and PDF to Excel?

CSV (Comma-Separated Values) is a plain text format readable by any spreadsheet software, with no formatting — just rows and columns of data. Excel (XLSX) supports multiple sheets, cell formatting, formulas and charts. For raw data extraction and maximum compatibility, CSV is the better choice. For a formatted output with column widths, bold headers and number formatting preserved, use our PDF to Excel converter instead.

What if my PDF contains financial data with negative numbers?

Negative numbers are extracted as they appear in the PDF. Accounting-format negatives, enclosed in brackets (e.g. (1,500.00)), are preserved as text in that format, and negatives written with a minus sign (-1500.00) keep the minus sign. To turn bracket-format negatives into numeric negatives in Excel, use Find & Replace (for example, replace "(" with "-" and ")" with nothing) or a custom formula after extraction.
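
If you process the CSV outside Excel, the same bracket-to-minus conversion takes a few lines of Python. A minimal sketch, assuming commas are thousands separators and that any currency symbols have already been removed; parse_accounting is a hypothetical name:

```python
def parse_accounting(text: str) -> float:
    """Convert '(1,500.00)' to -1500.0; plain or minus-signed values parse as usual."""
    cleaned = text.strip().replace(",", "")  # remove thousands separators
    if cleaned.startswith("(") and cleaned.endswith(")"):
        return -float(cleaned[1:-1])  # brackets mean the value is negative
    return float(cleaned)


for sample in ["(1,500.00)", "-1500.00", "2,750.25"]:
    print(sample, "->", parse_accounting(sample))
```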

Is my PDF sent to any server for processing?

No — PDF table extraction runs entirely within your browser using JavaScript-based PDF parsing. For scanned PDFs requiring OCR, the processing also happens locally using a browser-based OCR engine (Tesseract.js). No file is transmitted to any external server at any stage. This ensures complete privacy for financial records, invoices, payroll data and other sensitive business documents.
