pdf-extractor


Extract text, tables, and images from PDFs with layout preservation. Handles scanned documents via OCR fallback.

by closedlab · updated · license MIT · ★ 1.2k · ↓ 14.2k

install

npx stax add pdf-extractor

compatibility

agent         support
claude-code   full
cursor        full
cline         partial
aider         untested

What it does

The skill triggers automatically when the agent detects a PDF path in the conversation or a request matching one of the skill's trigger phrases. It returns structured markdown optimized for LLM consumption: tables stay tables, headings stay headings, and footnotes are linked.
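To illustrate what "tables stay tables" can mean in practice, here is a minimal Python sketch that turns column-aligned text into a markdown pipe table. It assumes a layout-preserving text layer (such as the output of poppler's `pdftotext -layout`) and treats runs of two or more spaces as column gaps; that heuristic and the `layout_table_to_markdown` name are hypothetical, not the skill's documented behavior.

```python
import re

# Hypothetical heuristic: in layout-preserved text, columns are separated
# by runs of two or more whitespace characters; single spaces stay inside a cell.
COLUMN_GAP = re.compile(r"\s{2,}")


def split_row(line: str) -> list[str]:
    """Split one column-aligned line into trimmed cells."""
    return [cell.strip() for cell in COLUMN_GAP.split(line.strip())]


def layout_table_to_markdown(lines: list[str]) -> str:
    """Turn column-aligned text lines into a markdown pipe table.

    The first non-blank line is treated as the header row; short rows are
    padded with empty cells so every row has the same column count.
    """
    rows = [split_row(line) for line in lines if line.strip()]
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    rows = [row + [""] * (width - len(row)) for row in rows]

    def fmt(row: list[str]) -> str:
        # Escape literal pipes so they don't create extra columns.
        cells = [cell.replace("|", r"\|") for cell in row]
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    separator = "| " + " | ".join("---" for _ in range(width)) + " |"
    return "\n".join([fmt(header), separator, *(fmt(row) for row in body)])
```

With made-up input, `layout_table_to_markdown(["Item    Qty", "apples  3", "pears   5"])` yields a three-row pipe table with `Item | Qty` as its header. Splitting on double spaces keeps multi-word cells such as "Net revenue" intact, but it will misread tables whose columns are separated by a single space.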

How to use

Drop a PDF into the conversation and ask for what you need. The skill handles extraction transparently:

Extract the Q4 revenue table from ./reports/q4-2025.pdf and summarize the year-over-year trend.
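The automatic trigger described under What it does could be approximated with a simple check like the Python sketch below. It is hypothetical: the skill's actual trigger phrases are not listed on this page, so `TRIGGER_PHRASES` is a placeholder, and the path regex only recognizes paths without spaces.

```python
import re

# Placeholder phrases; the skill's real trigger list is not documented here.
TRIGGER_PHRASES = ("extract text from", "extract tables from", "read this pdf")

# A path-like token ending in .pdf, e.g. ./reports/q4-2025.pdf or /tmp/a.PDF
PDF_PATH = re.compile(r"(?:[\w.~-]*/)*[\w.-]+\.pdf\b", re.IGNORECASE)


def find_pdf_paths(message: str) -> list[str]:
    """Return every PDF path mentioned in a chat message."""
    return PDF_PATH.findall(message)


def should_trigger(message: str) -> bool:
    """Trigger on a PDF path or on any of the (placeholder) trigger phrases."""
    lowered = message.lower()
    return bool(find_pdf_paths(message)) or any(p in lowered for p in TRIGGER_PHRASES)
```

Applied to the example prompt above, `find_pdf_paths` picks out `./reports/q4-2025.pdf`, so `should_trigger` fires on the path alone, before any phrase matching.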

Under the hood

It uses poppler for text-layer extraction and falls back to tesseract OCR when the PDF has no text layer (as with scanned documents). Tables are detected with a layout heuristic and normalized before being returned to the model.
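The text-layer-then-OCR fallback can be sketched in Python as follows, assuming the poppler-utils (`pdftotext`, `pdftoppm`) and `tesseract` command-line tools are on PATH. The 20-character threshold, the 300 DPI rasterization, and the function names are illustrative assumptions, not the skill's actual settings. The extractors are passed in as parameters so the decision logic can be exercised without the binaries installed.

```python
import glob
import os
import subprocess
import tempfile
from typing import Callable

# Hypothetical cut-off: fewer non-whitespace characters than this in the
# text layer is treated as "no text layer" (e.g. a scanned document).
MIN_TEXT_CHARS = 20


def poppler_text_layer(pdf_path: str) -> str:
    """Read the embedded text layer with poppler's `pdftotext`."""
    # "-layout" keeps the physical layout; "-" writes the text to stdout.
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def tesseract_ocr(pdf_path: str) -> str:
    """Rasterize pages with `pdftoppm`, then OCR each page image with `tesseract`."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = os.path.join(tmp, "page")
        subprocess.run(["pdftoppm", "-r", "300", "-png", pdf_path, prefix], check=True)
        pages = []
        # One PNG per page as <prefix>-<n>.png; sorting assumes zero-padded page numbers.
        for image in sorted(glob.glob(prefix + "-*.png")):
            result = subprocess.run(
                ["tesseract", image, "stdout"],
                check=True, capture_output=True, text=True,
            )
            pages.append(result.stdout)
    # Separate pages with form feeds, as pdftotext does.
    return "\f".join(pages)


def has_text_layer(text: str, min_chars: int = MIN_TEXT_CHARS) -> bool:
    """True if the extracted text has enough non-whitespace content to trust."""
    return len("".join(text.split())) >= min_chars


def extract_text(
    pdf_path: str,
    text_layer: Callable[[str], str] = poppler_text_layer,
    ocr: Callable[[str], str] = tesseract_ocr,
) -> tuple[str, str]:
    """Return (text, method), preferring the text layer and falling back to OCR."""
    text = text_layer(pdf_path)
    if has_text_layer(text):
        return text, "text-layer"
    return ocr(pdf_path), "ocr"
```

This check runs once per document, so a PDF mixing digital and scanned pages keeps its (partial) text layer; testing each `\f`-separated page against the threshold would be a finer-grained alternative.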