pdf-extractor
Extract text, tables, and images from PDFs with layout preservation. Handles scanned documents via OCR fallback.
Install
npx stax add pdf-extractor
What it does
The skill triggers automatically when the agent detects a PDF path in the conversation or a request matching one of its trigger phrases. It returns structured markdown optimized for LLM consumption: tables stay tables, headings stay headings, and footnotes get linked.
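To make the path-detection trigger concrete, here is a minimal sketch of how a PDF path could be spotted in a message. The regular expression and the `find_pdf_paths` helper are hypothetical illustrations, not the skill's actual trigger logic, which is not documented here.

```python
import re
from typing import List

# Hypothetical pattern: any run of non-space, non-quote characters ending in ".pdf".
PDF_PATH = re.compile(r"[^\s\"'<>]+\.pdf\b", re.IGNORECASE)


def find_pdf_paths(message: str) -> List[str]:
    """Return every substring of the message that looks like a PDF path."""
    return PDF_PATH.findall(message)


if __name__ == "__main__":
    prompt = "Extract the Q4 revenue table from ./reports/q4-2025.pdf and summarize it."
    print(find_pdf_paths(prompt))
```

A pattern like this only checks the file extension. It does not confirm that the file exists or that it is a valid PDF.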
How to use
Drop a PDF into the conversation and ask for what you need. The skill handles extraction transparently:
Extract the Q4 revenue table from ./reports/q4-2025.pdf and summarize the year-over-year trend.
Under the hood
The skill uses poppler for text-layer extraction and falls back to tesseract when the PDF has no text layer (scanned documents). Tables are detected with a layout heuristic and normalized before being handed back to the model.
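The text-layer-then-OCR fallback can be sketched roughly as follows. This is an illustration under assumptions, not the skill's own code: it shells out to the `pdftotext`, `pdftoppm`, and `tesseract` command-line tools, and the function names, the `min_chars` threshold, and the 300 DPI rasterization setting are all hypothetical choices.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Tuple


def read_text_layer(pdf_path: str) -> str:
    """Read the embedded text layer with poppler's pdftotext CLI."""
    # "-layout" keeps the physical layout; "-" sends the text to stdout.
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def read_with_ocr(pdf_path: str) -> str:
    """Rasterize each page with pdftoppm, then OCR the images with tesseract."""
    pages = []
    with tempfile.TemporaryDirectory() as tmp:
        prefix = str(Path(tmp) / "page")
        # Assumed setting: 300 DPI PNGs, written as page-1.png, page-2.png, ...
        subprocess.run(["pdftoppm", "-r", "300", "-png", pdf_path, prefix], check=True)
        for image in sorted(Path(tmp).glob("page-*.png")):
            ocr = subprocess.run(
                ["tesseract", str(image), "stdout"],
                capture_output=True, text=True, check=True,
            )
            pages.append(ocr.stdout)
    return "\n".join(pages)


def extract_text(
    pdf_path: str,
    min_chars: int = 1,  # hypothetical threshold for "has a text layer"
    text_extractor: Callable[[str], str] = read_text_layer,
    ocr_extractor: Callable[[str], str] = read_with_ocr,
) -> Tuple[str, str]:
    """Return (text, source), where source is "text-layer" or "ocr"."""
    text = text_extractor(pdf_path)
    if len(text.strip()) >= min_chars:
        return text, "text-layer"
    # Nothing usable in the text layer: treat the PDF as a scan.
    return ocr_extractor(pdf_path), "ocr"
```

Passing the extractors in as parameters keeps the fallback decision separate from the external tools, so the branching can be exercised with stub functions on a machine that has neither poppler nor tesseract installed.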