Getting Started
Installation
Install from PyPI:

    pip install pdf-modifier-mcp

Or run directly without installing (requires uv):

    uvx pdf-modifier-mcp

Requires Python 3.12 or later.

To install from source:

    git clone https://github.com/mlorentedev/pdf-modifier-mcp.git
    cd pdf-modifier-mcp
    make setup

CLI Usage
The pdf-mod command exposes six operations.

Replace text

    pdf-mod modify input.pdf output.pdf -r "old text=new text"

Stack multiple replacements in a single pass:

    pdf-mod modify input.pdf output.pdf -r '$99.99=$149.99' -r "Draft=Final"

Use single quotes around any replacement that contains $. Inside double quotes the shell expands $9 and $1 as positional parameters and the tool never sees the intended text.

Font style (family, weight, size, color) is preserved automatically. The tool maps embedded fonts to the nearest Base 14 family (Helvetica, Times-Roman, or Courier) and detects bold variants.
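As a rough illustration of the font mapping described above, the sketch below picks a Base 14 family from an embedded font name using simple substring checks. The function name `map_to_base14` and the matching rules are assumptions made for this example, not the tool's actual code; the return values are written in the style of PyMuPDF's short Base 14 codes.

```python
def map_to_base14(font_name: str) -> str:
    """Map an embedded font name to a short Base 14 code (illustrative heuristic)."""
    name = font_name.lower()
    bold = "bold" in name  # bold variant detection by substring

    if "courier" in name:
        return "cobo" if bold else "cour"   # Courier-Bold / Courier
    if "times" in name:
        return "tibo" if bold else "tiro"   # Times-Bold / Times-Roman
    # Anything else falls back to the Helvetica family.
    return "hebo" if bold else "helv"       # Helvetica-Bold / Helvetica


for embedded in ("Arial-BoldMT", "TimesNewRomanPSMT", "CourierNewPSMT"):
    print(embedded, "->", map_to_base14(embedded))
```

A real mapping would likely also handle italic variants and a wider range of font families; this sketch only covers the three families and the bold detection mentioned in the text.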
Regex replacement
Enable pattern matching with --regex:

    pdf-mod modify input.pdf output.pdf -r "Order #\d+=Order #REDACTED" --regex
    pdf-mod modify input.pdf output.pdf -r "January \d{2}, \d{4}=DATE REDACTED" --regex

Patterns are validated at input. An invalid regex returns an error before the PDF is touched.
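The validate-first behavior can be sketched with Python's standard `re` module: compile every pattern before opening any file and report the ones that fail. The helper name `validate_patterns` and its return shape are assumptions for illustration only.

```python
import re


def validate_patterns(patterns):
    """Compile each pattern up front; return (compiled, errors)."""
    compiled = {}
    errors = []
    for pattern in patterns:
        try:
            compiled[pattern] = re.compile(pattern)
        except re.error as exc:
            # Collect the problem instead of failing midway through a PDF.
            errors.append(f"{pattern!r}: {exc}")
    return compiled, errors


compiled, errors = validate_patterns([r"Order #\d+", r"(unclosed"])
if errors:
    print("Refusing to modify the PDF:")
    for line in errors:
        print("  ", line)
```

Validating everything before the first write means a typo in the last pattern cannot leave a half-modified output file behind.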
Create and neutralize hyperlinks
Append a URL after | to make the replacement text clickable:

    pdf-mod modify input.pdf output.pdf -r "Click Here=Visit Site|https://example.com"

Neutralize an existing link with void(0):

    pdf-mod modify input.pdf output.pdf -r "Click Here=Click Here|void(0)"

Supported schemes: http://, https://, mailto:, javascript:.
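To make the `old=new|target` syntax concrete, here is a minimal parsing sketch. The function `parse_replacement` is hypothetical and the splitting rules (first `=`, then first `|`) are an assumption; the tool's own parser may treat edge cases differently, for example patterns that themselves contain `=`.

```python
def parse_replacement(spec: str):
    """Split 'old=new' or 'old=new|target' into (old, new, target_or_None)."""
    old, sep, rest = spec.partition("=")
    if not sep or not old:
        raise ValueError(f"expected 'old=new', got {spec!r}")
    new, bar, target = rest.partition("|")
    # No '|' means a plain text replacement without a link.
    return old, new, (target if bar else None)


print(parse_replacement("Draft=Final"))
print(parse_replacement("Click Here=Visit Site|https://example.com"))
print(parse_replacement("Click Here=Click Here|void(0)"))
```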
Analyze PDF structure
Plain text extraction with page separators:

    pdf-mod analyze input.pdf

Full structure as JSON, listing every text span with position, font, size, and color:

    pdf-mod analyze input.pdf --json

Inspect fonts
Search for terms and display their font properties in a Rich table:

    pdf-mod inspect input.pdf "Invoice" "Total" "$"

Output columns: Page, Term, Font, Size, Context (first 100 characters of the containing span).
List hyperlinks
Inventory all existing links in the document:

    pdf-mod links input.pdf

Output includes page number, target URI, and the text area covered by the link.
Batch processing
Apply the same replacements to multiple files at once:

    pdf-mod batch file1.pdf file2.pdf -o output/ -r "Draft=Final"

Each file is processed independently, so a failure in one file doesn’t stop the batch. Output files are saved to --output-dir with the same filename.
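Per-file error isolation can be sketched as a loop that catches exceptions for each input and records the outcome instead of aborting. The names `batch_apply` and `process_one`, and the shape of the result dictionary, are assumptions for this example; `process_one` stands in for whatever actually modifies a single PDF.

```python
from pathlib import Path


def batch_apply(paths, output_dir, process_one):
    """Run process_one(src, dst) for each file, collecting failures instead of raising."""
    out = Path(output_dir)
    results = {}
    for p in paths:
        src = Path(p)
        dst = out / src.name  # same filename, different directory
        try:
            process_one(src, dst)
            results[str(src)] = {"success": True, "output": str(dst)}
        except Exception as exc:
            # The failure is recorded for this file only; the loop continues.
            results[str(src)] = {"success": False, "error": str(exc)}
    return results


def fake_process(src, dst):
    """Stand-in for the real per-file modification."""
    if src.name == "bad.pdf":
        raise ValueError("corrupt")


for name, outcome in batch_apply(["a.pdf", "bad.pdf", "c.pdf"], "output", fake_process).items():
    print(name, outcome)
```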
Regex patterns work in batch mode as well:

    pdf-mod batch *.pdf -o redacted/ -r "\d{4}-\d{4}-\d{4}-\d{4}=XXXX-XXXX-XXXX-XXXX" --regex

MCP Server
The MCP server exposes the same functionality over stdio for AI agent integration. Use user scope (-s user) so the tools are available across all your projects.
Claude Code

    claude mcp add -s user pdf-modifier -- uvx --upgrade pdf-modifier-mcp

Gemini CLI

    gemini mcp add -s user pdf-modifier uvx -- --upgrade pdf-modifier-mcp

OpenAI Codex CLI

Add to ~/.codex/config.toml:

    [mcp_servers.pdf-modifier]
    command = "uvx"
    args = ["--upgrade", "pdf-modifier-mcp"]

GitHub Copilot (VS Code)

Add to .vscode/mcp.json or your User Settings:

    {
      "mcp": {
        "servers": {
          "pdf-modifier": {
            "command": "uvx",
            "args": ["--upgrade", "pdf-modifier-mcp"]
          }
        }
      }
    }

Other Clients

Any MCP-compatible client that supports stdio transport (such as Cursor or Windsurf) will work. Point the client to uvx --upgrade pdf-modifier-mcp.
Available tools
| Tool | Parameters | Description |
|---|---|---|
| read_pdf_structure | input_path, password? | Returns the complete PDF structure (text, bounding boxes, font names, sizes, colors) as JSON. Use this first to understand the document layout before making changes. |
| inspect_pdf_fonts | input_path, terms[], password? | Searches for text terms (substring match) and returns font name, size, and position for each match. Run this before replacements to verify font handling. |
| list_pdf_hyperlinks | input_path, password? | Extracts all existing hyperlinks and URIs from the document, including their location and covered text. |
| modify_pdf_content | input_path, output_path, replacements{}, use_regex?, password? | Find and replace text with style preservation. Supports regex patterns and hyperlink syntax (text\|URL). Returns replacements made, pages modified, and any warnings. |
| batch_modify_pdf_content | input_paths[], output_dir, replacements{}, use_regex?, password? | Apply the same replacements to multiple PDFs at once. Per-file error isolation. |
All tools return structured JSON. Errors include a typed error code (FILE_NOT_FOUND, READ_ERROR, WRITE_ERROR), a human-readable message, and a details object.
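For a sense of what such an error might look like, the sketch below builds a JSON payload from the three error codes named above. Only the codes come from this guide; the field names (`success`, `error`, `code`, `message`, `details`) and the helper `error_response` are assumptions, so check the actual tool output before relying on a particular shape.

```python
import json
from enum import Enum


class ErrorCode(str, Enum):
    FILE_NOT_FOUND = "FILE_NOT_FOUND"
    READ_ERROR = "READ_ERROR"
    WRITE_ERROR = "WRITE_ERROR"


def error_response(code: ErrorCode, message: str, **details) -> str:
    """Serialize a typed error as JSON (field names are illustrative)."""
    payload = {
        "success": False,
        "error": {"code": code.value, "message": message, "details": details},
    }
    return json.dumps(payload)


print(error_response(ErrorCode.FILE_NOT_FOUND, "Input file does not exist", path="input.pdf"))
```

A typed code lets an agent branch on the failure kind without parsing the human-readable message.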
Typical agent workflow
- Call read_pdf_structure to get the full document layout.
- Call inspect_pdf_fonts with the target terms to confirm font properties.
- Call modify_pdf_content with the replacement map.
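The workflow above can be written down as an ordered list of tool calls with their arguments. The tool and parameter names come from the table of available tools; the helper `plan_calls`, the `(tool, arguments)` pair format, and the file paths are placeholders, and how the calls are actually dispatched depends on the MCP client.

```python
def plan_calls(input_path, output_path, replacements):
    """Return the three tool calls, in order, as (tool_name, arguments) pairs."""
    return [
        ("read_pdf_structure", {"input_path": input_path}),
        # Inspect the fonts of exactly the terms that will be replaced.
        ("inspect_pdf_fonts", {"input_path": input_path, "terms": list(replacements)}),
        ("modify_pdf_content", {
            "input_path": input_path,
            "output_path": output_path,
            "replacements": replacements,
        }),
    ]


for tool, arguments in plan_calls("input.pdf", "output.pdf", {"Draft": "Final"}):
    print(tool, arguments)
```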
How it works
- The PDF is opened with PyMuPDF and each page is scanned for text spans matching the target.
- All matches on a page are collected, then redacted in batch (apply_redactions() called once per page).
- Replacement text is inserted at the original coordinates with matched font properties.
- Embedded font names are mapped to Base 14 equivalents: "Arial-BoldMT" becomes Helvetica-Bold (HeBo), "TimesNewRomanPSMT" becomes Times-Roman (TiRo), etc.
Text matching uses a two-pass strategy: first within individual spans, then across span boundaries within the same line. This handles most cases where PDF producers split text across multiple spans.
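The two-pass idea can be illustrated on plain strings, treating one line as a list of span texts. This is a simplified sketch under that assumption: the function `find_matches` is hypothetical, it reports only the first cross-span hit, and it ignores the coordinates and font data a real implementation would carry along.

```python
def find_matches(line_spans, target):
    """Return (pass_name, span_indices) hits for target within one line of spans."""
    hits = []

    # Pass 1: the target sits entirely inside a single span.
    for i, text in enumerate(line_spans):
        if target in text:
            hits.append(("single-span", [i]))
    if hits:
        return hits

    # Pass 2: join the spans of the line and map the match back to span indices.
    joined = "".join(line_spans)
    start = joined.find(target)
    if start == -1:
        return hits
    end = start + len(target)

    touched, offset = [], 0
    for i, text in enumerate(line_spans):
        span_start, span_end = offset, offset + len(text)
        if span_start < end and span_end > start:  # span overlaps the match
            touched.append(i)
        offset = span_end
    hits.append(("cross-span", touched))
    return hits


print(find_matches(["Draft copy"], "Draft"))
print(find_matches(["Total: ", "$99", ".99"], "$99.99"))
```

Knowing which spans a cross-span match touches is what allows all of them to be redacted together before the replacement text is inserted.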
Architecture
    Entry Points                Core Layer                  Engine
    +-----------------------+   +-----------------------+   +----------------+
    | CLI (Typer + Rich)    |-->| PDFModifier           |-->| PyMuPDF (fitz) |
    | MCP Server (FastMCP)  |   | PDFAnalyzer           |   +----------------+
    +-----------------------+   | Pydantic v2 models    |
                                +-----------------------+

- PDFModifier: context manager that opens, modifies, and saves the PDF. Two-pass matching (single-span + cross-span) with batch-redact strategy.
- PDFAnalyzer: reads structure via page.get_text("dict") and traverses the block/line/span hierarchy.
- batch_process(): processes multiple PDFs independently with per-file error isolation.
- Pydantic models: ReplacementSpec validates input (regex compilation, max 100 replacements). ModificationResult, BatchResult, PDFStructure, and FontInspectionResult are the typed outputs.
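The block/line/span traversal mentioned for PDFAnalyzer can be sketched without opening a PDF by walking a dictionary shaped like the output of PyMuPDF's `page.get_text("dict")`. The function `iter_spans` and the hand-written sample page are illustrative assumptions; the real analyzer's output models are not shown here.

```python
def iter_spans(page_dict):
    """Yield flat span records from a dict shaped like page.get_text("dict")."""
    for block in page_dict.get("blocks", []):
        if block.get("type") != 0:  # 0 = text block; image blocks have no lines
            continue
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                yield {
                    "text": span["text"],
                    "font": span["font"],
                    "size": span["size"],
                    "color": span["color"],
                    "bbox": tuple(span["bbox"]),
                }


# Hand-written sample in the same shape, so the sketch runs without a PDF.
sample_page = {
    "blocks": [
        {"type": 1, "bbox": (0, 0, 50, 50)},  # image block, skipped
        {"type": 0, "lines": [{"spans": [
            {"text": "Invoice", "font": "Arial-BoldMT", "size": 14.0,
             "color": 0, "bbox": (72, 72, 130, 88)},
            {"text": " #1042", "font": "ArialMT", "size": 14.0,
             "color": 0, "bbox": (130, 72, 180, 88)},
        ]}]},
    ]
}

for record in iter_spans(sample_page):
    print(record)
```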