Getting Started

Install from PyPI:

pip install pdf-modifier-mcp

Or run directly without installing (requires uv):

uvx pdf-modifier-mcp

Requires Python 3.12 or later.

To install from source:

git clone https://github.com/mlorentedev/pdf-modifier-mcp.git
cd pdf-modifier-mcp
make setup

The pdf-mod command exposes six operations. To replace text, use modify:

pdf-mod modify input.pdf output.pdf -r "old text=new text"

Stack multiple replacements in a single pass:

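pdf-mod modify input.pdf output.pdf -r '$99.99=$149.99' -r 'Draft=Final'

Wrap arguments that contain $ in single quotes. Inside double quotes the shell expands $9 as a positional parameter, and the replacement silently changes.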

Font properties (family, weight, size, color) are preserved automatically. The family is approximated: each embedded font is mapped to the nearest Base 14 family (Helvetica, Times-Roman, or Courier), and bold variants are detected.
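A minimal sketch of that mapping, assuming simple keyword heuristics on the font name. The keyword lists and the map_to_base14 helper are hypothetical, not the project's actual table; the return values are PyMuPDF's short Base 14 names (helv, hebo, tiro, tibo, cour, cobo).

```python
# Hypothetical keyword heuristic; the project's real mapping table may differ.
# Values are PyMuPDF short names for Base 14 fonts: (regular, bold).
BASE14 = {
    "helvetica": ("helv", "hebo"),
    "times": ("tiro", "tibo"),
    "courier": ("cour", "cobo"),
}


def map_to_base14(font_name: str) -> str:
    """Map an embedded font name such as "Arial-BoldMT" to a Base 14 short name."""
    name = font_name.lower()
    # Embedded subsets carry a tag like "ABCDEF+"; drop it before matching.
    if "+" in name:
        name = name.split("+", 1)[1]
    if "courier" in name or "mono" in name:
        family = "courier"
    elif "times" in name or ("serif" in name and "sans" not in name):
        family = "times"
    else:
        family = "helvetica"  # sans-serif fallback (assumption)
    is_bold = any(tag in name for tag in ("bold", "black", "heavy"))
    regular, bold = BASE14[family]
    return bold if is_bold else regular


print(map_to_base14("Arial-BoldMT"))       # hebo
print(map_to_base14("TimesNewRomanPSMT"))  # tiro
```

Unrecognized families fall back to Helvetica in this sketch; the real tool's fallback may differ.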

Enable pattern matching with --regex:

pdf-mod modify input.pdf output.pdf -r "Order #\d+=Order #REDACTED" --regex
pdf-mod modify input.pdf output.pdf -r "January \d{2}, \d{4}=DATE REDACTED" --regex

Patterns are compiled when the arguments are parsed, so an invalid regex returns an error before the PDF is touched.
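To make the up-front validation concrete, here is a sketch of parsing repeated -r OLD=NEW arguments. It assumes each spec is split at the first "=" and that the 100-replacement cap listed for ReplacementSpec applies; parse_replacements is a hypothetical stand-in written with the standard library rather than Pydantic.

```python
import re

MAX_REPLACEMENTS = 100  # cap stated for ReplacementSpec


def parse_replacements(args: list[str], use_regex: bool = False) -> dict[str, str]:
    """Hypothetical parser for repeated -r "OLD=NEW" arguments.

    Assumes the split happens at the first "=", so OLD itself cannot contain "=".
    Every regex is compiled here, before any PDF is opened.
    """
    if len(args) > MAX_REPLACEMENTS:
        raise ValueError(f"too many replacements: {len(args)} > {MAX_REPLACEMENTS}")
    pairs: dict[str, str] = {}
    for arg in args:
        old, sep, new = arg.partition("=")
        if not sep or not old:
            raise ValueError(f"expected OLD=NEW, got {arg!r}")
        if use_regex:
            try:
                re.compile(old)
            except re.error as exc:
                raise ValueError(f"invalid regex {old!r}: {exc}") from exc
        pairs[old] = new
    return pairs
```

Failing fast here means a typo such as an unbalanced parenthesis never produces a half-modified output file.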

Append a URL after | to make the replacement text clickable:

pdf-mod modify input.pdf output.pdf -r "Click Here=Visit Site|https://example.com"

Neutralize an existing link with void(0):

pdf-mod modify input.pdf output.pdf -r "Click Here=Click Here|void(0)"

Supported schemes: http://, https://, mailto:, javascript:.

Plain text extraction with page separators:

pdf-mod analyze input.pdf

Full structure as JSON — every text span with position, font, size, and color:

pdf-mod analyze input.pdf --json
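The structure follows the block/line/span hierarchy that PyMuPDF returns from page.get_text("dict"). The sketch below flattens one page of that hierarchy into span records. It runs on a hand-written dict in PyMuPDF's documented shape, so no PDF is needed; flatten_spans and the record keys are illustrative assumptions, not the tool's exact output schema.

```python
from typing import Any


def flatten_spans(page_number: int, page_dict: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a page.get_text("dict")-shaped mapping into one record per span."""
    records: list[dict[str, Any]] = []
    for block in page_dict.get("blocks", []):
        if block.get("type") != 0:  # type 1 is an image block with no lines
            continue
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                records.append({
                    "page": page_number,
                    "text": span["text"],
                    "bbox": tuple(span["bbox"]),
                    "font": span["font"],
                    "size": span["size"],
                    # PyMuPDF packs span color into one sRGB integer.
                    "color": f"#{span['color']:06x}",
                })
    return records


sample_page = {
    "blocks": [
        {"type": 0, "lines": [{"spans": [
            {"text": "Invoice", "bbox": (72.0, 70.0, 130.0, 84.0),
             "font": "Arial-BoldMT", "size": 14.0, "color": 0x1F3A93},
        ]}]},
        {"type": 1},  # an image block, skipped
    ]
}
print(flatten_spans(1, sample_page)[0]["color"])  # #1f3a93
```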

Search for terms and display their font properties in a Rich table:

pdf-mod inspect input.pdf "Invoice" "Total" "$"

Output columns: Page, Term, Font, Size, Context (first 100 characters of the containing span).
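The search behind inspect can be sketched as a substring scan over spans. Span and inspect_terms are hypothetical names, and the match is assumed to be case-sensitive; each hit yields one row with the columns listed above.

```python
from dataclasses import dataclass


@dataclass
class Span:
    page: int
    text: str
    font: str
    size: float


def inspect_terms(spans: list[Span], terms: list[str]) -> list[tuple[int, str, str, float, str]]:
    """Return (Page, Term, Font, Size, Context) rows for every span containing a term."""
    rows = []
    for span in spans:
        for term in terms:
            if term in span.text:  # case-sensitive substring match (assumption)
                rows.append((span.page, term, span.font, span.size, span.text[:100]))
    return rows


spans = [
    Span(1, "Invoice #1042", "Arial-BoldMT", 14.0),
    Span(1, "Total: $99.99", "ArialMT", 10.0),
]
for row in inspect_terms(spans, ["Invoice", "Total", "$"]):
    print(row)
```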

Inventory all existing links in the document:

pdf-mod links input.pdf

Output includes page number, target URI, and the text area covered by the link.
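One plausible way to recover the text covered by a link is to intersect the link rectangle with span bounding boxes. This is an assumption about the approach, not the tool's documented method; covered_text is a hypothetical helper, and rectangles are (x0, y0, x1, y1) tuples in PDF points, with y growing downward as in PyMuPDF.

```python
Rect = tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF points


def intersects(a: Rect, b: Rect) -> bool:
    """True when two axis-aligned rectangles overlap with positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def covered_text(link_rect: Rect, spans: list[tuple[Rect, str]]) -> str:
    """Join the text of every span whose bounding box overlaps the link area."""
    return " ".join(text for bbox, text in spans if intersects(link_rect, bbox))


spans = [((72, 70, 130, 84), "Click Here"), ((72, 100, 130, 114), "Footer")]
print(covered_text((70, 68, 135, 86), spans))  # Click Here
```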

Apply the same replacements to multiple files at once:

pdf-mod batch file1.pdf file2.pdf -o output/ -r "Draft=Final"

Each file is processed independently, so a failure in one file doesn't stop the rest of the batch. Output files are written to the directory given by -o/--output-dir and keep their original filenames.

Regex patterns work in batch mode as well:
pdf-mod batch *.pdf -o redacted/ -r "\d{4}-\d{4}-\d{4}-\d{4}=XXXX-XXXX-XXXX-XXXX" --regex
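Per-file error isolation comes down to catching exceptions inside the loop rather than around it. The sketch assumes a caller-supplied process_one(src, dst) callable; run_batch and its string status values are hypothetical and do not reflect the signature of the project's batch_process().

```python
from collections.abc import Callable
from pathlib import Path


def run_batch(
    inputs: list[Path],
    output_dir: Path,
    process_one: Callable[[Path, Path], None],
) -> dict[str, str]:
    """Apply process_one to each file, recording failures instead of aborting."""
    output_dir.mkdir(parents=True, exist_ok=True)
    status: dict[str, str] = {}
    for src in inputs:
        dst = output_dir / src.name  # same filename, new directory
        try:
            process_one(src, dst)
            status[src.name] = "ok"
        except Exception as exc:  # isolate the failure and keep going
            status[src.name] = f"error: {exc}"
    return status
```

Returning a status per file, rather than raising, lets the caller report partial success the same way the batch command does.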

The MCP server exposes the same functionality over stdio for AI agent integration. Use user scope (-s user) so the tools are available across all your projects.

For Claude Code:

claude mcp add -s user pdf-modifier -- uvx --upgrade pdf-modifier-mcp

For Gemini CLI:

gemini mcp add -s user pdf-modifier uvx -- --upgrade pdf-modifier-mcp

For Codex CLI, add to ~/.codex/config.toml:

[mcp_servers.pdf-modifier]
command = "uvx"
args = ["--upgrade", "pdf-modifier-mcp"]

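For VS Code, add this to your User Settings (settings.json):

{
  "mcp": {
    "servers": {
      "pdf-modifier": {
        "command": "uvx",
        "args": ["--upgrade", "pdf-modifier-mcp"]
      }
    }
  }
}

In a workspace .vscode/mcp.json file, omit the outer "mcp" key and put "servers" at the top level.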

Any other MCP client that supports the stdio transport, such as Cursor or Windsurf, works the same way: configure it to launch uvx --upgrade pdf-modifier-mcp.

| Tool | Parameters | Description |
|---|---|---|
| read_pdf_structure | input_path, password? | Returns the complete PDF structure (text, bounding boxes, font names, sizes, colors) as JSON. Use this first to understand the document layout before making changes. |
| inspect_pdf_fonts | input_path, terms[], password? | Searches for text terms (substring match) and returns the font name, size, and position of each match. Run this before replacements to verify font handling. |
| list_pdf_hyperlinks | input_path, password? | Extracts all existing hyperlinks and URIs from the document, including their location and covered text. |
| modify_pdf_content | input_path, output_path, replacements{}, use_regex?, password? | Finds and replaces text with style preservation. Supports regex patterns and hyperlink syntax (a URL appended after a pipe character). Returns the replacements made, pages modified, and any warnings. |
| batch_modify_pdf_content | input_paths[], output_dir, replacements{}, use_regex?, password? | Applies the same replacements to multiple PDFs at once, with per-file error isolation. |

All tools return structured JSON. Errors include a typed error code (FILE_NOT_FOUND, READ_ERROR, WRITE_ERROR), a human-readable message, and a details object.

Recommended agent workflow:

  1. Call read_pdf_structure to get the full document layout.
  2. Call inspect_pdf_fonts with the target terms to confirm font properties.
  3. Call modify_pdf_content with the replacement map.

How a replacement is applied:

  1. The PDF is opened with PyMuPDF, and each page is scanned for text spans matching the target.
  2. All matches on a page are collected, then redacted in batch (apply_redactions() is called once per page).
  3. Replacement text is inserted at the original coordinates with the matched font properties.
  4. Embedded font names are mapped to Base 14 equivalents: "Arial-BoldMT" becomes Helvetica-Bold (HeBo), "TimesNewRomanPSMT" becomes Times-Roman (TiRo), and so on.

Text matching uses a two-pass strategy: first within individual spans, then across span boundaries within the same line. This handles most cases where PDF producers split text across multiple spans.
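A sketch of the two-pass idea on a single line, where each span is reduced to its text and spans are assumed to concatenate with no separator. find_in_line is hypothetical; it returns (first_span, last_span) index pairs, so a pair with different indices marks a match that crosses a span boundary.

```python
from bisect import bisect_right


def find_in_line(spans: list[str], target: str) -> list[tuple[int, int]]:
    """Return (first_span, last_span) pairs for matches of target on one line."""
    if not target:
        return []
    hits: list[tuple[int, int]] = []
    # Pass 1: matches that sit entirely inside a single span.
    for i, text in enumerate(spans):
        if target in text:
            hits.append((i, i))
    # Pass 2: search the joined line, then map offsets back to span indices.
    starts, offset = [], 0
    for text in spans:
        starts.append(offset)
        offset += len(text)
    line = "".join(spans)
    idx = line.find(target)
    while idx != -1:
        first = bisect_right(starts, idx) - 1
        last = bisect_right(starts, idx + len(target) - 1) - 1
        if first != last:  # single-span matches were already recorded in pass 1
            hits.append((first, last))
        idx = line.find(target, idx + 1)
    return hits


print(find_in_line(["Total: $", "99", ".99 due"], "$99.99"))  # [(0, 2)]
```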

Entry Points                Core Layer                  Engine
+-----------------------+   +-----------------------+   +----------------+
| CLI (Typer + Rich)    |-->| PDFModifier           |-->| PyMuPDF (fitz) |
| MCP Server (FastMCP)  |   | PDFAnalyzer           |   +----------------+
+-----------------------+   | Pydantic v2 models    |
                            +-----------------------+
  • PDFModifier — context manager that opens, modifies, and saves the PDF. Two-pass matching (single-span + cross-span) with batch-redact strategy.
  • PDFAnalyzer — reads structure via page.get_text("dict"), traverses the block/line/span hierarchy.
  • batch_process() — processes multiple PDFs independently with per-file error isolation.
  • Pydantic models: ReplacementSpec validates input (regex compilation, max 100 replacements). ModificationResult, BatchResult, PDFStructure, and FontInspectionResult are the typed outputs.