Use RAG PDF Highlighter as a Standalone Python Library

You do not need to run the FastAPI server to use RAG PDF Highlighter. The package exposes its core utilities as importable functions, so you can embed PDF highlighting directly inside any Python application — a script, a Jupyter notebook, or a larger service — without spinning up a separate process.

Install the Package

pip install rag-pdf-highlighter

Python 3.10 or later is required.

Complete Workflow Example

The snippet below shows the full end-to-end flow: download a PDF, apply highlights, read the result into memory, and clean up the temporary files.

import asyncio
from langchain_core.documents import Document
from rag_pdf_highlighter.utils.pdf_helpers import (
    download_pdf,
    highlight_chunks_in_pdf,
    cleanup_file
)

async def highlight_pdf(pdf_url: str, documents: list[Document]) -> bytes:
    pdf_path = None
    output_path = None
    try:
        pdf_path = await download_pdf(pdf_url)
        output_path = highlight_chunks_in_pdf(pdf_path, documents)
        with open(output_path, "rb") as f:
            return f.read()
    finally:
        if pdf_path:
            cleanup_file(pdf_path)
        if output_path:
            cleanup_file(output_path)

# Usage
documents = [
    Document(
        page_content="The quick brown fox jumps over the lazy dog",
        metadata={"page": 0}
    )
]

pdf_bytes = asyncio.run(
    highlight_pdf("https://example.com/document.pdf", documents)
)

with open("highlighted.pdf", "wb") as f:
    f.write(pdf_bytes)

Always call cleanup_file() in a finally block for both pdf_path and output_path. Both download_pdf and highlight_chunks_in_pdf write temporary files to disk. If an exception is raised and cleanup never runs, those files accumulate and consume disk space. The finally pattern above guarantees cleanup whether or not an error occurs.
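If several call sites need the same guarantee, the cleanup can be moved into a context manager. The sketch below is only an illustration: temp_files and make_temp_pdf are hypothetical helpers, not part of the package, and os.remove is used as a stand-in for cleanup_file so the snippet runs without the library installed.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temp_files():
    """Collect temporary file paths and delete them on exit (hypothetical helper)."""
    paths: list[str] = []
    try:
        yield paths
    finally:
        for path in paths:
            # Stand-in for cleanup_file(path); ignore files that are already gone.
            with contextlib.suppress(FileNotFoundError):
                os.remove(path)


def make_temp_pdf() -> str:
    """Create an empty temp file, standing in for a downloaded or highlighted PDF."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    os.close(fd)
    return path


def demo(fail: bool) -> list[str]:
    """Register temp files, optionally fail midway, and return the paths created."""
    created: list[str] = []
    try:
        with temp_files() as paths:
            paths.append(make_temp_pdf())
            created.append(paths[-1])
            if fail:
                raise RuntimeError("simulated highlighting failure")
            paths.append(make_temp_pdf())
            created.append(paths[-1])
    except RuntimeError:
        pass
    return created
```

Every path appended to the list is removed when the with block exits, including when the block exits through an exception. In real code you would append the values returned by download_pdf and highlight_chunks_in_pdf and call cleanup_file in place of os.remove.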

Error Handling

Import the exception classes to handle specific failure modes gracefully.

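from langchain_core.documents import Document
from rag_pdf_highlighter.utils.pdf_helpers import (
    download_pdf,
    highlight_chunks_in_pdf,
    cleanup_file,
)
from rag_pdf_highlighter.exceptions import (
    HighlightError,
    PDFDownloadError,
    PDFNotFoundError,
    NoDocumentsError,
)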

async def safe_highlight(pdf_url: str, documents: list[Document]) -> bytes | None:
    pdf_path = None
    output_path = None
    try:
        pdf_path = await download_pdf(pdf_url)
        output_path = highlight_chunks_in_pdf(pdf_path, documents)
        with open(output_path, "rb") as f:
            return f.read()
    except PDFDownloadError as e:
        print(f"Could not fetch the PDF: {e}")
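    except PDFNotFoundError as e:
        print(f"PDF file not found: {e}")
    except NoDocumentsError as e:
        print(f"No documents provided: {e}")
    except HighlightError as e:
        # Base class last, so the specific handlers above get the first chance.
        print(f"Highlighting failed: {e}")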
    finally:
        if pdf_path:
            cleanup_file(pdf_path)
        if output_path:
            cleanup_file(output_path)
    return None

The exceptions and what triggers them:

Exception	Cause
`PDFDownloadError`	The URL fetch failed (network error, non-200 response)
`PDFNotFoundError`	The local PDF file path does not exist
`NoDocumentsError`	An empty document list was passed
`HighlightError`	Base class — catch this to handle any highlighting error
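Because HighlightError is described as the base class, the order of except clauses matters: a handler for the base class placed first would also catch the more specific errors. The sketch below illustrates this with locally defined stand-in classes. It assumes, without confirmation from the package source, that the three specific exceptions derive from HighlightError.

```python
# Stand-in classes mirroring the table above. The real classes live in
# rag_pdf_highlighter.exceptions; the inheritance shown here is an assumption.
class HighlightError(Exception): ...
class PDFDownloadError(HighlightError): ...
class PDFNotFoundError(HighlightError): ...
class NoDocumentsError(HighlightError): ...


def classify(exc: Exception) -> str:
    """Return a label for the handler that catches exc, most specific first."""
    try:
        raise exc
    except PDFDownloadError:
        return "download"
    except PDFNotFoundError:
        return "not-found"
    except NoDocumentsError:
        return "no-documents"
    except HighlightError:
        # Base class last: reached only by errors the handlers above did not match.
        return "highlight"


def classify_base_first(exc: Exception) -> str:
    """Same idea with the base class first: the specific handler is never reached."""
    try:
        raise exc
    except HighlightError:
        return "highlight"
    except PDFDownloadError:  # unreachable if it derives from HighlightError
        return "download"
```

With the specific handlers first, each error gets its own message. With the base class first, a PDFDownloadError is reported as a generic highlighting failure.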

download_pdf is an async function and must be called with await. highlight_chunks_in_pdf is a regular synchronous function — call it directly without await. If you are calling from synchronous code, wrap the async parts with asyncio.run() as shown in the example above. If you are already inside an async context (e.g., a FastAPI route or an async test), use await for download_pdf directly.
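Inside an existing event loop, a synchronous call blocks the loop while it runs. One option, sketched below, is to offload it with asyncio.to_thread (available since Python 3.9). fake_download_pdf and fake_highlight are hypothetical stand-ins for download_pdf and highlight_chunks_in_pdf so the snippet runs without the package or network access. Whether the real highlighting function is safe to run in a worker thread is an assumption you should verify.

```python
import asyncio


async def fake_download_pdf(url: str) -> str:
    """Stand-in for download_pdf: async, so it must be awaited."""
    await asyncio.sleep(0)  # simulate non-blocking I/O
    return "/tmp/downloaded.pdf"


def fake_highlight(pdf_path: str, documents: list[str]) -> str:
    """Stand-in for highlight_chunks_in_pdf: a plain synchronous function."""
    return pdf_path.replace(".pdf", f"_highlighted_{len(documents)}.pdf")


async def handler(url: str, documents: list[str]) -> str:
    """Shape of an async caller, such as a route handler or an async test."""
    pdf_path = await fake_download_pdf(url)
    # Run the blocking call in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(fake_highlight, pdf_path, documents)


def main() -> str:
    """Entry point for synchronous code: asyncio.run() drives the coroutine."""
    return asyncio.run(handler("https://example.com/document.pdf", ["chunk"]))
```

Calling the synchronous function directly, as in the examples above, is simpler and is fine for scripts. The thread offload only matters when other coroutines need to make progress during highlighting.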
