The /highlight endpoint is the core of the RAG PDF Highlighter API. You give it a publicly accessible URL to a PDF and a list of text chunks retrieved from your RAG pipeline, and it downloads the PDF, locates each chunk on its target page, applies yellow highlights, and streams the annotated file back to you as a binary PDF download.

Request

Method: POST
Path: /highlight
Content-Type: application/json

Body Parameters

pdf_url (string, required)
A publicly accessible URL pointing to the PDF you want to annotate. The service downloads this file at request time, so the URL must be reachable from the host running the service.

documents (array, required)
A list of document chunk objects to locate and highlight in the PDF. Each element must conform to the DocumentPayload shape.

Page numbers in metadata.page are 0-indexed. The first page of your PDF is page 0, the second is page 1, and so on. If your RAG pipeline stores 1-indexed page numbers, subtract 1 before sending them to this API.
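
If your pipeline stores 1-indexed pages, the conversion can happen while the request body is built. The following is a minimal sketch: the helper names to_document_payload and build_request_body are hypothetical, and the field names page_content and metadata.page are taken from the curl example on this page rather than from the full DocumentPayload definition.

```python
from typing import Any


def to_document_payload(text: str, page_1_indexed: int) -> dict[str, Any]:
    """Build one element of `documents` from a chunk with a 1-indexed page."""
    if page_1_indexed < 1:
        raise ValueError("1-indexed page numbers start at 1")
    return {
        "page_content": text,
        # The API expects 0-indexed pages, so shift down by one.
        "metadata": {"page": page_1_indexed - 1},
    }


def build_request_body(pdf_url: str, chunks: list[tuple[str, int]]) -> dict[str, Any]:
    """Assemble the JSON body from (text, 1-indexed page) pairs."""
    return {
        "pdf_url": pdf_url,
        "documents": [to_document_payload(text, page) for text, page in chunks],
    }
```

For instance, a chunk stored as page 5 in a 1-indexed pipeline would be sent with "page": 4.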

Examples

curl -X POST http://localhost:8000/highlight \
  -H "Content-Type: application/json" \
  -d '{
    "pdf_url": "https://example.com/research-paper.pdf",
    "documents": [
      {
        "page_content": "The results demonstrate a 23% improvement",
        "metadata": {"page": 4}
      },
      {
        "page_content": "We conclude that the proposed method is effective",
        "metadata": {"page": 12}
      }
    ]
  }' \
  --output highlighted.pdf
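
The same request can be sent from Python. This sketch uses only the standard library and mirrors the curl call above; the helper names and the 60-second timeout are assumptions, not part of the API.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/highlight"  # same host and port as the curl example


def build_highlight_request(pdf_url, documents, api_url=API_URL):
    """Create the POST request with a JSON body; nothing is sent yet."""
    body = json.dumps({"pdf_url": pdf_url, "documents": documents}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def download_highlighted_pdf(request, out_path="highlighted.pdf", timeout=60):
    """Send the request and write the binary response to out_path."""
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        pdf_bytes = response.read()
    with open(out_path, "wb") as f:
        f.write(pdf_bytes)
    return out_path
```

Calling download_highlighted_pdf(build_highlight_request(url, documents)) corresponds to the curl command with --output highlighted.pdf.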

Responses

200 OK

The highlights were applied successfully. The response body is a binary PDF file.
Header: Value
Content-Type: application/pdf
Content-Disposition: attachment; filename="highlighted.pdf"
Write the raw response bytes to a .pdf file to view the result.
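
Before writing the body to disk, a client may want to confirm that it received a PDF rather than a JSON error. A minimal sketch, assuming you already have the response's Content-Type header and body bytes; save_pdf is a hypothetical helper, and the %PDF- check relies on the standard PDF file header rather than on anything specific to this service.

```python
from pathlib import Path


def save_pdf(body: bytes, content_type: str, out_path: str = "highlighted.pdf") -> Path:
    """Write response bytes to out_path after two basic sanity checks."""
    if not content_type.lower().startswith("application/pdf"):
        raise ValueError(f"expected application/pdf, got {content_type!r}")
    # Every PDF file begins with the marker "%PDF-" followed by a version.
    if not body.startswith(b"%PDF-"):
        raise ValueError("response body does not look like a PDF")
    path = Path(out_path)
    path.write_bytes(body)  # binary write, no text decoding
    return path
```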

400 Bad Request

Returned when the request is structurally valid but the service cannot complete it. See the Errors page for specific causes including PDFDownloadError and NoDocumentsError.
{"detail": "Failed to download PDF: HTTP 404"}

422 Unprocessable Entity

Returned when Pydantic validation of the request body fails: a required field (pdf_url or documents) is missing or has the wrong type.
{
  "detail": [
    {
      "loc": ["body", "pdf_url"],
      "msg": "field required",
      "type": "value_error.missing"
    }
  ]
}

500 Internal Server Error

An unexpected error occurred during processing. Check the service logs for details.
{"detail": "Highlighting failed: <error message>"}

If a chunk’s text cannot be found on its target page, the service silently skips that chunk and continues processing the remaining documents. No error is raised; the returned PDF simply omits the highlight for that chunk. See Chunks not highlighted for common causes.
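
The error bodies shown above carry detail either as a string (400, 500) or as a list of validation entries (422). The following client-side sketch flattens both shapes into a single message; summarize_error is a hypothetical helper, not part of the service.

```python
import json


def summarize_error(status: int, body: bytes) -> str:
    """Turn an error response body into a one-line, human-readable message."""
    try:
        detail = json.loads(body).get("detail")
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object: show a truncated raw body.
        return f"HTTP {status}: {body[:200]!r}"
    if isinstance(detail, list):
        # 422 shape: one entry per failed field, with a location path and message.
        parts = []
        for item in detail:
            if isinstance(item, dict):
                loc = ".".join(str(p) for p in item.get("loc", []))
                parts.append(f"{loc}: {item.get('msg', '')}")
        return f"HTTP {status}: " + "; ".join(parts)
    # 400 and 500 shape: detail is a plain string.
    return f"HTTP {status}: {detail}"
```

When using urllib, the status and body are available on the raised urllib.error.HTTPError as err.code and err.read().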