This page provides a complete schema reference for every request and response shape used by the RAG PDF Highlighter API. Use it as a quick lookup when integrating the service into your pipeline or writing client code.

HighlightRequest

The HighlightRequest schema defines the JSON body accepted by POST /highlight. Both fields are required.
{
  "pdf_url": "https://example.com/document.pdf",
  "documents": [
    {
      "page_content": "Text chunk to highlight",
      "metadata": {
        "page": 0,
        "source": "document.pdf",
        "chunk_id": "abc123"
      }
    }
  ]
}
pdf_url (string, required)
The fully qualified URL of the PDF to download and annotate. The service performs an HTTP GET against this URL at request time. The URL must be reachable from the host running the service and must return a valid PDF file with a 200 status code.
documents (array, required)
A non-empty list of DocumentPayload objects. Each object represents one text chunk that the service will locate and highlight in the PDF. Passing an empty array results in a 400 Bad Request response.
You can include extra fields in metadata beyond page, such as source, chunk_id, score, or any other key your RAG pipeline produces. These fields are kept on the DocumentPayload object but are ignored when the highlighter locates and highlights text. You can therefore pass your retriever’s raw output directly, without stripping metadata.
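The documented rules for this body can also be checked on the client before a request is sent. The sketch below is a minimal, hypothetical helper (build_highlight_request is not part of the API) that assembles a HighlightRequest body from retriever output represented as plain dicts. It assumes each chunk carries an integer metadata.page, as in the example body, and that pdf_url uses http or https; the service's own validation remains authoritative.

```python
import json
from urllib.parse import urlparse


def build_highlight_request(pdf_url: str, documents: list) -> dict:
    """Hypothetical client-side helper: assemble and sanity-check a HighlightRequest body."""
    parsed = urlparse(pdf_url)
    # pdf_url must be fully qualified (scheme and host) because the service fetches it itself
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"pdf_url must be an absolute http(s) URL, got {pdf_url!r}")
    # The service answers an empty documents array with 400 Bad Request
    if not documents:
        raise ValueError("documents must contain at least one chunk")

    payload_docs = []
    for i, doc in enumerate(documents):
        text = doc.get("page_content")
        metadata = doc.get("metadata") or {}
        if not isinstance(text, str):
            raise ValueError(f"documents[{i}].page_content must be a string")
        # Assumption: metadata.page is an integer page index, as in the example body
        if not isinstance(metadata, dict) or not isinstance(metadata.get("page"), int):
            raise ValueError(f"documents[{i}].metadata.page must be an integer")
        # Copy metadata as-is: extra keys (source, chunk_id, score, ...) pass through untouched
        payload_docs.append({"page_content": text, "metadata": dict(metadata)})

    return {"pdf_url": pdf_url, "documents": payload_docs}


if __name__ == "__main__":
    chunks = [
        {
            "page_content": "Text chunk to highlight",
            "metadata": {"page": 0, "source": "document.pdf", "chunk_id": "abc123", "score": 0.87},
        }
    ]
    body = build_highlight_request("https://example.com/document.pdf", chunks)
    print(json.dumps(body, indent=2))
```

Copying metadata unchanged, rather than whitelisting keys, matches the behaviour described above: extra keys are harmless, so the helper does not need to know what your retriever emits.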

Success Response

When highlighting succeeds, the API returns the annotated PDF as a binary stream.
| Property | Value |
|---|---|
| HTTP status | 200 OK |
| Content-Type | application/pdf |
| Content-Disposition | attachment; filename="highlighted.pdf" |
| Body | Raw binary PDF data |
Write the response body directly to a .pdf file. Do not attempt to parse it as JSON.
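A minimal standard-library client sketch follows. The helper names are hypothetical, and the base URL you pass in (for example http://localhost:8000) depends on your deployment. The %PDF- signature check is an extra client-side safeguard against saving a non-PDF body, not something the API requires.

```python
import json
import urllib.request

PDF_MAGIC = b"%PDF-"  # every well-formed PDF file starts with this header


def make_highlight_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Hypothetical helper: build a POST /highlight request carrying the JSON payload."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/highlight",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_pdf_body(body: bytes, out_path: str) -> int:
    """Write raw response bytes to disk after a cheap PDF signature check."""
    if not body.startswith(PDF_MAGIC):
        raise ValueError("response body is not a PDF")
    with open(out_path, "wb") as fh:  # binary mode: the body is never decoded as text or JSON
        fh.write(body)
    return len(body)


def fetch_highlighted_pdf(base_url: str, payload: dict, out_path: str, timeout: float = 120.0) -> int:
    """Send the request and write the 200 response body to out_path; returns bytes written."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx, so only 2xx bodies reach save_pdf_body
    with urllib.request.urlopen(make_highlight_request(base_url, payload), timeout=timeout) as resp:
        return save_pdf_body(resp.read(), out_path)
```

For example, fetch_highlighted_pdf("http://localhost:8000", body, "highlighted.pdf") writes the annotated file, where body is a HighlightRequest dictionary. The generous default timeout is an assumption: the service downloads the PDF before highlighting, so requests can take longer than a typical JSON call.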

Error Response

All error responses — 400, 422, and 500 — share the same outer shape: a JSON object with a single detail key.
{"detail": "Failed to download PDF: Connection timeout"}
For 422 Unprocessable Entity errors, detail is an array of Pydantic validation error objects rather than a plain string:
{
  "detail": [
    {
      "loc": ["body", "pdf_url"],
      "msg": "field required",
      "type": "value_error.missing"
    }
  ]
}
detail (string | array)
Describes the error. For 400 and 500 errors this is a human-readable string. For 422 validation errors this is an array of objects, each containing:
  • loc (array): the location of the invalid field, e.g. ["body", "pdf_url"]
  • msg (string): a description of the validation failure
  • type (string): a Pydantic error type identifier, e.g. "value_error.missing"
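Because detail changes type between status codes, client code has to branch on it. The hypothetical helper below (format_error_detail is an illustrative name, not part of the API) flattens either shape into a single log line. It falls back to the raw body when the response is not JSON, which is an assumption covering cases such as a reverse proxy answering instead of the service.

```python
import json


def format_error_detail(status: int, body: bytes) -> str:
    """Hypothetical helper: turn an error response body into one readable line.

    400 and 500 responses carry a string in detail; 422 responses carry a list
    of validation error objects with loc, msg and type keys.
    """
    try:
        detail = json.loads(body).get("detail")
    except (ValueError, AttributeError):
        # Not the documented JSON object shape; show the raw text instead
        return f"HTTP {status}: {body.decode('utf-8', errors='replace')}"
    if isinstance(detail, list):
        parts = []
        for err in detail:
            # loc is a path such as ["body", "pdf_url"]; join it as body.pdf_url
            loc = ".".join(str(p) for p in err.get("loc", []))
            parts.append(f"{loc}: {err.get('msg')} ({err.get('type')})")
        return f"HTTP {status}: " + "; ".join(parts)
    return f"HTTP {status}: {detail}"
```

With urllib, pass exc.code and exc.read() from a caught urllib.error.HTTPError.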
For a complete breakdown of each error code, its causes, and recommended fixes, see the Errors reference page.