When the RAG PDF Highlighter API cannot fulfill a request it returns a standard HTTP error status code and a JSON body containing a detail field that describes what went wrong. This page lists every error code the service emits, explains its cause, and tells you how to fix it.

Error Codes

A 400 response means your request was well-formed JSON, but the service could not process it because of a recoverable, caller-side problem. There are three distinct causes:
  • PDFDownloadError: The service attempted to fetch the PDF from pdf_url, but the request failed. This happens when the URL is not reachable from the service host, returns a non-200 HTTP status, or times out.
  • NoDocumentsError: The documents array you supplied was empty. The service requires at least one chunk to highlight.
  • HighlightError: An error occurred during the highlighting process itself, after the PDF was successfully downloaded. This is the base class for PDF-related failures.
Example response:
{"detail": "Failed to download PDF: HTTP 404"}
How to fix:
  • Confirm that pdf_url resolves to a real, publicly accessible PDF. Test it by fetching the URL from the same network as the service host.
  • Ensure the server hosting the PDF returns Content-Type: application/pdf and HTTP 200.
  • Make sure documents contains at least one DocumentPayload object — an empty array is not allowed.
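As a pre-flight check for the first two steps, the following minimal sketch fetches pdf_url with Python's standard library and reports the HTTP status, the Content-Type header, and whether the body starts with the %PDF- signature. check_pdf_url is a hypothetical helper, not part of the API; run it from the same network as the service host so reachability matches what the service sees. This page does not say whether the service itself rejects a wrong Content-Type, so treat that part of the check as advisory.

```python
import urllib.error
import urllib.request


def check_pdf_url(url: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Hypothetical helper: report whether `url` looks like a downloadable PDF.

    Returns (ok, reason).
    """
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
            content_type = response.headers.get("Content-Type", "")
            # Read only the first five bytes; a real PDF starts with "%PDF-".
            head = response.read(5)
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses, e.g. the "HTTP 404" in the example above.
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # URLError, socket errors and timeouts are all OSError subclasses.
        return False, f"unreachable: {exc}"

    if status != 200:
        return False, f"HTTP {status}"
    if not content_type.lower().startswith("application/pdf"):
        return False, f"unexpected Content-Type: {content_type!r}"
    if head != b"%PDF-":
        return False, "body does not start with %PDF-"
    return True, "ok"
```

For example, `check_pdf_url("https://example.com/report.pdf")` returns `(False, "HTTP 404")` when the file is missing, which corresponds to the PDFDownloadError case.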
A 422 response is generated automatically by Pydantic when your request body is missing a required field or contains a field with the wrong type. The two required fields are pdf_url (string) and documents (array).
Example response:
{
  "detail": [
    {
      "loc": ["body", "pdf_url"],
      "msg": "field required",
      "type": "value_error.missing"
    }
  ]
}
Each object in the detail array describes one validation failure:
Key     Description
loc     Path to the invalid field, e.g. ["body", "pdf_url"]
msg     Human-readable explanation of the failure
type    Pydantic error type identifier
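To surface these failures in client logs, a small formatter can flatten each entry into one line. This is a hedged sketch: format_validation_errors is a hypothetical name, and it also passes through the plain-string detail that 400 and 500 responses use. Pydantic puts integer indices into loc for array elements, so a bad chunk shows up as a path such as body.documents.0.page_content.

```python
import json


def format_validation_errors(body: str) -> list[str]:
    """Hypothetical helper: turn an error response body into readable lines."""
    detail = json.loads(body).get("detail", [])
    if isinstance(detail, str):
        # 400 and 500 responses carry a single string instead of a list.
        return [detail]
    lines = []
    for error in detail:
        # loc is a path like ["body", "pdf_url"]; join the parts with dots.
        path = ".".join(str(part) for part in error.get("loc", []))
        lines.append(f"{path}: {error.get('msg', '')} ({error.get('type', '')})")
    return lines
```

Applied to the example response above, it yields `["body.pdf_url: field required (value_error.missing)"]`.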
How to fix:
  • Ensure your request body includes both pdf_url as a non-empty string and documents as an array.
  • Verify that Content-Type: application/json is set on the request and that the body is valid JSON.
  • Check that page_content is present and is a string on every element in documents.
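The checks above, together with the empty-documents rule from the 400 section, can be run client-side before the request is sent. The following is a minimal sketch under the assumption that the payload is a plain dict mirroring the JSON body; validate_payload is a hypothetical helper, and it is stricter than the server in places (for instance, it rejects an empty pdf_url string, which schema validation alone may accept).

```python
def validate_payload(payload: dict) -> list[str]:
    """Hypothetical pre-flight check; an empty list means the payload looks OK."""
    problems = []
    pdf_url = payload.get("pdf_url")
    if not isinstance(pdf_url, str) or not pdf_url.strip():
        problems.append("pdf_url must be a non-empty string")
    documents = payload.get("documents")
    if not isinstance(documents, list):
        problems.append("documents must be an array")
        return problems
    if not documents:
        # Passes schema validation, but the service answers with NoDocumentsError.
        problems.append("documents must not be empty (the service returns 400 NoDocumentsError)")
    for index, doc in enumerate(documents):
        # Every chunk needs a string page_content field.
        if not isinstance(doc, dict) or not isinstance(doc.get("page_content"), str):
            problems.append(f"documents[{index}].page_content must be a string")
    return problems
```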
A 500 response indicates that an unexpected error occurred inside the service, outside the specific failure cases that produce a 400 or 422. It does not mean your request body failed validation, although a particular input can still trigger the same failure every time.
Example response:
{"detail": "Highlighting failed: <error message>"}
How to fix:
  • Check the service logs for a full stack trace — the log entry will contain more detail than the response body.
  • Retry the request. Some 500 errors are transient (for example, a momentary file-system issue).
  • If the error persists with the same input, open an issue on the project repository and include the log output and the request payload.
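Because only 500 errors may be transient, a client should retry those and return 400 and 422 immediately; resending an unchanged payload would fail the same way. The following is a sketch of that policy with exponential backoff. send is a hypothetical zero-argument callable that performs one POST to the service and returns (status, body); the endpoint path is not given on this page, so wiring it up (for example with requests.post) is left to you. The default attempt count and delays are arbitrary, not values recommended by the service.

```python
import time
from typing import Callable, Tuple

# Hypothetical response shape: (HTTP status code, raw body bytes).
Response = Tuple[int, bytes]


def post_with_retry(
    send: Callable[[], Response],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, retrying only on HTTP 500 with exponential backoff."""
    for attempt in range(attempts):
        status, body = send()
        # Return anything that is not a 500, and the result of the final try.
        if status != 500 or attempt == attempts - 1:
            return status, body
        # Wait base_delay, then 2x, 4x, ... before the next attempt.
        sleep(base_delay * 2 ** attempt)
    # Only reached when attempts < 1, because the loop never ran.
    raise ValueError("attempts must be at least 1")
```

Injecting sleep keeps the function testable without real delays; in production the default time.sleep is used.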

Chunks Not Highlighted

If a chunk’s text cannot be located on its target page, the service silently skips that chunk and continues processing the rest of the document list. This is intentional behavior — no error is raised and the response is still 200 OK with a valid PDF. The returned PDF simply will not contain a highlight annotation for the skipped chunk. Common causes of a chunk being silently skipped:
  • Wrong page number: metadata.page is 0-indexed. If your RAG pipeline stores 1-indexed page numbers, every chunk will be searched on the wrong page. Subtract 1 from all page values before sending them to the API.
  • Text encoding mismatch: The PDF may store text in a different encoding or with ligatures and special characters that differ from what your chunker extracted. Try copying text directly from the PDF viewer and comparing it to your page_content value.
  • Chunk boundaries: If the chunk starts or ends mid-word at a page break, the full string may not appear on a single page. Try a shorter substring of the chunk that is entirely contained within one page.
  • Scanned or image-based PDFs: If the PDF contains scanned pages without an embedded text layer, there is no selectable text to match against, and all chunks on those pages will be skipped.
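The first two causes can be addressed in a few lines of client code. The following sketch uses hypothetical helper names: to_zero_indexed shifts page values down by one and should be applied only if your pipeline really stores 1-indexed pages, since applying it twice moves every chunk to the wrong page again. diagnose_mismatch compares text copied from a PDF viewer with page_content after Unicode NFKC normalization (which folds ligatures such as U+FB01 into plain letters) and whitespace collapsing, to tell an encoding difference apart from genuinely different text. It does not reproduce how the service itself matches text, which is not documented here.

```python
import copy
import unicodedata


def to_zero_indexed(documents: list[dict]) -> list[dict]:
    """Return copies of documents with 1-indexed metadata.page shifted to 0-indexed."""
    shifted = copy.deepcopy(documents)
    for doc in shifted:
        metadata = doc.get("metadata", {})
        if "page" in metadata:
            metadata["page"] -= 1
    return shifted


def normalize(text: str) -> str:
    """Fold compatibility characters (e.g. ligatures) and collapse whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def diagnose_mismatch(viewer_text: str, page_content: str) -> str:
    """Compare text copied from a PDF viewer with a chunk's page_content."""
    if viewer_text == page_content:
        return "identical"
    if normalize(viewer_text) == normalize(page_content):
        return "differs only in ligatures, special characters or whitespace"
    return "different text"
```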
To verify your page numbers are correct, open the PDF in a viewer, navigate to page metadata.page + 1 (because viewers display 1-indexed page numbers), and confirm the chunk text is visible there.
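The same check can be scripted across many chunks by searching every page's extracted text for the chunk. This is a minimal sketch, assuming the third-party pypdf package for extraction (pip install pypdf); find_chunk_pages and page_texts_from_pdf are hypothetical helpers, and pypdf's text extraction may not match the text layer the service searches, so treat the result as a hint rather than proof.

```python
import unicodedata


def _normalize(text: str) -> str:
    # Fold ligatures (NFKC) and collapse whitespace so that line breaks in
    # the extracted page text do not prevent a match.
    return " ".join(unicodedata.normalize("NFKC", text).split())


def find_chunk_pages(page_texts: list[str], chunk: str) -> list[int]:
    """Return the 0-indexed pages whose text contains `chunk`."""
    target = _normalize(chunk)
    return [i for i, text in enumerate(page_texts) if target in _normalize(text)]


def page_texts_from_pdf(path: str) -> list[str]:
    """Extract per-page text with pypdf (assumed to be installed)."""
    from pypdf import PdfReader  # imported lazily so the helpers above work without it

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]
```

For a chunk whose metadata.page is p, an empty result points to an encoding, chunk-boundary, or scanned-page problem, while a result of [p - 1] is the typical sign that the pipeline sent 1-indexed page numbers.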