RAG PDF Highlighter returns structured error responses for every failure scenario so your application can react appropriately. Understanding which HTTP status code maps to which problem — and what to do about it — will save you significant debugging time.

Error Reference

| HTTP Status | Cause | How to Fix |
| --- | --- | --- |
| 400 | PDF URL is unreachable or returns a non-200 response | Verify the URL is publicly accessible without authentication |
| 400 | The documents list is empty | Pass at least one document object in the request |
| 400 | Highlight processing failed internally | Check that page numbers are valid integers and 0-indexed |
| 422 | Missing required fields (pdf_url or documents) | Ensure both fields are present in the request body |
| 500 | Unexpected internal error | Check the service logs or contact support |

Error Response Format

All 4xx and 5xx responses return a JSON body with a single detail field describing what went wrong.
{
  "detail": "Failed to download PDF: HTTP 404"
}
Read detail to get a human-readable explanation you can log or surface to your users.
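Because detail is a plain string for most errors but an array of validation entries for 422 responses (see the 422 example under Debugging Common Problems), it can help to normalize both shapes before logging. The following is a minimal sketch; the helper name describe_error is hypothetical and not part of the service.

```python
from typing import Any


def describe_error(body: Any) -> str:
    """Turn a parsed error body into one loggable string.

    Hypothetical helper: handles a string `detail` and the list-shaped
    `detail` returned for validation errors.
    """
    if not isinstance(body, dict) or "detail" not in body:
        return "Unknown error (no detail field)"
    detail = body["detail"]
    if isinstance(detail, str):
        return detail
    if isinstance(detail, list):
        parts = []
        for entry in detail:
            if not isinstance(entry, dict):
                parts.append(str(entry))
                continue
            # "loc" is a path such as ["body", "pdf_url"]
            loc = ".".join(str(p) for p in entry.get("loc", []))
            msg = entry.get("msg", "invalid")
            parts.append(f"{loc}: {msg}" if loc else msg)
        return "; ".join(parts)
    return str(detail)
```

In client code this would typically be called as describe_error(response.json()) for any non-200 response.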

Python Error Handling Example

Check the response status code before attempting to read the PDF content, and branch on the specific code to apply the right recovery logic.
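import requests

# Send the PDF URL and the retrieved chunks to the highlight endpoint.
response = requests.post(
    "http://localhost:8000/highlight",
    json={
        "pdf_url": "https://example.com/document.pdf",
        "documents": [
            {"page_content": "some text", "metadata": {"page": 0}}
        ]
    },
    timeout=60,  # avoid hanging forever if the service or the PDF host stalls
)

if response.status_code == 200:
    # Success: the body is the highlighted PDF as raw bytes.
    with open("highlighted.pdf", "wb") as f:
        f.write(response.content)
elif response.status_code == 400:
    # Download failure, empty documents list, or highlight processing error.
    error = response.json()
    print(f"Request error: {error['detail']}")
elif response.status_code == 422:
    # Validation failure: detail is a list of per-field errors.
    print(f"Validation error: {response.json()['detail']}")
else:
    print(f"Unexpected error: {response.status_code}")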

Debugging Common Problems

If the service returns a 200 response but the output PDF has no visible highlights, the most likely cause is that none of the chunk texts could be located on their specified pages. The service skips unmatched chunks silently rather than returning an error.
Steps to diagnose:
  1. Confirm page numbers are 0-indexed. Page 1 of the PDF is "page": 0. A chunk with "page": 1 will be searched on the second page of the document.
  2. Check that chunks originate from the same PDF. Chunks retrieved from a different document will not match any text in the target PDF.
  3. Try shorter, more precise text. Very long chunks or chunks with OCR artifacts may fail fuzzy matching. Reduce chunk_size when splitting or use a passage that appears verbatim in the PDF.
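The three checks above can be approximated locally before calling the service. The sketch below assumes you have already extracted the text of each page into a list (with whatever PDF library you use); the function names are hypothetical, and the whitespace-normalized substring test is a stand-in that is stricter than the service's fuzzy matching, so treat a "not found" result as a hint rather than a verdict.

```python
import re
from typing import List, Optional


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line breaks do not block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def locate_chunk(page_texts: List[str], chunk: str) -> Optional[int]:
    """Return the 0-indexed page whose text contains the chunk, or None."""
    needle = normalize(chunk)
    if not needle:
        return None
    for index, page_text in enumerate(page_texts):
        if needle in normalize(page_text):
            return index
    return None


def diagnose(page_texts: List[str], document: dict) -> str:
    """Compare where a chunk actually appears with its claimed page."""
    claimed = document["metadata"]["page"]
    found = locate_chunk(page_texts, document["page_content"])
    if found is None:
        return "not found in this PDF: wrong source document, or text too long or noisy"
    if found == claimed:
        return "ok"
    if found == claimed - 1:
        return f"found on page {found}, not {claimed}: page number looks 1-indexed"
    return f"found on page {found}, not {claimed}"
```

Running diagnose over every entry in documents shows at a glance whether the problem is page indexing, a mismatched source PDF, or chunk text that does not appear verbatim.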
A 400 response with a detail message containing “download” means the service could not fetch the PDF from the URL you provided.
Common causes:
  • The URL requires authentication (cookies, API keys, OAuth). The service makes an unauthenticated GET request — the PDF must be publicly accessible.
  • The server is behind a firewall or VPN that blocks requests from the service’s IP.
  • The URL returns a redirect to a login page rather than a 4xx status, so the service downloads HTML instead of a PDF.
To verify, open the URL in a browser where you are not signed in, or use curl without any auth headers:
curl -I https://example.com/document.pdf
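The same check can be scripted. The sketch below classifies the result of an unauthenticated GET by status code, Content-Type header, and the leading bytes of the body (PDF files begin with %PDF-). The function name and the returned labels are hypothetical, and a check run from your own machine cannot reveal firewall rules that only affect the service's network.

```python
def classify_download(status_code: int, content_type: str, first_bytes: bytes) -> str:
    """Classify an unauthenticated GET of a supposed PDF URL."""
    if status_code != 200:
        return f"unreachable: HTTP {status_code}"
    # A real PDF starts with the "%PDF-" signature.
    if first_bytes.lstrip().startswith(b"%PDF-"):
        return "ok"
    # A 200 response with HTML usually means a login or error page.
    if "html" in content_type.lower():
        return "html page returned: possibly a login redirect"
    return "not a PDF"


# Example usage with requests (not executed here):
# import requests
# resp = requests.get("https://example.com/document.pdf", timeout=30, stream=True)
# head = next(resp.iter_content(1024), b"")
# print(classify_download(resp.status_code, resp.headers.get("Content-Type", ""), head))
```

Checking the leading bytes catches the redirect-to-login case, which a status code check alone would miss.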
A 422 response means Pydantic rejected your request body before any processing started. The response body includes a detail array that pinpoints every field that failed validation.
{
  "detail": [
    {
      "type": "missing",
      "loc": ["body", "pdf_url"],
      "msg": "Field required",
      "input": {}
    }
  ]
}
Read the loc array to identify which field is missing or malformed. The most common causes are:
  • Omitting pdf_url or documents entirely from the request body.
  • Sending documents as a list of plain strings instead of objects with page_content and metadata keys.
  • Sending the request with Content-Type: application/x-www-form-urlencoded instead of Content-Type: application/json.
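Most of these mistakes can be caught on the client before the request is sent. The following is a rough pre-flight check built only from the field names shown in this guide; check_payload is a hypothetical helper, and the service's real schema may accept or reject cases this sketch does not cover.

```python
from typing import Any, List


def check_payload(payload: Any) -> List[str]:
    """Return a list of problems found in a request payload (empty if none)."""
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    problems = []
    pdf_url = payload.get("pdf_url")
    if not isinstance(pdf_url, str) or not pdf_url:
        problems.append("pdf_url: missing or not a string")
    documents = payload.get("documents")
    if not isinstance(documents, list):
        problems.append("documents: missing or not a list")
        return problems
    if not documents:
        # Per the error reference, the service reports this case as a 400.
        problems.append("documents: list is empty")
    for i, doc in enumerate(documents):
        if not isinstance(doc, dict):
            problems.append(f"documents[{i}]: expected an object, got {type(doc).__name__}")
            continue
        for key in ("page_content", "metadata"):
            if key not in doc:
                problems.append(f"documents[{i}].{key}: missing")
    return problems
```

When sending with the requests library, passing the payload through the json= argument (as in the Python example above) sets Content-Type: application/json automatically, which avoids the form-encoding mistake.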