Request lifecycle
PDF Download
When a request arrives, the service asynchronously downloads the PDF from the URL you provide using httpx. The file is written to a temporary location on disk so that PyMuPDF can open it page by page. The download is fully async and does not block other concurrent requests.
Chunk Location
For each Document object in your request, the service reads metadata.page to identify the target page, then attempts to locate page_content on that page using three successive matching strategies: exact match, sentence-level match, and collapsed-whitespace match. The service tries each strategy in order and stops as soon as a match is found, returning a set of bounding boxes that cover the matched text.
Highlight Application
Once bounding boxes are found for a chunk, PyMuPDF draws a yellow highlight annotation over each rectangle on the correct page. Near-duplicate bounding boxes are removed automatically before annotations are written, preventing double-highlights on the same region.
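To make the chunk location and de-duplication steps above more concrete, here is a minimal sketch that works on plain strings and coordinate tuples instead of a real PDF. It is not the service's implementation: the function names locate_chunk and dedupe_rects, the sentence-splitting rule, and the 1.0-point tolerance are all assumptions chosen for illustration. In a PyMuPDF-based version, each matched fragment would typically be passed to page.search_for() to obtain rectangles, and each surviving rectangle to page.add_highlight_annot().

```python
import re
from typing import Optional

Rect = tuple[float, float, float, float]  # (x0, y0, x1, y1), hypothetical shape


def collapse_ws(text: str) -> str:
    """Replace every run of whitespace with a single space."""
    return re.sub(r"\s+", " ", text).strip()


def locate_chunk(page_text: str, chunk: str) -> Optional[tuple[str, list[str]]]:
    """Return (strategy, matched fragments) for the first strategy that hits."""
    # 1. Exact match: the chunk appears verbatim on the page.
    if chunk and chunk in page_text:
        return "exact", [chunk]

    # 2. Sentence-level match: keep the sentences that appear verbatim.
    #    Splitting after ., ! or ? is an assumed, simplistic rule.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", chunk) if s.strip()]
    found = [s for s in sentences if s in page_text]
    if found:
        return "sentence", found

    # 3. Collapsed-whitespace match: ignore line breaks and repeated spaces.
    flat_chunk = collapse_ws(chunk)
    if flat_chunk and flat_chunk in collapse_ws(page_text):
        return "collapsed", [flat_chunk]

    return None  # no strategy matched; such a chunk would be skipped


def dedupe_rects(rects: list[Rect], tol: float = 1.0) -> list[Rect]:
    """Drop rectangles whose four coordinates are all within `tol` of a kept one."""
    kept: list[Rect] = []
    for rect in rects:
        is_duplicate = any(
            all(abs(a - b) <= tol for a, b in zip(rect, other)) for other in kept
        )
        if not is_duplicate:
            kept.append(rect)
    return kept
```

The early returns in locate_chunk mirror the "stop at the first match" behaviour described above: cheaper, stricter strategies run first, and the looser ones only run when the stricter ones fail.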
Response
After all chunks have been processed, the fully annotated PDF is serialised to bytes and returned to the caller as an application/pdf binary response. You can save this directly to disk or stream it to a browser.
Stateless design
Every request to RAG PDF Highlighter is fully self-contained. The service holds no session state, no cached PDFs, and no stored annotations between calls. This means you can scale horizontally behind a load balancer without sticky sessions, and each request will produce a deterministic result given the same inputs, as long as the PDF served at the URL is unchanged. Because there is no shared state, retrying a failed request is always safe: you will never corrupt a partially annotated document from a previous attempt.
Chunks whose metadata.page value does not match any page in the PDF, or whose text cannot be found by any of the three matching strategies, are silently skipped. The service still returns a valid annotated PDF; it simply contains no highlight for that chunk. No error or warning is raised for unmatched chunks.
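Because unmatched chunks produce no error, a caller who needs to know about them has to check on the client side. One inexpensive check, sketched below, is to flag chunks whose metadata.page falls outside the document before sending the request. The dict shape, the function name find_out_of_range_chunks, and the first_page parameter are assumptions for illustration. In particular, this section does not state whether metadata.page is zero-based or one-based, so the sketch leaves that choice to the caller.

```python
from typing import Any


def find_out_of_range_chunks(
    chunks: list[dict[str, Any]], page_count: int, first_page: int = 0
) -> list[int]:
    """Return the indexes of chunks whose page number is missing or out of range.

    `first_page` is the number of the first page (0 or 1). Which one the
    service expects is an assumption the caller must confirm.
    """
    last_page = first_page + page_count - 1
    bad: list[int] = []
    for index, chunk in enumerate(chunks):
        page = chunk.get("metadata", {}).get("page")
        if not isinstance(page, int) or not first_page <= page <= last_page:
            bad.append(index)
    return bad


if __name__ == "__main__":
    example = [
        {"page_content": "first chunk", "metadata": {"page": 0}},
        {"page_content": "second chunk", "metadata": {"page": 7}},
    ]
    # For a 3-page PDF with zero-based pages, only the second chunk is flagged.
    print(find_out_of_range_chunks(example, page_count=3))
```

This only covers the page-number case. A chunk whose text cannot be located on a valid page can only be detected after the fact, for example by inspecting the annotations in the returned PDF.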