Integration Pattern
Set up your RAG retriever
Load your PDF, split it into chunks, and index those chunks in a vector store using a standard LangChain setup. This step happens once at startup or index-build time.
Run retrieval to get relevant Document chunks
For each user query, call your retriever to get the most relevant Document objects. Each Document carries the chunk text in page_content and a metadata dict that includes the page number (0-indexed).
Call POST /highlight with the PDF URL and retrieved chunks
Pass the original PDF URL and the retrieved Document objects to the /highlight endpoint. The service downloads the PDF, locates each chunk on its page, and draws yellow highlights over the matching text.
End-to-End Example
The example below shows a complete RAG pipeline, from loading the PDF to delivering a highlighted result, using LangChain and the RAG PDF Highlighter API.
The chunks you send in the documents array must originate from the same PDF you specify in pdf_url. The service locates passages by searching for the exact chunk text on the given page of the downloaded PDF. If the chunks come from a different document, they will not match and no highlights will be applied.
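The request step can be sketched with only the Python standard library. This is a minimal illustration, not the service's reference client. Only the pdf_url and documents field names and the page_content and metadata attributes come from the text above; the "page" metadata key, the per-document JSON layout, the JSON content type, and the helper names build_highlight_payload and post_highlight are assumptions. The Document class here is a small stand-in for the LangChain Document type so the sketch runs without LangChain installed. No authentication is shown, and the response is returned as raw bytes because its format is not described here.

```python
import json
import urllib.request
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in for a LangChain Document: chunk text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def build_highlight_payload(pdf_url, docs):
    """Build the request body for POST /highlight.

    Assumes each document's metadata holds a 0-indexed page number
    under the key "page" (an assumption, not confirmed by the service).
    """
    documents = []
    for doc in docs:
        page = doc.metadata.get("page")
        # Reject missing, non-integer, or negative page numbers early,
        # since the service searches for the chunk on a specific page.
        if isinstance(page, bool) or not isinstance(page, int) or page < 0:
            raise ValueError(f"chunk has no valid 0-indexed page: {page!r}")
        documents.append(
            {"page_content": doc.page_content, "metadata": dict(doc.metadata)}
        )
    return {"pdf_url": pdf_url, "documents": documents}


def post_highlight(base_url, payload, timeout=30.0):
    """Send the payload to <base_url>/highlight and return the raw response body.

    base_url is hypothetical; substitute the address of your deployment.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/highlight",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

In a real pipeline, docs would be the list returned by your retriever, and pdf_url must point to the same PDF those chunks were indexed from. Validating the page number before sending catches chunks that lost their metadata during splitting, which would otherwise fail to match on the service side.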