AI is changing how we interact with information. One of the most practical shifts is that we can now ask questions directly of our documents (PDFs, books, manuals, and other unstructured sources) without manually searching page by page.
In this webinar, Matias Bustamante walks through a hands-on tour of three different approaches to querying PDFs with AI:
- LangChain + OpenAI (Python) using a Retrieval-Augmented Generation (RAG) workflow
- ChatD + local LLMs (via Ollama) to query PDFs offline
- NotebookLM (Google) powered by Gemini, with a no-code interface and strong citations
🎥 Watch the Full Webinar
Prefer to watch? The complete session is above.
Prefer to read and replicate the implementations? Keep scrolling.
What You’ll Learn
By the end of this guide, you will:
- Understand what an LLM is (in simple terms) and how it works
- Learn what RAG is and when it improves accuracy
- See how indexing → retrieval → generation works in document Q&A
- Build a PDF Q&A flow using LangChain + OpenAI
- Compare results against local models (Mistral / DeepSeek) using ChatD
- Use NotebookLM to query multiple documents with citations, no code required
- Understand real-world tradeoffs: accuracy, cost, privacy, and usability
Quick Tour of Recent AI Updates
Before going hands-on, Matias briefly highlights a few AI stories and what they signal:
- DeepSeek: positioned as a major launch with claims about efficiency and training improvements
- Baidu (Ernie): new versions announced, with a focus on cost and capability improvements
- Arizona Supreme Court: experimented with AI avatars to communicate announcements publicly
- Italian newspaper: published an edition created with AI support, with journalists curating the output
These examples reinforce the theme of the webinar: AI is moving fast and practical workflows matter.
Two Key Concepts: LLMs and RAG
What Is an LLM?
An LLM (Large Language Model) can be thought of as a system that predicts the next word in a response based on patterns learned from large-scale training data.
The breakthrough that made modern systems fast and effective is transformer-based attention, enabling stronger context handling and faster generation.
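As a loose illustration (not a full transformer), the core attention step can be sketched in a few lines of numpy: each position's output is a similarity-weighted blend of every position's information, which is what gives the model its strong context handling.

```python
# Toy scaled dot-product attention. Inputs and shapes are assumptions for
# illustration; a real model adds learned projections, many layers, etc.
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # scaled pairwise similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax per row
    return weights @ V                             # similarity-weighted mix
```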
Common commercial uses include:
- Customer support
- Content drafting
- Software development assistance (e.g., copilots)
What Is RAG?
RAG (Retrieval-Augmented Generation) is a technique to improve accuracy by giving the model relevant external information at answer time.
This matters when:
- Your model’s training data may be outdated
- Your knowledge lives in private documents (manuals, internal PDFs, books, policies)
The 3 Steps of RAG
- Indexing: split documents into chunks and store them for retrieval
- Retrieval: search for the most relevant chunks (often via vector similarity)
- Generation: pass the retrieved chunks to the LLM so it answers grounded in the source
Result: more accurate, document-grounded answers.
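To make the three steps concrete, here is a deliberately tiny, self-contained sketch. It swaps real vector embeddings for naive word-overlap scoring, so it illustrates the flow rather than a production retriever; the function names and chunk sizes are assumptions.

```python
# Toy RAG flow: index -> retrieve -> build a grounded prompt.
# Word-overlap scoring stands in for real embeddings (an assumption for brevity).

def index(document: str, chunk_size: int = 50) -> list[str]:
    """Indexing: split the document into fixed-size word chunks."""
    words = document.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def retrieve(chunks: list[str], question: str, k: int = 2) -> list[str]:
    """Retrieval: rank chunks by how many question words they share."""
    q_words = set(question.lower().split())
    return sorted(chunks, key=lambda c: len(q_words & set(c.lower().split())), reverse=True)[:k]

def build_prompt(question: str, context: list[str]) -> str:
    """Generation: the retrieved chunks become grounding context for any LLM."""
    return "Answer using only this context:\n\n" + "\n\n".join(context) + f"\n\nQuestion: {question}"
```

Whatever the stack, the shape is the same: the prompt produced by the last step, not the whole document, is what the LLM actually sees.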
Approach 1: Query PDFs with LangChain + OpenAI (Python)
This first approach is a classic RAG implementation using Python.
Requirements
- Python 3.9
- pip
- OpenAI API access (paid usage via credits)
What LangChain Does
LangChain is an open-source framework that simplifies building LLM applications, especially for workflows like retrieval, chaining, prompt templates, and evaluation.
The Workflow Used in the Demo
- Load the PDF (via a PDF loader)
- Split text into chunks (with overlap to avoid losing context)
- Create embeddings for chunks
- Store vectors in a vector database (Chroma in the demo)
- Retrieve relevant chunks for a question
- Generate an answer using an LLM (GPT-4 used in the session)
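The session's exact script isn't reproduced here, but a minimal sketch of those six steps might look like this; the file name, chunk sizes, `k`, and prompt are illustrative assumptions, and import paths vary across LangChain versions.

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

# 1. Load the PDF (one Document per page).
pages = PyPDFLoader("clean_code.pdf").load()

# 2. Split into overlapping chunks so context isn't lost at boundaries.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(pages)

# 3-4. Embed each chunk and store the vectors in Chroma.
store = Chroma.from_documents(chunks, OpenAIEmbeddings())

# 5. Retrieve the chunks most similar to the question.
question = "Are comments good or bad in code?"
relevant = store.similarity_search(question, k=4)

# 6. Generate an answer grounded only in the retrieved chunks.
context = "\n\n".join(doc.page_content for doc in relevant)
answer = ChatOpenAI(model="gpt-4").invoke(
    f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"
)
print(answer.content)
```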
Demo Questions
Matias tests the pipeline with two documents and questions:
- Clean Code (Robert C. Martin)
Question: Are comments good or bad in code?
The response reflects the book’s stance (e.g., “comments as a necessary evil” and an emphasis on expressive code and refactoring).
- A car manual (Chery Tiggo 7 Pro)
Question: What’s the purpose of the auto-hold system?
The answer includes both explanation and operational steps.
Key Note About Privacy / Data Sent
Only relevant chunks are sent to the LLM for answering (not the entire book/PDF).
Pros / Cons (as presented)
Pros
- Flexible: you can swap models (OpenAI, Anthropic, Gemini, etc.)
- Highly customizable: prompts, multi-question scripts, evaluation
- Strong baseline for building real products
Cons
- Requires setup + code
- Paid API usage
- Output formatting and multi-doc UX require extra work
Approach 2: Query PDFs Locally with ChatD + Ollama
This approach pairs ChatD (a desktop app) with Ollama to run LLMs locally (e.g., Mistral, DeepSeek).
Requirements
- Ollama running locally
- ChatD installed
- A local model downloaded
Why This Approach Matters
Local setups can be valuable when:
- You want to avoid sending data to external APIs
- You need offline access
- You’re testing multiple models quickly
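ChatD wraps all of this in a desktop UI, but to see what "local" means under the hood, a model pulled with Ollama can also be queried directly over Ollama's default local HTTP endpoint. A minimal sketch, assuming `ollama pull mistral` has already been run:

```python
# Query a local Ollama model over its default API (nothing leaves the machine).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mistral",                # any locally pulled model
        "prompt": "Are comments good or bad in code?",
        "stream": False,                   # one JSON reply, not a token stream
    },
)
print(resp.json()["response"])
```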
Demo Comparison
Matias repeats the same questions against different local models.
- With Mistral, the “Clean Code” answer comes back more generic, and somewhat misleading, compared with the RAG-grounded response.
- With DeepSeek, the answer is more detailed, but it still doesn’t quote or ground the source as well as the first approach.
Limitations Noted
- You’re restricted to local models only
- Typically works with one document at a time
- If you need to query a “library” of documents together, it can feel limiting
Approach 3: Query PDFs with NotebookLM (Gemini, No-Code)
This is the simplest workflow: upload documents into NotebookLM and ask questions through a web UI.
What Makes NotebookLM Stand Out
- No local install
- No coding
- Works well with multi-document “sources”
- Shows which chunks were used to answer (strong for trust and verification)
- Can generate an audio/podcast-style overview (English-only at the time of the session)
Demo Outcome
For both test questions (“Clean Code” and the car manual), NotebookLM produced answers that Matias describes as more in-depth and better grounded than the other options, especially because it surfaces citations/chunk references.
Tradeoffs Mentioned
- NotebookLM is a Google experiment and could change or be discontinued
- If usage spikes, response time or availability could vary
When Will RAG Still Matter With Huge Context Windows?
A question from the audience: if context windows grow massively, will RAG become obsolete?
Matias’ take: RAG will still matter in cases like:
- Sensitive company documents
- Internal knowledge that shouldn’t be uploaded
- Local/private setups where retrieval pipelines remain useful
Key Takeaways
This webinar shows three practical ways to “chat with your PDFs,” each suited to different needs:
- LangChain + OpenAI (Python): best for building robust, productizable pipelines
- ChatD + local models: best for privacy/offline experimentation (with workflow limits)
- NotebookLM (Gemini): best for no-code, multi-doc querying with strong grounding
The core concept underneath all of them is the same: RAG-style retrieval improves reliability when answers must be tied to source documents.
FAQs
With bigger context windows, will RAG become outdated?
No. RAG remains valuable for sensitive/internal documents and controlled retrieval workflows.
What would you choose for day-to-day use: OpenAI, local, or NotebookLM?
Matias’ preference based on the demo: NotebookLM, due to depth and grounding/citations.
How should a data scientist start learning LLMs if they’re new to the field?
Start with prompt engineering fundamentals and learn how prompt quality changes output quality.
Do OpenAI servers receive the entire document?
In a RAG workflow, the model typically receives only relevant chunks, not the full document.
