AI is changing how we interact with information. One of the most practical shifts is that we can now ask questions directly of our documents (PDFs, books, manuals, and other unstructured sources) without manually searching page by page.
In this webinar, Matias Bustamante walks through a hands-on tour of three different approaches to querying PDFs with AI:
- LangChain + OpenAI (Python) using a Retrieval-Augmented Generation (RAG) workflow
- ChatD + local LLMs (via Ollama) to query PDFs offline
- NotebookLM (Google) powered by Gemini, with a no-code interface and strong citations
🎥 Watch the Full Webinar
Prefer to watch? The complete session is above.
Prefer to read and replicate the implementations? Keep scrolling.
What You’ll Learn
By the end of this guide, you will:
- Understand what an LLM is (in simple terms) and how it works
- Learn what RAG is and when it improves accuracy
- See how indexing → retrieval → generation works in document Q&A
- Build a PDF Q&A flow using LangChain + OpenAI
- Compare results against local models (Mistral / DeepSeek) using ChatD
- Use NotebookLM to query multiple documents with citations, no code required
- Understand real-world tradeoffs: accuracy, cost, privacy, and usability
Quick Tour of Recent AI Updates
Before going hands-on, Matias briefly highlights a few AI stories and what they signal:
- DeepSeek: positioned as a major launch with claims about efficiency and training improvements
- Baidu (Ernie): new versions announced, with a focus on cost and capability improvements
- Arizona Supreme Court: experimented with AI avatars to communicate announcements publicly
- Italian newspaper: published an edition created with AI support, with journalists curating the output
These examples reinforce the theme of the webinar: AI is moving fast and practical workflows matter.
Two Key Concepts: LLMs and RAG
What Is an LLM?
An LLM (Large Language Model) can be thought of as a system that predicts the next word in a response based on patterns learned from large-scale training data.
The breakthrough that made modern systems fast and effective is transformer-based attention, enabling stronger context handling and faster generation.
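As a loose illustration (not a full transformer), the core attention step can be sketched in a few lines of numpy: each position's output is a similarity-weighted blend of every position's information, which is what gives the model its strong context handling.

```python
# Toy scaled dot-product attention. Inputs and shapes are assumptions for
# illustration; a real model adds learned projections, many layers, etc.
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # scaled pairwise similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax per row
    return weights @ V                             # similarity-weighted mix
```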
Common commercial uses include:
- Customer support
- Content drafting
- Software development assistance (e.g., copilots)
What Is RAG?
RAG (Retrieval-Augmented Generation) is a technique to improve accuracy by giving the model relevant external information at answer time.
This matters when:
- Your model’s training data may be outdated
- Your knowledge lives in private documents (manuals, internal PDFs, books, policies)
The 3 Steps of RAG
- Indexing: split documents into chunks and store them for retrieval
- Retrieval: search for the most relevant chunks (often via vector similarity)
- Generation: pass the retrieved chunks to the LLM so it answers grounded in the source
Result: more accurate, document-grounded answers.
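To make the three steps concrete, here is a deliberately tiny, self-contained sketch. It swaps real vector embeddings for naive word-overlap scoring, so it illustrates the flow rather than a production retriever; the function names and chunk sizes are assumptions.

```python
# Toy RAG flow: index -> retrieve -> build a grounded prompt.
# Word-overlap scoring stands in for real embeddings (an assumption for brevity).

def index(document: str, chunk_size: int = 50) -> list[str]:
    """Indexing: split the document into fixed-size word chunks."""
    words = document.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def retrieve(chunks: list[str], question: str, k: int = 2) -> list[str]:
    """Retrieval: rank chunks by how many question words they share."""
    q_words = set(question.lower().split())
    return sorted(chunks, key=lambda c: len(q_words & set(c.lower().split())), reverse=True)[:k]

def build_prompt(question: str, context: list[str]) -> str:
    """Generation: the retrieved chunks become grounding context for any LLM."""
    return "Answer using only this context:\n\n" + "\n\n".join(context) + f"\n\nQuestion: {question}"
```

Whatever the stack, the shape is the same: the prompt produced by the last step, not the whole document, is what the LLM actually sees.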
Approach 1: Query PDFs with LangChain + OpenAI (Python)
This first approach is a classic RAG implementation using Python.
Requirements
- Python 3.9
- pip
- OpenAI API access (paid usage via credits)
What LangChain Does
LangChain is an open-source framework that simplifies building LLM applications, especially for workflows like retrieval, chaining, prompt templates, and evaluation.
The Workflow Used in the Demo
- Load the PDF (via a PDF loader)
- Split text into chunks (with overlap to avoid losing context)
- Create embeddings for chunks
- Store vectors in a vector database (Chroma in the demo)
- Retrieve relevant chunks for a question
- Generate an answer using an LLM (GPT-4 used in the session)
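The session's exact script isn't reproduced here, but a minimal sketch of those six steps might look like this; the file name, chunk sizes, `k`, and prompt are illustrative assumptions, and import paths vary across LangChain versions.

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

# 1. Load the PDF (one Document per page).
pages = PyPDFLoader("clean_code.pdf").load()

# 2. Split into overlapping chunks so context isn't lost at boundaries.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(pages)

# 3-4. Embed each chunk and store the vectors in Chroma.
store = Chroma.from_documents(chunks, OpenAIEmbeddings())

# 5. Retrieve the chunks most similar to the question.
question = "Are comments good or bad in code?"
relevant = store.similarity_search(question, k=4)

# 6. Generate an answer grounded only in the retrieved chunks.
context = "\n\n".join(doc.page_content for doc in relevant)
answer = ChatOpenAI(model="gpt-4").invoke(
    f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"
)
print(answer.content)
```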
Demo Questions
Matias tests the pipeline with two documents and questions:
- Clean Code (Robert C. Martin)
Question: Are comments good or bad in code?
The response reflects the book’s stance (e.g., “comments as a necessary evil” and an emphasis on expressive code and refactoring).
- A car manual (Chery Tiggo 7 Pro)
Question: What’s the purpose of the auto-hold system?
The answer includes both explanation and operational steps.
Key Note About Privacy / Data Sent
Only relevant chunks are sent to the LLM for answering (not the entire book/PDF).
Pros / Cons (as presented)
Pros
- Flexible: you can swap models (OpenAI, Anthropic, Gemini, etc.)
- Highly customizable: prompts, multi-question scripts, evaluation
- Strong baseline for building real products
Cons
- Requires setup + code
- Paid API usage
- Output formatting and multi-doc UX require extra work
Approach 2: Query PDFs Locally with ChatD + Ollama
This approach pairs ChatD (a desktop app) with Ollama to run LLMs locally (e.g., Mistral, DeepSeek).
Requirements
- Ollama running locally
- ChatD installed
- A local model downloaded
Why This Approach Matters
Local setups can be valuable when:
- You want to avoid sending data to external APIs
- You need offline access
- You’re testing multiple models quickly
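ChatD wraps all of this in a desktop UI, but to see what "local" means under the hood, a model pulled with Ollama can also be queried directly over Ollama's default local HTTP endpoint. A minimal sketch, assuming `ollama pull mistral` has already been run:

```python
# Query a local Ollama model over its default API (nothing leaves the machine).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mistral",                # any locally pulled model
        "prompt": "Are comments good or bad in code?",
        "stream": False,                   # one JSON reply, not a token stream
    },
)
print(resp.json()["response"])
```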
Demo Comparison
Matias repeats the same questions against different local models.
- With Mistral, the “Clean Code” answer comes back more generic, and somewhat misleading, compared with the RAG-grounded response.
- With DeepSeek, the answer is more detailed, but it still doesn’t quote or ground the source as well as the first approach.
Limitations Noted
- You’re restricted to local models only
- Typically works with one document at a time
- If you need to query a “library” of documents together, it can feel limiting
Approach 3: Query PDFs with NotebookLM (Gemini, No-Code)
This is the simplest workflow: upload documents into NotebookLM and ask questions through a web UI.
What Makes NotebookLM Stand Out
- No local install
- No coding
- Works well with multi-document “sources”
- Shows which chunks were used to answer (strong for trust and verification)
- Can generate an audio/podcast-style overview (English-only at the time of the session)
Demo Outcome
For both test questions (“Clean Code” and the car manual), NotebookLM produced answers that Matias describes as more in-depth and better grounded than the other options, especially because it surfaces citations/chunk references.
Tradeoffs Mentioned
- NotebookLM is a Google experiment and could change or be discontinued
- If usage spikes, response time or availability could vary
When Will RAG Still Matter With Huge Context Windows?
A question from the audience: if context windows grow massively, will RAG become obsolete?
Matias’ take: RAG will still matter in cases like:
- Sensitive company documents
- Internal knowledge that shouldn’t be uploaded
- Local/private setups where retrieval pipelines remain useful
Key Takeaways
This webinar shows three practical ways to “chat with your PDFs,” each suited to different needs:
- LangChain + OpenAI (Python): best for building robust, productizable pipelines
- ChatD + local models: best for privacy/offline experimentation (with workflow limits)
- NotebookLM (Gemini): best for no-code, multi-doc querying with strong grounding
The core concept underneath all of them is the same: RAG-style retrieval improves reliability when answers must be tied to source documents.
FAQs
With bigger context windows, will RAG become outdated?
No. RAG remains valuable for sensitive/internal documents and controlled retrieval workflows.
What would you choose for day-to-day use: OpenAI, local, or NotebookLM?
Matias’ preference based on the demo: NotebookLM, due to depth and grounding/citations.
How should a data scientist start learning LLMs if they’re new to the field?
Start with prompt engineering fundamentals and learn how prompt quality changes output quality.
Do OpenAI servers receive the entire document?
In a RAG workflow, the model typically receives only relevant chunks, not the full document.
