LangChain Tutorial for Beginners: Build AI Apps Step by Step
A practical LangChain tutorial covering LLMs, chains, tools, agents, and RAG. Build a Q&A bot over your own documents with complete, runnable Python code examples.
LangChain is the most popular framework for building applications with LLMs. It's also the most divisive. Some developers love the abstractions. Others find them frustrating. Both groups have valid points.
Here's the thing -- LangChain does three things well: it gives you a unified interface across LLM providers, it has excellent document loading and retrieval tools, and it handles the plumbing of chaining LLM calls together. Whether you need all of that depends on what you're building.
This tutorial covers the core concepts with real code. By the end, you'll have a working Q&A bot that answers questions about your own documents.
Setup
pip install langchain langchain-anthropic langchain-community langchain-chroma
pip install chromadb pypdf sentence-transformers
Set your API key:
export ANTHROPIC_API_KEY="sk-ant-..."
1. LLMs and Chat Models
The foundation. LangChain wraps multiple LLM providers behind a consistent interface:
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage
# Create a model
llm = ChatAnthropic(model="claude-sonnet-4-20250514", temperature=0)
# Simple invocation
response = llm.invoke("Explain Python decorators in two sentences.")
print(response.content)
# With system prompt and structured messages
messages = [
SystemMessage(content="You are a senior Python developer. Be concise."),
HumanMessage(content="When should I use dataclasses vs Pydantic?"),
]
response = llm.invoke(messages)
print(response.content)
Switching to OpenAI? Change the import and model name. Everything else stays the same:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o", temperature=0)
That provider-agnostic interface is genuinely useful. You can swap models without rewriting your application logic.
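If you want the swap to be a configuration option rather than a code edit, a small factory function works well. A minimal sketch (the function name and provider strings are my own, not a LangChain API; imports are deferred so you only need the SDK you actually use):

```python
def make_llm(provider: str, model: str, temperature: float = 0):
    """Return a chat model for the given provider string."""
    if provider == "anthropic":
        from langchain_anthropic import ChatAnthropic
        return ChatAnthropic(model=model, temperature=temperature)
    if provider == "openai":
        from langchain_openai import ChatOpenAI
        return ChatOpenAI(model=model, temperature=temperature)
    raise ValueError(f"Unknown provider: {provider}")

# llm = make_llm("anthropic", "claude-sonnet-4-20250514")
```

Because every model returned here shares the same `.invoke()` interface, the rest of your application never needs to know which provider is behind it.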
2. Prompt Templates
Hardcoded prompts don't scale. Templates let you parameterize them:
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages([
("system", "You are an expert in {language}. Answer concisely."),
("human", "{question}"),
])
# Format and invoke
chain = prompt | llm
response = chain.invoke({
"language": "Rust",
"question": "What's the difference between String and &str?",
})
print(response.content)
The | pipe operator creates a chain. Data flows left to right: template formats the prompt, model generates the response. This is LangChain Expression Language (LCEL), and it's the modern way to compose LangChain components.
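If the pipe feels magical, it isn't: LangChain's runnables implement Python's `__or__` method. Here's a stdlib-only toy version of the idea (my own sketch, not LangChain's actual classes) to show there's nothing exotic underneath:

```python
class Step:
    """A callable that supports `|` composition, like LangChain runnables."""
    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # (a | b) runs a, then feeds its output into b
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)

format_prompt = Step(lambda d: f"Explain {d['topic']} briefly.")
fake_model = Step(lambda prompt: f"[model answer to: {prompt}]")

chain = format_prompt | fake_model
print(chain.invoke({"topic": "decorators"}))
# [model answer to: Explain decorators briefly.]
```

The real LCEL runnables add streaming, batching, and async on top, but the composition mechanism is the same.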
3. Output Parsers
LLMs return text. Often you need structured data:
from langchain_core.output_parsers import JsonOutputParser
from pydantic import BaseModel, Field
class CodeReview(BaseModel):
    issues: list[str] = Field(description="List of issues found")
    severity: str = Field(description="Overall severity: low, medium, or high")
    suggestion: str = Field(description="Main improvement suggestion")
parser = JsonOutputParser(pydantic_object=CodeReview)
prompt = ChatPromptTemplate.from_messages([
("system", "Review the following code and respond in JSON.\n{format_instructions}"),
("human", "{code}"),
])
chain = prompt | llm | parser
result = chain.invoke({
"code": "def get_user(id): return db.query(f'SELECT * FROM users WHERE id = {id}')",
"format_instructions": parser.get_format_instructions(),
})
print(result)
# {'issues': ['SQL injection vulnerability', ...], 'severity': 'high', ...}
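One practical wrinkle worth knowing: models sometimes wrap their JSON in markdown code fences, which is part of what `JsonOutputParser` cleans up for you. If you ever parse model output by hand, a hedged helper along these lines (my own, not a LangChain function) covers the common case:

```python
import json

def parse_llm_json(text: str) -> dict:
    """Extract a JSON object from model output, tolerating ```json fences."""
    text = text.strip()
    if text.startswith("```"):
        # Drop the opening fence (with optional language tag) and the closing fence
        text = text.split("\n", 1)[1]
        text = text.rsplit("```", 1)[0]
    return json.loads(text)

raw = '```json\n{"severity": "high", "issues": ["SQL injection"]}\n```'
print(parse_llm_json(raw)["severity"])  # high
```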
4. Chains
Chains connect multiple steps. The simplest pattern is sequential -- output of one step feeds into the next:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
# Step 1: Generate code
code_prompt = ChatPromptTemplate.from_messages([
("system", "You are a Python developer. Write only code, no explanation."),
("human", "Write a function that {task}"),
])
# Step 2: Review the code
review_prompt = ChatPromptTemplate.from_messages([
("system", "You are a code reviewer. Be specific and constructive."),
("human", "Review this Python code:\n\n{code}"),
])
# Chain them together
code_chain = code_prompt | llm | StrOutputParser()
review_chain = review_prompt | llm | StrOutputParser()
# Run sequentially
from langchain_core.runnables import RunnablePassthrough
full_chain = (
{"task": RunnablePassthrough()}
| code_chain
| (lambda code: {"code": code})
| review_chain
)
review = full_chain.invoke("sorts a list of dictionaries by a nested key")
print(review)
Let's be honest -- this is where LangChain's abstraction starts to feel heavy. For simple sequential calls, you might prefer just calling the LLM twice with plain Python. But for complex branching and parallel chains, LCEL earns its keep.
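For comparison, here's what the same generate-then-review flow looks like as plain Python, with the model call injected as a function so the orchestration stays obvious (and testable with a stub). The `call_llm(system, user)` signature is my own convention, not a library API:

```python
def generate_and_review(task: str, call_llm) -> str:
    """Two sequential LLM calls, no framework.

    `call_llm(system, user)` is whatever thin wrapper you keep
    around your provider's SDK.
    """
    code = call_llm(
        "You are a Python developer. Write only code, no explanation.",
        f"Write a function that {task}",
    )
    return call_llm(
        "You are a code reviewer. Be specific and constructive.",
        f"Review this Python code:\n\n{code}",
    )

# Exercise the flow with a stub in place of a real model:
stub = lambda system, user: f"(reply to: {user[:30]}...)"
print(generate_and_review("sorts a list of dicts", stub))
```

Both versions do the same thing; the trade-off is framework features (streaming, retries, tracing) versus having every line in front of you.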
5. Tools and Agents
Agents use tools to interact with the outside world. This is where LLMs go from "answering questions" to "doing things":
from langchain_core.tools import tool
from langchain.agents import AgentExecutor, create_tool_calling_agent
@tool
def search_documentation(query: str) -> str:
    """Search the project documentation for relevant information.
    Use this when you need to find specific technical details."""
    # In production, this would search a real doc index
    docs = {
        "authentication": "Use JWT tokens. Set Authorization header to 'Bearer <token>'.",
        "rate limits": "100 requests/minute for free tier, 1000 for paid.",
        "pagination": "Use cursor-based pagination. Pass 'cursor' param for next page.",
    }
    for key, value in docs.items():
        if key in query.lower():
            return value
    return "No relevant documentation found."
@tool
def run_code(code: str) -> str:
    """Execute Python code and return the output. Use for calculations or data processing."""
    # WARNING: exec on model-generated code is unsafe; sandbox this in production
    import io, contextlib
    output = io.StringIO()
    try:
        with contextlib.redirect_stdout(output):
            exec(code, {"__builtins__": __builtins__})
        return output.getvalue() or "Code executed successfully (no output)"
    except Exception as e:
        return f"Error: {e}"
tools = [search_documentation, run_code]
prompt = ChatPromptTemplate.from_messages([
("system", "You are a developer assistant. Use tools to find information and run code when needed."),
("human", "{input}"),
("placeholder", "{agent_scratchpad}"),
])
agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
result = executor.invoke({
"input": "What are the rate limits, and calculate how many requests I can make in 8 hours on the free tier?"
})
print(result["output"])
The agent will search the docs for rate limit info, then use the code tool to calculate 100 × 60 × 8 = 48,000 requests. Two tool calls, one coherent answer.
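If you're curious what AgentExecutor is actually doing, the core loop is simple: ask the model, run whatever tool it requests, feed the result back, and repeat until it produces a final answer. A stdlib-only sketch with a stubbed model (real agents use the provider's tool-calling API; the tuple protocol here is my own simplification):

```python
def run_agent(question, model, tools, max_steps=5):
    """Minimal tool-calling loop. `model(messages)` returns either
    ("tool", name, args) or ("final", answer)."""
    messages = [("user", question)]
    for _ in range(max_steps):
        decision = model(messages)
        if decision[0] == "final":
            return decision[1]
        _, name, args = decision
        result = tools[name](args)               # run the requested tool
        messages.append(("tool", name, result))  # feed the result back
    return "Gave up after too many steps."

# Stub model: looks something up once, then answers with what it found
def stub_model(messages):
    if messages[-1][0] == "user":
        return ("tool", "search", messages[-1][1])
    return ("final", f"Answer based on: {messages[-1][2]}")

tools = {"search": lambda q: "100 requests/minute"}
print(run_agent("rate limits?", stub_model, tools))
# Answer based on: 100 requests/minute
```

This is roughly the "50 lines" version you'd write yourself for one or two tools; AgentExecutor adds error handling, scratchpad formatting, and observability on top.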
6. RAG: Retrieval-Augmented Generation
This is LangChain's strongest use case. RAG lets you ask questions about your own documents -- PDFs, code files, web pages, whatever:
from langchain_community.document_loaders import PyPDFLoader, DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_anthropic import ChatAnthropic
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
# Step 1: Load documents
loader = DirectoryLoader("./docs", glob="**/*.pdf", loader_cls=PyPDFLoader)
documents = loader.load()
print(f"Loaded {len(documents)} pages")
# Step 2: Split into chunks
splitter = RecursiveCharacterTextSplitter(
chunk_size=1000, # Characters per chunk
chunk_overlap=200, # Overlap between chunks for context continuity
separators=["\n\n", "\n", ". ", " ", ""],
)
chunks = splitter.split_documents(documents)
print(f"Split into {len(chunks)} chunks")
# Step 3: Create embeddings and store in vector database
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = Chroma.from_documents(
documents=chunks,
embedding=embeddings,
persist_directory="./chroma_db",
)
print("Vector store created")
# Step 4: Create retriever
retriever = vectorstore.as_retriever(
search_type="similarity",
search_kwargs={"k": 5}, # Return top 5 most relevant chunks
)
# Step 5: Build the RAG chain
rag_prompt = ChatPromptTemplate.from_messages([
("system", """Answer the question based on the provided context.
If the context doesn't contain enough information, say so honestly.
Always cite which document the information came from.
Context:
{context}"""),
("human", "{question}"),
])
def format_docs(docs):
    return "\n\n---\n\n".join(
        f"Source: {doc.metadata.get('source', 'Unknown')}\n{doc.page_content}"
        for doc in docs
    )
rag_chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| rag_prompt
| llm
| StrOutputParser()
)
# Use it
answer = rag_chain.invoke("What is the refund policy for enterprise customers?")
print(answer)
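A note on Step 2's chunk parameters before moving on: with overlap, each chunk after the first contributes only `chunk_size - chunk_overlap` new characters, so you can estimate chunk counts (and therefore embedding cost) before processing anything. A back-of-envelope helper (my own approximation of the splitter's behavior, which also respects separator boundaries):

```python
import math

def estimate_chunks(total_chars: int, chunk_size: int = 1000, overlap: int = 200) -> int:
    """Rough chunk count: the first chunk covers chunk_size chars,
    each subsequent chunk covers (chunk_size - overlap) new chars."""
    if total_chars <= chunk_size:
        return 1
    return 1 + math.ceil((total_chars - chunk_size) / (chunk_size - overlap))

# A 100-page PDF at roughly 2,000 characters per page:
print(estimate_chunks(200_000))  # 250
```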
This is genuinely powerful. You load PDFs, split them into searchable chunks, embed them into a vector database, and now you can ask natural language questions that get answered using your actual documents. The model cites sources, handles ambiguity, and combines information from multiple documents.
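Under the hood, the `similarity` search in Step 4 is just nearest-neighbor ranking over embedding vectors, typically by cosine similarity. A stdlib sketch of that ranking step (toy 3-dimensional vectors for illustration; real embeddings like all-MiniLM-L6-v2 have hundreds of dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, k=2):
    """Rank (text, vector) chunks by similarity to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

chunks = [
    ("refund policy", [0.9, 0.1, 0.0]),
    ("api pagination", [0.0, 0.2, 0.9]),
    ("billing terms", [0.8, 0.3, 0.1]),
]
print(top_k([1.0, 0.1, 0.0], chunks))  # ['refund policy', 'billing terms']
```

Chroma does this with an approximate-nearest-neighbor index so it stays fast at scale, but the ranking principle is the same.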
Putting It All Together: A Document Q&A Bot
Here's a complete, runnable script that ties everything together:
# qa_bot.py
import os
from langchain_anthropic import ChatAnthropic
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
def build_qa_bot(docs_directory: str):
    """Build a Q&A bot over a directory of documents."""
    # Load and chunk
    loader = DirectoryLoader(docs_directory, glob="**/*.txt", loader_cls=TextLoader)
    docs = loader.load()
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = splitter.split_documents(docs)

    # Embed and store
    embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
    vectorstore = Chroma.from_documents(chunks, embeddings)
    retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

    # Build chain
    llm = ChatAnthropic(model="claude-sonnet-4-20250514", temperature=0)
    prompt = ChatPromptTemplate.from_messages([
        ("system", "Answer based on the context. Be specific and cite sources.\n\nContext:\n{context}"),
        ("human", "{question}"),
    ])

    def format_docs(docs):
        return "\n\n".join(f"[{doc.metadata['source']}]\n{doc.page_content}" for doc in docs)

    chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
    return chain
if __name__ == "__main__":
    bot = build_qa_bot("./my_docs")
    while True:
        question = input("\nAsk a question (or 'quit'): ")
        if question.lower() == "quit":
            break
        answer = bot.invoke(question)
        print(f"\n{answer}")
When LangChain Is Overkill
Let's be real. LangChain adds a lot of abstraction, and sometimes that abstraction gets in the way.
You probably don't need LangChain if:
- You're making simple LLM calls (just use the provider's SDK directly)
- You have 1-2 tools (write the agent loop yourself -- it's 50 lines)
- You need full control over every API call
- You're debugging and the abstraction layers are hiding the problem

LangChain earns its keep if:
- You need RAG over documents (the document loaders and splitters are excellent)
- You're switching between LLM providers frequently
- You need pre-built integrations (there are hundreds)
- You're building complex multi-step chains with branching

And reach for LangGraph, its companion library, if:
- You need cycles in your graph (agent loops with conditional routing)
- You need persistent state across steps
- You're building anything that resembles a state machine
For more hands-on AI development tutorials, check out CodeUp.