March 27, 202611 min read

Python Excel Automation: Read, Write, and Transform Spreadsheets with openpyxl

Automate Excel tasks with Python and openpyxl -- read workbooks, write data, format cells, create charts, and build practical tools like invoice generators.

python excel automation openpyxl tutorial

Every developer eventually gets asked to "do something with a spreadsheet." Maybe it's processing a CSV export from some legacy system, generating monthly reports, or converting data between formats. You could do it manually. Once. But the moment someone says "can you do this every week?" you need automation.

Python's openpyxl library lets you read, write, and modify Excel files (.xlsx) programmatically. No Excel installation required, no COM automation headaches, no VBA. Just Python reading and writing files.

This isn't a reference for every openpyxl method. It's a practical guide -- the operations you'll actually use, the formatting that actually matters, and real examples you can adapt.

Installation

pip install openpyxl

That's it. No system dependencies, no binary builds. Works on every platform.

Reading Excel Files

from openpyxl import load_workbook

# Load a workbook
wb = load_workbook("sales_data.xlsx")

# List all sheet names
print(wb.sheetnames)  # ['Sheet1', 'Q1 Sales', 'Q2 Sales']

# Get the active sheet (the one open when the file was last saved)
ws = wb.active

# Or get a specific sheet
ws = wb["Q1 Sales"]

# Read a single cell
print(ws["A1"].value)       # By cell reference
print(ws.cell(row=1, column=1).value)  # By row/column numbers (1-indexed)

# Read a range of cells
for row in ws.iter_rows(min_row=2, max_row=10, min_col=1, max_col=5, values_only=True):
    print(row)  # Tuple of values: ('Alice', 'Widget', 42, 19.99, 839.58)

The values_only=True parameter gives you just the values instead of cell objects. Use this when you don't need formatting information.

Reading All Data

# Read everything into a list of dictionaries
def read_as_dicts(filepath, sheet_name=None):
    wb = load_workbook(filepath, read_only=True)  # read_only for large files
    ws = wb[sheet_name] if sheet_name else wb.active

rows = ws.iter_rows(values_only=True)
    headers = [str(h).strip() for h in next(rows)]  # First row as headers

data = []
    for row in rows:
        if any(cell is not None for cell in row):  # Skip empty rows
            data.append(dict(zip(headers, row)))

wb.close()
    return data

# Usage
sales = read_as_dicts("sales_data.xlsx")
for record in sales:
    print(f"{record['Name']}: ${record['Total']:.2f}")

Use read_only=True for large files. It uses a streaming reader that doesn't load the entire file into memory.

Writing Excel Files

from openpyxl import Workbook

# Create a new workbook
wb = Workbook()
ws = wb.active
ws.title = "Sales Report"

# Write headers
headers = ["Name", "Product", "Quantity", "Price", "Total"]
ws.append(headers)

# Write data rows
data = [
    ["Alice", "Widget A", 42, 19.99, 839.58],
    ["Bob", "Widget B", 28, 24.99, 699.72],
    ["Carol", "Widget A", 35, 19.99, 699.65],
    ["Dave", "Widget C", 15, 49.99, 749.85],
]

for row in data:
    ws.append(row)

# Save
wb.save("output.xlsx")
print("Saved output.xlsx")

The append() method adds a row at the bottom. It's the easiest way to write data sequentially. For writing to specific cells:

ws["A1"] = "Hello"
ws.cell(row=5, column=3, value=42)

Adding Multiple Sheets

wb = Workbook()

# Rename the default sheet
ws1 = wb.active
ws1.title = "Summary"

# Create additional sheets
ws2 = wb.create_sheet("Raw Data")
ws3 = wb.create_sheet("Charts", 0)  # Insert at position 0 (first tab)

# Write to different sheets
ws1.append(["Metric", "Value"])
ws1.append(["Total Sales", 50000])

ws2.append(["Date", "Product", "Amount"])
ws2.append(["2026-03-01", "Widget A", 1250])

wb.save("multi_sheet.xlsx")

Formatting

This is where openpyxl really shines. You can make your output look like a polished report, not a raw data dump.

Fonts, Colors, and Alignment

from openpyxl.styles import Font, PatternFill, Alignment, Border, Side

# Header styling
header_font = Font(name="Calibri", size=12, bold=True, color="FFFFFF")
header_fill = PatternFill(start_color="2F5496", end_color="2F5496", fill_type="solid")
header_alignment = Alignment(horizontal="center", vertical="center")

# Apply to header row
for cell in ws[1]:
    cell.font = header_font
    cell.fill = header_fill
    cell.alignment = header_alignment

# Number formatting
for row in ws.iter_rows(min_row=2, min_col=4, max_col=5):
    for cell in row:
        cell.number_format = '$#,##0.00'

# Date formatting
ws["A2"].number_format = 'YYYY-MM-DD'

# Percentage formatting
ws["F2"].number_format = '0.0%'

Borders

thin_border = Border(
    left=Side(style="thin"),
    right=Side(style="thin"),
    top=Side(style="thin"),
    bottom=Side(style="thin")
)

# Apply border to all data cells
for row in ws.iter_rows(min_row=1, max_row=ws.max_row, max_col=ws.max_column):
    for cell in row:
        cell.border = thin_border

Column Width and Row Height

# Set column widths
ws.column_dimensions["A"].width = 20
ws.column_dimensions["B"].width = 15
ws.column_dimensions["C"].width = 12

# Auto-fit column width (approximate)
def auto_fit_columns(ws):
    for column_cells in ws.columns:
        max_length = 0
        column_letter = column_cells[0].column_letter
        for cell in column_cells:
            if cell.value:
                max_length = max(max_length, len(str(cell.value)))
        ws.column_dimensions[column_letter].width = min(max_length + 2, 50)

auto_fit_columns(ws)

# Set row height
ws.row_dimensions[1].height = 30  # Header row

Conditional Formatting

from openpyxl.formatting.rule import CellIsRule, ColorScaleRule

# Highlight cells above a threshold
red_fill = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")
green_fill = PatternFill(start_color="C6EFCE", end_color="C6EFCE", fill_type="solid")

ws.conditional_formatting.add(
    "E2:E100",
    CellIsRule(operator="greaterThan", formula=["1000"], fill=green_fill)
)

ws.conditional_formatting.add(
    "E2:E100",
    CellIsRule(operator="lessThan", formula=["500"], fill=red_fill)
)

# Color scale (gradient from red to green)
ws.conditional_formatting.add(
    "C2:C100",
    ColorScaleRule(
        start_type="min", start_color="F8696B",
        mid_type="percentile", mid_value=50, mid_color="FFEB84",
        end_type="max", end_color="63BE7B"
    )
)

Formulas

openpyxl writes formula strings into cells. Excel evaluates them when the file is opened:

# SUM formula
ws["E6"] = "=SUM(E2:E5)"

# AVERAGE
ws["E7"] = "=AVERAGE(E2:E5)"

# IF statement
ws["F2"] = '=IF(E2>700,"High","Low")'

# VLOOKUP
ws["G2"] = '=VLOOKUP(A2,Sheet2!A:B,2,FALSE)'

# Dynamic row reference
for row_num in range(2, ws.max_row + 1):
    ws.cell(row=row_num, column=5).value = f"=C{row_num}*D{row_num}"

Note: openpyxl doesn't evaluate formulas. If you need computed values without opening in Excel, calculate them in Python.

Merging Cells

# Merge cells for a title
ws.merge_cells("A1:E1")
ws["A1"] = "Monthly Sales Report - March 2026"
ws["A1"].font = Font(size=16, bold=True)
ws["A1"].alignment = Alignment(horizontal="center")

# Merge vertically
ws.merge_cells("A2:A5")
ws["A2"] = "Q1"
ws["A2"].alignment = Alignment(vertical="center")

# Unmerge
ws.unmerge_cells("A1:E1")

Charts

from openpyxl.chart import BarChart, PieChart, LineChart, Reference

# Bar chart
chart = BarChart()
chart.type = "col"
chart.title = "Sales by Product"
chart.x_axis.title = "Product"
chart.y_axis.title = "Revenue ($)"
chart.style = 10

data = Reference(ws, min_col=5, min_row=1, max_row=5)  # Total column
categories = Reference(ws, min_col=2, min_row=2, max_row=5)  # Product names

chart.add_data(data, titles_from_data=True)
chart.set_categories(categories)
chart.width = 18
chart.height = 12

ws.add_chart(chart, "G2")  # Place chart at G2

# Pie chart
pie = PieChart()
pie.title = "Sales Distribution"
pie.add_data(data, titles_from_data=True)
pie.set_categories(categories)
ws.add_chart(pie, "G18")

# Line chart (for time series)
line = LineChart()
line.title = "Monthly Trend"
line.x_axis.title = "Month"
line.y_axis.title = "Sales"
line_data = Reference(ws, min_col=3, min_row=1, max_row=13)
line.add_data(line_data, titles_from_data=True)
ws.add_chart(line, "G34")

Practical Example: Invoice Generator

Here's a complete, real-world example -- generating invoices from a list of orders:

from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment, Border, Side, numbers
from datetime import datetime

def generate_invoice(invoice_data: dict, filename: str):
    wb = Workbook()
    ws = wb.active
    ws.title = "Invoice"

# Page setup
    ws.sheet_properties.pageSetUpPr = None
    ws.page_margins.left = 0.5
    ws.page_margins.right = 0.5

# Styles
    title_font = Font(name="Calibri", size=24, bold=True, color="2F5496")
    header_font = Font(name="Calibri", size=11, bold=True, color="FFFFFF")
    header_fill = PatternFill(start_color="2F5496", end_color="2F5496", fill_type="solid")
    currency_format = '$#,##0.00'
    thin_border = Border(
        left=Side(style="thin", color="D9D9D9"),
        right=Side(style="thin", color="D9D9D9"),
        top=Side(style="thin", color="D9D9D9"),
        bottom=Side(style="thin", color="D9D9D9"),
    )

# Column widths
    ws.column_dimensions["A"].width = 8
    ws.column_dimensions["B"].width = 35
    ws.column_dimensions["C"].width = 12
    ws.column_dimensions["D"].width = 15
    ws.column_dimensions["E"].width = 15

# Company header
    ws.merge_cells("A1:E1")
    ws["A1"] = invoice_data["company_name"]
    ws["A1"].font = title_font

ws.merge_cells("A2:E2")
    ws["A2"] = invoice_data["company_address"]
    ws["A2"].font = Font(size=10, color="666666")

# Invoice details
    row = 4
    ws[f"A{row}"] = "Invoice #:"
    ws[f"A{row}"].font = Font(bold=True)
    ws[f"B{row}"] = invoice_data["invoice_number"]

ws[f"D{row}"] = "Date:"
    ws[f"D{row}"].font = Font(bold=True)
    ws[f"E{row}"] = invoice_data["date"]

row = 5
    ws[f"A{row}"] = "Bill To:"
    ws[f"A{row}"].font = Font(bold=True)
    ws[f"B{row}"] = invoice_data["client_name"]

ws[f"D{row}"] = "Due Date:"
    ws[f"D{row}"].font = Font(bold=True)
    ws[f"E{row}"] = invoice_data["due_date"]

ws[f"B6"] = invoice_data["client_address"]
    ws[f"B6"].font = Font(color="666666")

# Line items header
    row = 8
    headers = ["#", "Description", "Qty", "Unit Price", "Total"]
    for col, header in enumerate(headers, 1):
        cell = ws.cell(row=row, column=col, value=header)
        cell.font = header_font
        cell.fill = header_fill
        cell.alignment = Alignment(horizontal="center")

# Line items
    subtotal = 0
    for i, item in enumerate(invoice_data["items"], 1):
        row += 1
        total = item["quantity"] * item["unit_price"]
        subtotal += total

ws.cell(row=row, column=1, value=i).alignment = Alignment(horizontal="center")
        ws.cell(row=row, column=2, value=item["description"])
        ws.cell(row=row, column=3, value=item["quantity"]).alignment = Alignment(horizontal="center")
        ws.cell(row=row, column=4, value=item["unit_price"]).number_format = currency_format
        ws.cell(row=row, column=5, value=total).number_format = currency_format

for col in range(1, 6):
            ws.cell(row=row, column=col).border = thin_border

# Totals
    row += 2
    ws.cell(row=row, column=4, value="Subtotal:").font = Font(bold=True)
    ws.cell(row=row, column=5, value=subtotal).number_format = currency_format

tax_rate = invoice_data.get("tax_rate", 0.0)
    tax_amount = subtotal * tax_rate
    row += 1
    ws.cell(row=row, column=4, value=f"Tax ({tax_rate*100:.0f}%):").font = Font(bold=True)
    ws.cell(row=row, column=5, value=tax_amount).number_format = currency_format

row += 1
    ws.cell(row=row, column=4, value="TOTAL:").font = Font(size=14, bold=True, color="2F5496")
    total_cell = ws.cell(row=row, column=5, value=subtotal + tax_amount)
    total_cell.number_format = currency_format
    total_cell.font = Font(size=14, bold=True, color="2F5496")

# Notes
    row += 3
    ws.merge_cells(f"A{row}:E{row}")
    ws[f"A{row}"] = f"Notes: {invoice_data.get('notes', 'Payment due within 30 days.')}"
    ws[f"A{row}"].font = Font(italic=True, color="888888")

wb.save(filename)
    print(f"Invoice saved: {filename}")

# Generate an invoice
invoice = {
    "company_name": "Acme Corp",
    "company_address": "123 Business Ave, Tech City, TC 10001",
    "invoice_number": "INV-2026-0042",
    "date": "2026-03-27",
    "due_date": "2026-04-26",
    "client_name": "Widget Industries",
    "client_address": "456 Client St, Commerce Town, CT 20002",
    "tax_rate": 0.08,
    "notes": "Thank you for your business! Payment due within 30 days.",
    "items": [
        {"description": "Web Development - Homepage Redesign", "quantity": 1, "unit_price": 3500.00},
        {"description": "UI/UX Consultation (hours)", "quantity": 12, "unit_price": 150.00},
        {"description": "Hosting Setup & Configuration", "quantity": 1, "unit_price": 500.00},
        {"description": "Monthly Maintenance (prepaid)", "quantity": 3, "unit_price": 200.00},
    ]
}

generate_invoice(invoice, "invoice_INV-2026-0042.xlsx")

Practical Example: Bulk Data Processing

Process a large data file, clean it, and generate a summary report:

from openpyxl import load_workbook, Workbook
from collections import defaultdict

def process_sales_report(input_file: str, output_file: str):
    # Read raw data
    wb_in = load_workbook(input_file, read_only=True)
    ws_in = wb_in.active

rows = ws_in.iter_rows(values_only=True)
    headers = next(rows)

# Process and aggregate
    sales_by_region = defaultdict(float)
    sales_by_product = defaultdict(float)
    monthly_totals = defaultdict(float)
    total_records = 0
    skipped = 0

for row in rows:
        record = dict(zip(headers, row))

# Skip incomplete records
        if not record.get("Region") or not record.get("Amount"):
            skipped += 1
            continue

amount = float(record["Amount"])
        sales_by_region[record["Region"]] += amount
        sales_by_product[record["Product"]] += amount

if record.get("Date"):
            month = str(record["Date"])[:7]  # YYYY-MM
            monthly_totals[month] += amount

total_records += 1

wb_in.close()

# Write summary report
    wb_out = Workbook()

# Regional summary
    ws_region = wb_out.active
    ws_region.title = "By Region"
    ws_region.append(["Region", "Total Sales"])
    for region, total in sorted(sales_by_region.items(), key=lambda x: -x[1]):
        ws_region.append([region, total])

# Format
    for row in ws_region.iter_rows(min_row=2, min_col=2, max_col=2):
        for cell in row:
            cell.number_format = '$#,##0.00'

# Product summary
    ws_product = wb_out.create_sheet("By Product")
    ws_product.append(["Product", "Total Sales"])
    for product, total in sorted(sales_by_product.items(), key=lambda x: -x[1]):
        ws_product.append([product, total])

# Monthly trend
    ws_monthly = wb_out.create_sheet("Monthly Trend")
    ws_monthly.append(["Month", "Total Sales"])
    for month, total in sorted(monthly_totals.items()):
        ws_monthly.append([month, total])

# Overview
    ws_overview = wb_out.create_sheet("Overview", 0)
    ws_overview.append(["Metric", "Value"])
    ws_overview.append(["Total Records Processed", total_records])
    ws_overview.append(["Records Skipped", skipped])
    ws_overview.append(["Total Revenue", sum(sales_by_region.values())])
    ws_overview.append(["Number of Regions", len(sales_by_region)])
    ws_overview.append(["Number of Products", len(sales_by_product)])

wb_out.save(output_file)
    print(f"Report saved: {output_file}")
    print(f"Processed {total_records} records, skipped {skipped}")

process_sales_report("raw_sales.xlsx", "sales_summary.xlsx")

Modifying Existing Files

Sometimes you need to update an existing spreadsheet without recreating it:

wb = load_workbook("existing_report.xlsx")
ws = wb["Data"]

# Update a specific cell
ws["E2"] = 999.99

# Add a new column
ws.cell(row=1, column=ws.max_column + 1, value="Status")
for row in range(2, ws.max_row + 1):
    total = ws.cell(row=row, column=5).value
    if total and total > 1000:
        ws.cell(row=row, column=ws.max_column, value="High Value")
    else:
        ws.cell(row=row, column=ws.max_column, value="Standard")

# Delete a row
ws.delete_rows(5)

# Insert rows
ws.insert_rows(2, amount=3)  # Insert 3 rows at position 2

wb.save("existing_report.xlsx")

Freeze Panes and Filters

# Freeze the header row (scroll data while headers stay visible)
ws.freeze_panes = "A2"

# Freeze first column and header row
ws.freeze_panes = "B2"

# Add auto-filter (the dropdown arrows on headers)
ws.auto_filter.ref = f"A1:E{ws.max_row}"

Common Mistakes

Forgetting to save. wb.save() is not automatic. If your script crashes before saving, you lose everything. Save early and often, or use try/finally. Modifying cells in read_only mode. If you open with read_only=True, you can't write. It's a common source of confusing errors. Off-by-one errors. openpyxl is 1-indexed (row 1, column 1), not 0-indexed like Python lists. This trips up everyone at least once. Not closing workbooks. With read_only=True and write_only=True modes, you must call wb.close() to release file handles. Use context managers or try/finally. Overwriting formulas. If you read a cell that contains a formula, you get the formula string, not the computed value. If you write it back, that's fine. But if you overwrite it with a plain value, the formula is gone. Use data_only=True in load_workbook() to read computed values instead of formulas (note: only works if the file was last saved by Excel).

What's Next

openpyxl covers most Excel automation needs, but there are adjacent tools worth knowing:

pandas -- For heavy data analysis, read Excel with pd.read_excel() and write with df.to_excel(). Uses openpyxl under the hood.
xlsxwriter -- Write-only library with better chart support and performance for large files.
python-pptx -- Automate PowerPoint, often used alongside Excel automation.
Schedule / cron -- Run your scripts on a schedule for recurring reports.
Email integration -- Use smtplib to email generated reports automatically.

Start with a repetitive spreadsheet task you do manually. Automate it. Then watch the time savings compound every week.

For more Python automation tutorials, check out CodeUp.