Python Excel Automation: Read, Write, and Transform Spreadsheets with openpyxl
Automate Excel tasks with Python and openpyxl -- read workbooks, write data, format cells, create charts, and build practical tools like invoice generators.
Every developer eventually gets asked to "do something with a spreadsheet." Maybe it's processing a CSV export from some legacy system, generating monthly reports, or converting data between formats. You could do it manually. Once. But the moment someone says "can you do this every week?" you need automation.
Python's openpyxl library lets you read, write, and modify Excel files (.xlsx) programmatically. No Excel installation required, no COM automation headaches, no VBA. Just Python reading and writing files.
This isn't a reference for every openpyxl method. It's a practical guide -- the operations you'll actually use, the formatting that actually matters, and real examples you can adapt.
Installation
pip install openpyxl
That's it. No system dependencies, no binary builds. Works on every platform.
Reading Excel Files
from openpyxl import load_workbook
# Load a workbook
wb = load_workbook("sales_data.xlsx")
# List all sheet names
print(wb.sheetnames) # ['Sheet1', 'Q1 Sales', 'Q2 Sales']
# Get the active sheet (the one open when the file was last saved)
ws = wb.active
# Or get a specific sheet
ws = wb["Q1 Sales"]
# Read a single cell
print(ws["A1"].value) # By cell reference
print(ws.cell(row=1, column=1).value) # By row/column numbers (1-indexed)
# Read a range of cells
for row in ws.iter_rows(min_row=2, max_row=10, min_col=1, max_col=5, values_only=True):
print(row) # Tuple of values: ('Alice', 'Widget', 42, 19.99, 839.58)
The values_only=True parameter gives you just the values instead of cell objects. Use this when you don't need formatting information.
Reading All Data
# Read everything into a list of dictionaries
def read_as_dicts(filepath, sheet_name=None):
wb = load_workbook(filepath, read_only=True) # read_only for large files
ws = wb[sheet_name] if sheet_name else wb.active
rows = ws.iter_rows(values_only=True)
headers = [str(h).strip() for h in next(rows)] # First row as headers
data = []
for row in rows:
if any(cell is not None for cell in row): # Skip empty rows
data.append(dict(zip(headers, row)))
wb.close()
return data
# Usage
sales = read_as_dicts("sales_data.xlsx")
for record in sales:
print(f"{record['Name']}: ${record['Total']:.2f}")
Use read_only=True for large files. It uses a streaming reader that doesn't load the entire file into memory.
Writing Excel Files
from openpyxl import Workbook
# Create a new workbook
wb = Workbook()
ws = wb.active
ws.title = "Sales Report"
# Write headers
headers = ["Name", "Product", "Quantity", "Price", "Total"]
ws.append(headers)
# Write data rows
data = [
["Alice", "Widget A", 42, 19.99, 839.58],
["Bob", "Widget B", 28, 24.99, 699.72],
["Carol", "Widget A", 35, 19.99, 699.65],
["Dave", "Widget C", 15, 49.99, 749.85],
]
for row in data:
ws.append(row)
# Save
wb.save("output.xlsx")
print("Saved output.xlsx")
The append() method adds a row at the bottom. It's the easiest way to write data sequentially. For writing to specific cells:
ws["A1"] = "Hello"
ws.cell(row=5, column=3, value=42)
Adding Multiple Sheets
wb = Workbook()
# Rename the default sheet
ws1 = wb.active
ws1.title = "Summary"
# Create additional sheets
ws2 = wb.create_sheet("Raw Data")
ws3 = wb.create_sheet("Charts", 0) # Insert at position 0 (first tab)
# Write to different sheets
ws1.append(["Metric", "Value"])
ws1.append(["Total Sales", 50000])
ws2.append(["Date", "Product", "Amount"])
ws2.append(["2026-03-01", "Widget A", 1250])
wb.save("multi_sheet.xlsx")
Formatting
This is where openpyxl really shines. You can make your output look like a polished report, not a raw data dump.
Fonts, Colors, and Alignment
from openpyxl.styles import Font, PatternFill, Alignment, Border, Side
# Header styling
header_font = Font(name="Calibri", size=12, bold=True, color="FFFFFF")
header_fill = PatternFill(start_color="2F5496", end_color="2F5496", fill_type="solid")
header_alignment = Alignment(horizontal="center", vertical="center")
# Apply to header row
for cell in ws[1]:
cell.font = header_font
cell.fill = header_fill
cell.alignment = header_alignment
# Number formatting
for row in ws.iter_rows(min_row=2, min_col=4, max_col=5):
for cell in row:
cell.number_format = '$#,##0.00'
# Date formatting
ws["A2"].number_format = 'YYYY-MM-DD'
# Percentage formatting
ws["F2"].number_format = '0.0%'
Borders
thin_border = Border(
left=Side(style="thin"),
right=Side(style="thin"),
top=Side(style="thin"),
bottom=Side(style="thin")
)
# Apply border to all data cells
for row in ws.iter_rows(min_row=1, max_row=ws.max_row, max_col=ws.max_column):
for cell in row:
cell.border = thin_border
Column Width and Row Height
# Set column widths
ws.column_dimensions["A"].width = 20
ws.column_dimensions["B"].width = 15
ws.column_dimensions["C"].width = 12
# Auto-fit column width (approximate)
def auto_fit_columns(ws):
for column_cells in ws.columns:
max_length = 0
column_letter = column_cells[0].column_letter
for cell in column_cells:
if cell.value:
max_length = max(max_length, len(str(cell.value)))
ws.column_dimensions[column_letter].width = min(max_length + 2, 50)
auto_fit_columns(ws)
# Set row height
ws.row_dimensions[1].height = 30 # Header row
Conditional Formatting
from openpyxl.formatting.rule import CellIsRule, ColorScaleRule
# Highlight cells above a threshold
red_fill = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")
green_fill = PatternFill(start_color="C6EFCE", end_color="C6EFCE", fill_type="solid")
ws.conditional_formatting.add(
"E2:E100",
CellIsRule(operator="greaterThan", formula=["1000"], fill=green_fill)
)
ws.conditional_formatting.add(
"E2:E100",
CellIsRule(operator="lessThan", formula=["500"], fill=red_fill)
)
# Color scale (gradient from red to green)
ws.conditional_formatting.add(
"C2:C100",
ColorScaleRule(
start_type="min", start_color="F8696B",
mid_type="percentile", mid_value=50, mid_color="FFEB84",
end_type="max", end_color="63BE7B"
)
)
Formulas
openpyxl writes formula strings into cells. Excel evaluates them when the file is opened:
# SUM formula
ws["E6"] = "=SUM(E2:E5)"
# AVERAGE
ws["E7"] = "=AVERAGE(E2:E5)"
# IF statement
ws["F2"] = '=IF(E2>700,"High","Low")'
# VLOOKUP
ws["G2"] = '=VLOOKUP(A2,Sheet2!A:B,2,FALSE)'
# Dynamic row reference
for row_num in range(2, ws.max_row + 1):
ws.cell(row=row_num, column=5).value = f"=C{row_num}*D{row_num}"
Note: openpyxl doesn't evaluate formulas. If you need computed values without opening in Excel, calculate them in Python.
Merging Cells
# Merge cells for a title
ws.merge_cells("A1:E1")
ws["A1"] = "Monthly Sales Report - March 2026"
ws["A1"].font = Font(size=16, bold=True)
ws["A1"].alignment = Alignment(horizontal="center")
# Merge vertically
ws.merge_cells("A2:A5")
ws["A2"] = "Q1"
ws["A2"].alignment = Alignment(vertical="center")
# Unmerge
ws.unmerge_cells("A1:E1")
Charts
from openpyxl.chart import BarChart, PieChart, LineChart, Reference
# Bar chart
chart = BarChart()
chart.type = "col"
chart.title = "Sales by Product"
chart.x_axis.title = "Product"
chart.y_axis.title = "Revenue ($)"
chart.style = 10
data = Reference(ws, min_col=5, min_row=1, max_row=5) # Total column
categories = Reference(ws, min_col=2, min_row=2, max_row=5) # Product names
chart.add_data(data, titles_from_data=True)
chart.set_categories(categories)
chart.width = 18
chart.height = 12
ws.add_chart(chart, "G2") # Place chart at G2
# Pie chart
pie = PieChart()
pie.title = "Sales Distribution"
pie.add_data(data, titles_from_data=True)
pie.set_categories(categories)
ws.add_chart(pie, "G18")
# Line chart (for time series)
line = LineChart()
line.title = "Monthly Trend"
line.x_axis.title = "Month"
line.y_axis.title = "Sales"
line_data = Reference(ws, min_col=3, min_row=1, max_row=13)
line.add_data(line_data, titles_from_data=True)
ws.add_chart(line, "G34")
Practical Example: Invoice Generator
Here's a complete, real-world example -- generating invoices from a list of orders:
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment, Border, Side, numbers
from datetime import datetime
def generate_invoice(invoice_data: dict, filename: str):
wb = Workbook()
ws = wb.active
ws.title = "Invoice"
# Page setup
ws.sheet_properties.pageSetUpPr = None
ws.page_margins.left = 0.5
ws.page_margins.right = 0.5
# Styles
title_font = Font(name="Calibri", size=24, bold=True, color="2F5496")
header_font = Font(name="Calibri", size=11, bold=True, color="FFFFFF")
header_fill = PatternFill(start_color="2F5496", end_color="2F5496", fill_type="solid")
currency_format = '$#,##0.00'
thin_border = Border(
left=Side(style="thin", color="D9D9D9"),
right=Side(style="thin", color="D9D9D9"),
top=Side(style="thin", color="D9D9D9"),
bottom=Side(style="thin", color="D9D9D9"),
)
# Column widths
ws.column_dimensions["A"].width = 8
ws.column_dimensions["B"].width = 35
ws.column_dimensions["C"].width = 12
ws.column_dimensions["D"].width = 15
ws.column_dimensions["E"].width = 15
# Company header
ws.merge_cells("A1:E1")
ws["A1"] = invoice_data["company_name"]
ws["A1"].font = title_font
ws.merge_cells("A2:E2")
ws["A2"] = invoice_data["company_address"]
ws["A2"].font = Font(size=10, color="666666")
# Invoice details
row = 4
ws[f"A{row}"] = "Invoice #:"
ws[f"A{row}"].font = Font(bold=True)
ws[f"B{row}"] = invoice_data["invoice_number"]
ws[f"D{row}"] = "Date:"
ws[f"D{row}"].font = Font(bold=True)
ws[f"E{row}"] = invoice_data["date"]
row = 5
ws[f"A{row}"] = "Bill To:"
ws[f"A{row}"].font = Font(bold=True)
ws[f"B{row}"] = invoice_data["client_name"]
ws[f"D{row}"] = "Due Date:"
ws[f"D{row}"].font = Font(bold=True)
ws[f"E{row}"] = invoice_data["due_date"]
ws[f"B6"] = invoice_data["client_address"]
ws[f"B6"].font = Font(color="666666")
# Line items header
row = 8
headers = ["#", "Description", "Qty", "Unit Price", "Total"]
for col, header in enumerate(headers, 1):
cell = ws.cell(row=row, column=col, value=header)
cell.font = header_font
cell.fill = header_fill
cell.alignment = Alignment(horizontal="center")
# Line items
subtotal = 0
for i, item in enumerate(invoice_data["items"], 1):
row += 1
total = item["quantity"] * item["unit_price"]
subtotal += total
ws.cell(row=row, column=1, value=i).alignment = Alignment(horizontal="center")
ws.cell(row=row, column=2, value=item["description"])
ws.cell(row=row, column=3, value=item["quantity"]).alignment = Alignment(horizontal="center")
ws.cell(row=row, column=4, value=item["unit_price"]).number_format = currency_format
ws.cell(row=row, column=5, value=total).number_format = currency_format
for col in range(1, 6):
ws.cell(row=row, column=col).border = thin_border
# Totals
row += 2
ws.cell(row=row, column=4, value="Subtotal:").font = Font(bold=True)
ws.cell(row=row, column=5, value=subtotal).number_format = currency_format
tax_rate = invoice_data.get("tax_rate", 0.0)
tax_amount = subtotal * tax_rate
row += 1
ws.cell(row=row, column=4, value=f"Tax ({tax_rate*100:.0f}%):").font = Font(bold=True)
ws.cell(row=row, column=5, value=tax_amount).number_format = currency_format
row += 1
ws.cell(row=row, column=4, value="TOTAL:").font = Font(size=14, bold=True, color="2F5496")
total_cell = ws.cell(row=row, column=5, value=subtotal + tax_amount)
total_cell.number_format = currency_format
total_cell.font = Font(size=14, bold=True, color="2F5496")
# Notes
row += 3
ws.merge_cells(f"A{row}:E{row}")
ws[f"A{row}"] = f"Notes: {invoice_data.get('notes', 'Payment due within 30 days.')}"
ws[f"A{row}"].font = Font(italic=True, color="888888")
wb.save(filename)
print(f"Invoice saved: {filename}")
# Generate an invoice
invoice = {
"company_name": "Acme Corp",
"company_address": "123 Business Ave, Tech City, TC 10001",
"invoice_number": "INV-2026-0042",
"date": "2026-03-27",
"due_date": "2026-04-26",
"client_name": "Widget Industries",
"client_address": "456 Client St, Commerce Town, CT 20002",
"tax_rate": 0.08,
"notes": "Thank you for your business! Payment due within 30 days.",
"items": [
{"description": "Web Development - Homepage Redesign", "quantity": 1, "unit_price": 3500.00},
{"description": "UI/UX Consultation (hours)", "quantity": 12, "unit_price": 150.00},
{"description": "Hosting Setup & Configuration", "quantity": 1, "unit_price": 500.00},
{"description": "Monthly Maintenance (prepaid)", "quantity": 3, "unit_price": 200.00},
]
}
generate_invoice(invoice, "invoice_INV-2026-0042.xlsx")
Practical Example: Bulk Data Processing
Process a large data file, clean it, and generate a summary report:
from openpyxl import load_workbook, Workbook
from collections import defaultdict
def process_sales_report(input_file: str, output_file: str):
# Read raw data
wb_in = load_workbook(input_file, read_only=True)
ws_in = wb_in.active
rows = ws_in.iter_rows(values_only=True)
headers = next(rows)
# Process and aggregate
sales_by_region = defaultdict(float)
sales_by_product = defaultdict(float)
monthly_totals = defaultdict(float)
total_records = 0
skipped = 0
for row in rows:
record = dict(zip(headers, row))
# Skip incomplete records
if not record.get("Region") or not record.get("Amount"):
skipped += 1
continue
amount = float(record["Amount"])
sales_by_region[record["Region"]] += amount
sales_by_product[record["Product"]] += amount
if record.get("Date"):
month = str(record["Date"])[:7] # YYYY-MM
monthly_totals[month] += amount
total_records += 1
wb_in.close()
# Write summary report
wb_out = Workbook()
# Regional summary
ws_region = wb_out.active
ws_region.title = "By Region"
ws_region.append(["Region", "Total Sales"])
for region, total in sorted(sales_by_region.items(), key=lambda x: -x[1]):
ws_region.append([region, total])
# Format
for row in ws_region.iter_rows(min_row=2, min_col=2, max_col=2):
for cell in row:
cell.number_format = '$#,##0.00'
# Product summary
ws_product = wb_out.create_sheet("By Product")
ws_product.append(["Product", "Total Sales"])
for product, total in sorted(sales_by_product.items(), key=lambda x: -x[1]):
ws_product.append([product, total])
# Monthly trend
ws_monthly = wb_out.create_sheet("Monthly Trend")
ws_monthly.append(["Month", "Total Sales"])
for month, total in sorted(monthly_totals.items()):
ws_monthly.append([month, total])
# Overview
ws_overview = wb_out.create_sheet("Overview", 0)
ws_overview.append(["Metric", "Value"])
ws_overview.append(["Total Records Processed", total_records])
ws_overview.append(["Records Skipped", skipped])
ws_overview.append(["Total Revenue", sum(sales_by_region.values())])
ws_overview.append(["Number of Regions", len(sales_by_region)])
ws_overview.append(["Number of Products", len(sales_by_product)])
wb_out.save(output_file)
print(f"Report saved: {output_file}")
print(f"Processed {total_records} records, skipped {skipped}")
process_sales_report("raw_sales.xlsx", "sales_summary.xlsx")
Modifying Existing Files
Sometimes you need to update an existing spreadsheet without recreating it:
wb = load_workbook("existing_report.xlsx")
ws = wb["Data"]
# Update a specific cell
ws["E2"] = 999.99
# Add a new column
ws.cell(row=1, column=ws.max_column + 1, value="Status")
for row in range(2, ws.max_row + 1):
total = ws.cell(row=row, column=5).value
if total and total > 1000:
ws.cell(row=row, column=ws.max_column, value="High Value")
else:
ws.cell(row=row, column=ws.max_column, value="Standard")
# Delete a row
ws.delete_rows(5)
# Insert rows
ws.insert_rows(2, amount=3) # Insert 3 rows at position 2
wb.save("existing_report.xlsx")
Freeze Panes and Filters
# Freeze the header row (scroll data while headers stay visible)
ws.freeze_panes = "A2"
# Freeze first column and header row
ws.freeze_panes = "B2"
# Add auto-filter (the dropdown arrows on headers)
ws.auto_filter.ref = f"A1:E{ws.max_row}"
Common Mistakes
Forgetting to save.wb.save() is not automatic. If your script crashes before saving, you lose everything. Save early and often, or use try/finally.
Modifying cells in read_only mode. If you open with read_only=True, you can't write. It's a common source of confusing errors.
Off-by-one errors. openpyxl is 1-indexed (row 1, column 1), not 0-indexed like Python lists. This trips up everyone at least once.
Not closing workbooks. With read_only=True and write_only=True modes, you must call wb.close() to release file handles. Use context managers or try/finally.
Overwriting formulas. If you read a cell that contains a formula, you get the formula string, not the computed value. If you write it back, that's fine. But if you overwrite it with a plain value, the formula is gone. Use data_only=True in load_workbook() to read computed values instead of formulas (note: only works if the file was last saved by Excel).
What's Next
openpyxl covers most Excel automation needs, but there are adjacent tools worth knowing:
- pandas -- For heavy data analysis, read Excel with
pd.read_excel()and write withdf.to_excel(). Uses openpyxl under the hood. - xlsxwriter -- Write-only library with better chart support and performance for large files.
- python-pptx -- Automate PowerPoint, often used alongside Excel automation.
- Schedule / cron -- Run your scripts on a schedule for recurring reports.
- Email integration -- Use
smtplibto email generated reports automatically.
For more Python automation tutorials, check out CodeUp.