March 27, 2026 · 10 min read

Matplotlib and Data Visualization in Python: Make Charts That Actually Communicate

Master Matplotlib for data visualization. Line, bar, scatter, histogram charts, customization, subplots, Seaborn integration, and storytelling with data.

matplotlib python data-visualization data-science tutorial

A chart is supposed to make data easier to understand. Most charts fail at this. They use default colors that blend together, missing labels that force the reader to guess, and chart types that don't match the data. Matplotlib gives you total control over every pixel of your visualization, which means you can make excellent charts or terrible ones.

This tutorial teaches you both how Matplotlib works and how to make charts that actually communicate something to the person looking at them.

Setup and Basics

pip install matplotlib numpy pandas seaborn

The standard import convention:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

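pyplot is Matplotlib's state-based interface. Even when you use the object-oriented style described below, you still go through pyplot to create figures (plt.subplots()) and display them (plt.show()), so you'll import it in virtually every script.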

Your First Plot

x = [1, 2, 3, 4, 5]
y = [2, 4, 7, 11, 16]

plt.plot(x, y)
plt.show()

This creates a line chart with default styling. It works, but it tells you nothing about what the data means. Let's fix that.

x = [1, 2, 3, 4, 5]
y = [2, 4, 7, 11, 16]

plt.figure(figsize=(8, 5))
plt.plot(x, y, color='#2563eb', linewidth=2, marker='o', markersize=6)
plt.title('Monthly Revenue Growth', fontsize=16, fontweight='bold', pad=15)
plt.xlabel('Month', fontsize=12)
plt.ylabel('Revenue ($K)', fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Every chart should have a title, axis labels, and enough context for someone to understand it without explanation.

The Two Interfaces

Matplotlib has two ways to create plots:

pyplot interface (stateful): Convenient for quick plots. You call plt.plot(), plt.title(), etc. Matplotlib keeps track of the "current figure" and "current axes."

plt.figure(figsize=(8, 5))
plt.plot(x, y)
plt.title('Sales')
plt.show()

Object-oriented interface (explicit): Better for complex figures. You create figure and axes objects and call methods on them directly.

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(x, y)
ax.set_title('Sales')
plt.show()

The object-oriented interface is more verbose but less ambiguous. When you have multiple subplots, it's the only sane option. This tutorial uses both, but lean toward the OO interface for anything beyond a quick one-off plot.
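To see what "current axes" means in practice, here is a small sketch (the title strings are arbitrary). pyplot functions act on whatever plt.gca() returns, so right after plt.subplots() a call like plt.title() lands on the same ax object the OO interface gives you. Creating another figure silently changes that target.

```python
import matplotlib.pyplot as plt

# Create a figure and axes explicitly (OO style)
fig, ax = plt.subplots(figsize=(8, 5))

# pyplot remembers the most recently created axes as the "current" one
current = plt.gca()
print(current is ax)

# A stateful pyplot call modifies the current axes...
plt.title('Set via pyplot')
# ...which is the same object the OO method reads from
print(ax.get_title())

# Creating a second figure changes what "current" refers to
fig2, ax2 = plt.subplots()
plt.title('Second figure')
print(ax.get_title(), '|', ax2.get_title())
```

This is why the OO style is safer once several figures or subplots exist: a stray plt.title() goes to whichever axes happens to be current, not necessarily the one you meant.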

Line Charts

Line charts show trends over time. They're the right choice when you have continuous data with a natural order.

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
revenue_2024 = [12, 14, 15, 18, 22, 25, 24, 28, 30, 33, 35, 40]
revenue_2025 = [15, 17, 20, 24, 28, 32, 31, 35, 38, 42, 45, 50]

fig, ax = plt.subplots(figsize=(10, 6))

ax.plot(months, revenue_2024, color='#94a3b8', linewidth=2,
        marker='o', markersize=5, label='2024')
ax.plot(months, revenue_2025, color='#2563eb', linewidth=2.5,
        marker='o', markersize=5, label='2025')

ax.set_title('Monthly Revenue: 2024 vs 2025', fontsize=16, fontweight='bold', pad=15)
ax.set_xlabel('Month', fontsize=12)
ax.set_ylabel('Revenue ($K)', fontsize=12)
ax.legend(fontsize=11)
ax.grid(True, axis='y', alpha=0.3)
ax.set_ylim(0, 55)

# Highlight the current year with a filled area
ax.fill_between(months, revenue_2025, alpha=0.1, color='#2563eb')

plt.tight_layout()
plt.show()

Design principle: make the current/important data visually dominant (bold color, thicker line) and the comparison data subdued (lighter color, thinner line).

Bar Charts

Bar charts compare discrete categories. Use vertical bars for a few short category names or ordered categories such as quarters, and horizontal bars when there are many categories or the names are long.

languages = ['Python', 'JavaScript', 'TypeScript', 'Java', 'C#', 'Go', 'Rust']
satisfaction = [85, 72, 89, 58, 65, 82, 91]
colors = ['#2563eb' if s >= 80 else '#94a3b8' for s in satisfaction]

fig, ax = plt.subplots(figsize=(10, 6))

bars = ax.barh(languages, satisfaction, color=colors, height=0.6, edgecolor='white')

# Add value labels
for bar, value in zip(bars, satisfaction):
    ax.text(bar.get_width() + 1, bar.get_y() + bar.get_height() / 2,
            f'{value}%', va='center', fontsize=11, fontweight='bold')

ax.set_title('Developer Satisfaction by Language (2025)',
             fontsize=16, fontweight='bold', pad=15)
ax.set_xlabel('Satisfaction (%)', fontsize=12)
ax.set_xlim(0, 105)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.show()

Design principle: color-code to highlight a threshold. Here, languages with 80%+ satisfaction are blue, the rest are gray. The reader's eye immediately goes to the meaningful distinction.

Grouped Bar Charts

categories = ['Q1', 'Q2', 'Q3', 'Q4']
product_a = [45, 52, 48, 61]
product_b = [38, 44, 55, 50]

x = np.arange(len(categories))
width = 0.35

fig, ax = plt.subplots(figsize=(9, 6))

ax.bar(x - width/2, product_a, width, label='Product A', color='#2563eb')
ax.bar(x + width/2, product_b, width, label='Product B', color='#f97316')

ax.set_title('Quarterly Sales by Product', fontsize=16, fontweight='bold', pad=15)
ax.set_ylabel('Units Sold (K)', fontsize=12)
ax.set_xticks(x)
ax.set_xticklabels(categories)
ax.legend()
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.show()
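The manual ax.text() loop in the horizontal bar example works, but Matplotlib 3.4 and later also provide ax.bar_label(), which positions value labels for you (for both bar and barh). Here is a sketch applying it to the grouped chart above; the padding and font size are just reasonable choices.

```python
import matplotlib.pyplot as plt
import numpy as np

categories = ['Q1', 'Q2', 'Q3', 'Q4']
product_a = [45, 52, 48, 61]
product_b = [38, 44, 55, 50]

x = np.arange(len(categories))
width = 0.35

fig, ax = plt.subplots(figsize=(9, 6))

# ax.bar returns a BarContainer; keep it so each group can be labeled
bars_a = ax.bar(x - width/2, product_a, width, label='Product A', color='#2563eb')
bars_b = ax.bar(x + width/2, product_b, width, label='Product B', color='#f97316')

# bar_label (Matplotlib 3.4+) writes each bar's value just past its end
labels_a = ax.bar_label(bars_a, padding=3, fontsize=10)
labels_b = ax.bar_label(bars_b, padding=3, fontsize=10)

ax.set_xticks(x)
ax.set_xticklabels(categories)
ax.set_ylabel('Units Sold (K)', fontsize=12)
ax.legend()
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.show()
```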

Scatter Plots

Scatter plots show relationships between two continuous variables.

np.random.seed(42)
study_hours = np.random.uniform(1, 12, 100)
test_scores = 40 + 4.5 * study_hours + np.random.normal(0, 8, 100)
test_scores = np.clip(test_scores, 0, 100)

fig, ax = plt.subplots(figsize=(9, 6))

scatter = ax.scatter(study_hours, test_scores, c=test_scores, cmap='RdYlGn',
                     s=50, alpha=0.7, edgecolors='white', linewidth=0.5)

# Add trend line
z = np.polyfit(study_hours, test_scores, 1)
p = np.poly1d(z)
x_line = np.linspace(1, 12, 100)
ax.plot(x_line, p(x_line), color='#1e293b', linewidth=2,
        linestyle='--', alpha=0.7, label=f'Trend (r={np.corrcoef(study_hours, test_scores)[0,1]:.2f})')

ax.set_title('Study Hours vs Test Scores', fontsize=16, fontweight='bold', pad=15)
ax.set_xlabel('Hours Studied per Week', fontsize=12)
ax.set_ylabel('Test Score (%)', fontsize=12)
ax.legend(fontsize=11)

plt.colorbar(scatter, ax=ax, label='Score', shrink=0.8)
plt.tight_layout()
plt.show()

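Coloring the points by score reinforces the y-axis, so high and low scorers stand out at a glance. The trend line summarizes the direction of the relationship, and the correlation coefficient r in the legend tells you how strong the linear relationship is.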
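The plotting code only draws the fit. To read off the numbers behind it, the sketch below regenerates the same synthetic data and prints the slope and intercept from np.polyfit and Pearson's r from np.corrcoef. For a straight-line least-squares fit, r squared is the fraction of score variance the line accounts for.

```python
import numpy as np

# Same synthetic data as the scatter example above
np.random.seed(42)
study_hours = np.random.uniform(1, 12, 100)
test_scores = 40 + 4.5 * study_hours + np.random.normal(0, 8, 100)
test_scores = np.clip(test_scores, 0, 100)

# A degree-1 polyfit returns [slope, intercept] of the least-squares line
slope, intercept = np.polyfit(study_hours, test_scores, 1)

# Pearson r is the off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(study_hours, test_scores)[0, 1]

print(f'Each extra weekly hour is associated with ~{slope:.1f} more points')
print(f'Intercept: {intercept:.1f} (extrapolated; the data starts at 1 hour)')
print(f'r = {r:.2f}, r^2 = {r**2:.2f}')
```

Since the data was generated with a true slope of 4.5 plus noise (and then clipped at 100), the fitted slope should land near that value, which is a quick sanity check that the trend line is doing what you expect.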

Histograms

Histograms show the distribution of a single variable. They answer "how is this data spread out?"

np.random.seed(42)
response_times = np.concatenate([
    np.random.normal(200, 50, 800),   # Normal requests
    np.random.normal(500, 100, 150),  # Slow requests
    np.random.normal(1200, 200, 50),  # Very slow requests
])
response_times = response_times[response_times > 0]

fig, ax = plt.subplots(figsize=(10, 6))

counts, bins, patches = ax.hist(response_times, bins=50, color='#2563eb',
                                 alpha=0.7, edgecolor='white')

# Color bins above the SLA threshold
sla_threshold = 500
for patch, left_edge in zip(patches, bins[:-1]):
    if left_edge >= sla_threshold:
        patch.set_facecolor('#dc2626')

ax.axvline(x=sla_threshold, color='#dc2626', linestyle='--',
           linewidth=2, label=f'SLA threshold ({sla_threshold}ms)')

# Add statistics
median = np.median(response_times)
p95 = np.percentile(response_times, 95)
ax.axvline(x=median, color='#16a34a', linestyle='--', linewidth=1.5, label=f'Median ({median:.0f}ms)')
ax.axvline(x=p95, color='#f97316', linestyle='--', linewidth=1.5, label=f'P95 ({p95:.0f}ms)')

ax.set_title('API Response Time Distribution', fontsize=16, fontweight='bold', pad=15)
ax.set_xlabel('Response Time (ms)', fontsize=12)
ax.set_ylabel('Request Count', fontsize=12)
ax.legend(fontsize=10)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.show()

This tells a story: most requests are fast (blue), but a significant tail exceeds the SLA (red). The median, P95, and threshold lines give context without forcing the reader to do math.
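The histogram colors bins by their left edge, so the single bin that straddles 500ms stays blue. If you want an exact figure to quote in a title or caption, count directly. A sketch reusing the same synthetic data:

```python
import numpy as np

# Same synthetic response-time data as the histogram above
np.random.seed(42)
response_times = np.concatenate([
    np.random.normal(200, 50, 800),
    np.random.normal(500, 100, 150),
    np.random.normal(1200, 200, 50),
])
response_times = response_times[response_times > 0]

sla_threshold = 500

# Boolean mask: True where a request exceeds the SLA; its mean is the fraction
over_sla = response_times > sla_threshold
share_over = over_sla.mean()

print(f'{over_sla.sum()} of {response_times.size} requests '
      f'({share_over:.1%}) exceed {sla_threshold}ms')
```

A number like share_over can go straight into the chart title (for example with an f-string in ax.set_title), so the headline states the finding instead of leaving the reader to estimate bar areas.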

Subplots: Multiple Charts Together

When related charts belong together, use subplots:

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Top-left: Line chart
months = range(1, 13)
axes[0, 0].plot(months, revenue_2025, color='#2563eb', linewidth=2, marker='o')
axes[0, 0].set_title('Revenue Trend', fontweight='bold')
axes[0, 0].set_ylabel('Revenue ($K)')

# Top-right: Bar chart
top_products = ['Widget', 'Gadget', 'Doohickey', 'Thingamajig']
sales = [120, 95, 78, 62]
axes[0, 1].bar(top_products, sales, color=['#2563eb', '#3b82f6', '#60a5fa', '#93c5fd'])
axes[0, 1].set_title('Sales by Product', fontweight='bold')
axes[0, 1].set_ylabel('Units')

# Bottom-left: Scatter
axes[1, 0].scatter(study_hours, test_scores, alpha=0.5, color='#2563eb', s=30)
axes[1, 0].set_title('Hours vs Scores', fontweight='bold')
axes[1, 0].set_xlabel('Study Hours')
axes[1, 0].set_ylabel('Score')

# Bottom-right: Histogram
axes[1, 1].hist(response_times, bins=40, color='#2563eb', alpha=0.7, edgecolor='white')
axes[1, 1].set_title('Response Times', fontweight='bold')
axes[1, 1].set_xlabel('Time (ms)')
axes[1, 1].set_ylabel('Count')

fig.suptitle('Q4 Dashboard', fontsize=18, fontweight='bold', y=1.01)
plt.tight_layout()
plt.show()

plt.subplots(2, 2) creates a 2x2 grid. Each axes[row, col] is an independent chart. tight_layout() prevents overlapping labels.

For unequal layouts, use GridSpec:

from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(14, 8))
gs = GridSpec(2, 3, figure=fig)

ax_main = fig.add_subplot(gs[0, :])     # Top row, full width
ax_left = fig.add_subplot(gs[1, 0])     # Bottom-left
ax_mid = fig.add_subplot(gs[1, 1])      # Bottom-middle
ax_right = fig.add_subplot(gs[1, 2])    # Bottom-right
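The GridSpec snippet above only creates four empty axes. Here is a sketch that fills them so the wide top panel carries the headline chart and the bottom row holds supporting detail. The revenue and product sales reuse numbers from earlier examples; the two random panels are placeholder data for illustration only.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.gridspec import GridSpec

revenue_2025 = [15, 17, 20, 24, 28, 32, 31, 35, 38, 42, 45, 50]
top_products = ['Widget', 'Gadget', 'Doohickey', 'Thingamajig']
sales = [120, 95, 78, 62]
rng = np.random.default_rng(0)  # placeholder data source

fig = plt.figure(figsize=(14, 8))
gs = GridSpec(2, 3, figure=fig)

ax_main = fig.add_subplot(gs[0, :])     # Top row, full width
ax_left = fig.add_subplot(gs[1, 0])     # Bottom-left
ax_mid = fig.add_subplot(gs[1, 1])      # Bottom-middle
ax_right = fig.add_subplot(gs[1, 2])    # Bottom-right

# The wide panel carries the headline metric
ax_main.plot(range(1, 13), revenue_2025, color='#2563eb', linewidth=2.5, marker='o')
ax_main.set_title('Revenue Trend', fontweight='bold')

# The small panels hold supporting detail
ax_left.barh(top_products, sales, color='#2563eb')
ax_left.set_title('Sales by Product')

ax_mid.hist(rng.normal(200, 50, 500), bins=30, color='#94a3b8', edgecolor='white')
ax_mid.set_title('Response Times (placeholder)')

hours = rng.uniform(1, 12, 50)
ax_right.scatter(hours, 40 + 4.5 * hours + rng.normal(0, 8, 50), color='#f97316', s=20)
ax_right.set_title('Hours vs Scores (placeholder)')

fig.tight_layout()
plt.show()
```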

Customization and Styling

Removing Chart Junk

Edward Tufte's principle: maximize the data-ink ratio. Remove anything that doesn't convey information.

fig, ax = plt.subplots(figsize=(10, 6))

ax.plot(months, revenue_2025, color='#2563eb', linewidth=2.5)

# Remove unnecessary elements
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_color('#e2e8f0')
ax.spines['bottom'].set_color('#e2e8f0')
ax.tick_params(colors='#64748b')
ax.grid(True, axis='y', alpha=0.2, color='#e2e8f0')

ax.set_title('Revenue is Accelerating', fontsize=16, fontweight='bold',
             color='#1e293b', pad=15)
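Hiding spines and recoloring ticks on every axes by hand gets repetitive. One option, sketched here, is to put the same choices into Matplotlib rcParams inside plt.rc_context(): every axes created inside the with block picks them up, and the global defaults come back afterward. The keys are standard rcParams; the values simply mirror the example above.

```python
import matplotlib.pyplot as plt

# "Less chart junk" defaults, mirroring the manual spine/tick/grid calls above
junk_free = {
    'axes.spines.top': False,
    'axes.spines.right': False,
    'axes.edgecolor': '#e2e8f0',   # color of the remaining spines
    'xtick.color': '#64748b',
    'ytick.color': '#64748b',
    'axes.grid': True,
    'axes.grid.axis': 'y',         # horizontal gridlines only
    'grid.color': '#e2e8f0',
}

with plt.rc_context(junk_free):
    # Axes created inside the block read these settings at creation time
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.plot(range(1, 13), [15, 17, 20, 24, 28, 32, 31, 35, 38, 42, 45, 50],
            color='#2563eb', linewidth=2.5)
    ax.set_title('Revenue is Accelerating', fontsize=16, fontweight='bold',
                 color='#1e293b', pad=15)

plt.show()
```

To apply the settings for a whole script instead, you can assign the same keys to plt.rcParams once at the top.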

Custom Color Palettes

Stop using Matplotlib's defaults. Here are palettes that work well together:

# Professional blue palette
blues = ['#1e3a5f', '#2563eb', '#3b82f6', '#60a5fa', '#93c5fd']

# Categorical palette (distinct hues; the green/red pair is hard to tell apart for red-green colorblind readers, so don't rely on it alone)
categorical = ['#2563eb', '#f97316', '#16a34a', '#dc2626', '#8b5cf6', '#ec4899']

# Sequential palette for heatmaps
from matplotlib.colors import LinearSegmentedColormap
custom_cmap = LinearSegmentedColormap.from_list('custom', ['#dbeafe', '#2563eb', '#1e3a5f'])
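Defining palettes does nothing until you hand them to Matplotlib. A sketch of both uses: ax.set_prop_cycle() makes successive plot() calls on that axes step through the categorical palette, and the colormap object can be passed anywhere a cmap argument is accepted (imshow here). The plotted data is arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LinearSegmentedColormap

categorical = ['#2563eb', '#f97316', '#16a34a', '#dc2626', '#8b5cf6', '#ec4899']
custom_cmap = LinearSegmentedColormap.from_list('custom', ['#dbeafe', '#2563eb', '#1e3a5f'])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Categorical: lines on ax1 take colors from the palette, in order
ax1.set_prop_cycle(color=categorical)
x = np.arange(10)
lines = [ax1.plot(x, x * (i + 1), linewidth=2, label=f'Series {i + 1}')[0]
         for i in range(3)]
ax1.legend()

# Sequential: low values map to the lightest blue, high values to the darkest
grid = np.arange(100).reshape(10, 10)
im = ax2.imshow(grid, cmap=custom_cmap)
fig.colorbar(im, ax=ax2)

plt.tight_layout()
plt.show()
```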

Annotations

Point the reader to what matters:

# months is range(1, 13) from the subplots example, so x positions below are month numbers
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(months, revenue_2025, color='#2563eb', linewidth=2.5, marker='o')

# Annotate a specific point
ax.annotate('Product launch',
            xy=(5, 28),                    # Point to annotate
            xytext=(7, 20),                # Text position
            fontsize=11,
            arrowprops=dict(
                arrowstyle='->',
                color='#64748b',
                connectionstyle='arc3,rad=0.3'
            ),
            color='#64748b')

ax.annotate('Record month',
            xy=(12, 50),
            fontsize=11, fontweight='bold',
            color='#16a34a',
            ha='right')
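Hard-coded coordinates like xy=(12, 50) break as soon as the data changes. A sketch that finds the peak with np.argmax and anchors the label to it; the offset in points keeps the text a fixed distance from the marker regardless of the axis scale.

```python
import matplotlib.pyplot as plt
import numpy as np

months = np.arange(1, 13)
revenue_2025 = [15, 17, 20, 24, 28, 32, 31, 35, 38, 42, 45, 50]

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(months, revenue_2025, color='#2563eb', linewidth=2.5, marker='o')

# Locate the peak from the data instead of typing its coordinates
peak_idx = int(np.argmax(revenue_2025))
peak_x, peak_y = months[peak_idx], revenue_2025[peak_idx]

label = ax.annotate('Record month',
                    xy=(peak_x, peak_y),
                    xytext=(-15, 15), textcoords='offset points',  # up and to the left
                    ha='right', fontsize=11, fontweight='bold', color='#16a34a',
                    arrowprops=dict(arrowstyle='->', color='#16a34a'))

plt.tight_layout()
plt.show()
```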

Seaborn Integration

Seaborn is built on top of Matplotlib with better defaults and statistical chart types:

import seaborn as sns

# Set a clean theme
sns.set_theme(style='whitegrid', palette='muted', font_scale=1.1)

# Distribution plot
fig, ax = plt.subplots(figsize=(10, 6))
sns.histplot(response_times, kde=True, bins=50, color='#2563eb', ax=ax)
ax.set_title('Response Time Distribution', fontweight='bold')

Seaborn shines with DataFrames:

# Sample data
df = pd.DataFrame({
    'month': ['Jan', 'Feb', 'Mar', 'Apr'] * 3,
    'channel': ['Web'] * 4 + ['Mobile'] * 4 + ['API'] * 4,
    'users': [1200, 1350, 1500, 1800, 800, 950, 1100, 1400, 400, 450, 520, 600],
})

fig, ax = plt.subplots(figsize=(10, 6))
sns.barplot(data=df, x='month', y='users', hue='channel', ax=ax)
ax.set_title('Users by Channel', fontweight='bold')
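Seaborn's x/y/hue arguments expect long-form data: one row per observation, as in df above. Data often arrives wide instead, with one column per channel. A sketch of converting it with DataFrame.melt(); the wide table simply rearranges the same sample numbers.

```python
import pandas as pd

# Wide layout: one column per channel (a typical spreadsheet export)
wide = pd.DataFrame({
    'month': ['Jan', 'Feb', 'Mar', 'Apr'],
    'Web': [1200, 1350, 1500, 1800],
    'Mobile': [800, 950, 1100, 1400],
    'API': [400, 450, 520, 600],
})

# Long layout: one row per (month, channel) pair, ready for hue='channel'
long_df = wide.melt(id_vars='month', var_name='channel', value_name='users')

print(long_df.head())
```

The result has the same rows as the hand-built df, so it can be passed directly to sns.barplot(data=long_df, x='month', y='users', hue='channel').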

Box Plots and Violin Plots

# Compare distributions across categories
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

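# hue='channel' with legend=False (seaborn 0.13+) avoids the deprecation warning for palette without hue
sns.boxplot(data=df, x='channel', y='users', hue='channel', palette='Blues', legend=False, ax=axes[0])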
axes[0].set_title('Box Plot: Users by Channel', fontweight='bold')

sns.violinplot(data=df, x='channel', y='users', hue='channel', palette='Blues', legend=False, ax=axes[1])
axes[1].set_title('Violin Plot: Users by Channel', fontweight='bold')

plt.tight_layout()
plt.show()

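Box plots summarize each group with its median, quartiles, and outliers. Violin plots draw a smoothed (kernel density) estimate of the full distribution shape; use them when that shape matters and you have enough observations per group. With only four points per channel, as in this sample, the violins are illustrative rather than informative.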
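To see what a box actually encodes, this sketch computes the numbers for the sample df by hand: the quartiles per channel and the 1.5 × IQR fences, which is the default whisker rule in both Matplotlib and Seaborn box plots. Points beyond the fences are drawn as outliers, and each whisker stops at the most extreme data point inside its fence.

```python
import pandas as pd

df = pd.DataFrame({
    'month': ['Jan', 'Feb', 'Mar', 'Apr'] * 3,
    'channel': ['Web'] * 4 + ['Mobile'] * 4 + ['API'] * 4,
    'users': [1200, 1350, 1500, 1800, 800, 950, 1100, 1400, 400, 450, 520, 600],
})

# Quartiles per channel (pandas interpolates linearly between data points)
stats = df.groupby('channel')['users'].quantile([0.25, 0.5, 0.75]).unstack()
stats.columns = ['q1', 'median', 'q3']

# Fences: 1.5 * IQR beyond the box edges
iqr = stats['q3'] - stats['q1']
stats['lower_fence'] = stats['q1'] - 1.5 * iqr
stats['upper_fence'] = stats['q3'] + 1.5 * iqr

print(stats)
```

For Web, for example, the box runs from 1312.5 to 1575 with a median of 1425, and all four values fall inside the fences, so no outliers are drawn.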

Heatmaps

# Correlation matrix
np.random.seed(42)
data = pd.DataFrame({
    'Revenue': np.random.normal(100, 20, 50),
    'Users': np.random.normal(5000, 1000, 50),
    'Sessions': np.random.normal(8000, 2000, 50),
    'Bounce Rate': np.random.normal(40, 10, 50),
    'Conversion': np.random.normal(3, 1, 50),
})

fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(data.corr(), annot=True, fmt='.2f', cmap='RdBu_r',
            center=0, square=True, ax=ax, linewidths=1,
            cbar_kws={'shrink': 0.8})
ax.set_title('Metric Correlations', fontweight='bold', pad=15)
plt.tight_layout()
plt.show()
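A correlation matrix is symmetric with 1.0 on the diagonal, so half of the heatmap above repeats the other half. A common refinement, sketched here with a three-column subset of the same random data, passes a boolean mask to sns.heatmap so only the lower triangle is drawn. Pinning vmin=-1 and vmax=1 also keeps the colors meaning the same thing across charts; with independent random columns like these, every off-diagonal cell should come out pale, near zero.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

np.random.seed(42)
data = pd.DataFrame({
    'Revenue': np.random.normal(100, 20, 50),
    'Users': np.random.normal(5000, 1000, 50),
    'Sessions': np.random.normal(8000, 2000, 50),
})

corr = data.corr()

# True in the mask means "don't draw this cell": hide the diagonal and upper triangle
mask = np.triu(np.ones_like(corr, dtype=bool))

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, mask=mask, annot=True, fmt='.2f', cmap='RdBu_r',
            vmin=-1, vmax=1, center=0, square=True, linewidths=1, ax=ax)
ax.set_title('Metric Correlations (lower triangle)', fontweight='bold')
plt.tight_layout()
plt.show()
```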

Saving Figures

# Save as PNG (for web)
fig.savefig('chart.png', dpi=150, bbox_inches='tight', facecolor='white')

# Save as SVG (for presentations, infinitely scalable)
fig.savefig('chart.svg', bbox_inches='tight', facecolor='white')

# Save as PDF (for print)
fig.savefig('chart.pdf', bbox_inches='tight', facecolor='white')

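bbox_inches='tight' trims extra whitespace around the figure. dpi sets the pixel density of raster output such as PNG: dpi=150 is good for web; use dpi=300 for print. Vector formats (SVG, PDF) largely ignore it, except for any rasterized elements embedded in them.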
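The arithmetic behind dpi is simple: pixels = inches × dpi. A 10 × 6 inch figure saved at dpi=150 is 1500 × 900 pixels; at dpi=300 it is 3000 × 1800. This sketch confirms it by saving to memory and reading the size from the PNG header. It deliberately omits bbox_inches='tight', because the tight crop usually makes the final image somewhat smaller.

```python
import io
import struct

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot([1, 2, 3], [2, 4, 7])

# Save into an in-memory buffer instead of a file
buf = io.BytesIO()
fig.savefig(buf, format='png', dpi=150)

# PNG layout: 8-byte signature, then the IHDR chunk whose first fields are
# width and height as big-endian unsigned 32-bit ints (bytes 16-24)
width, height = struct.unpack('>II', buf.getvalue()[16:24])
print(width, height)  # 1500 900
```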

Choosing the Right Chart

This is where most people go wrong. The chart type should match the question you're answering:

"How does this change over time?" -- Line chart. Use it for trends, time series, continuous data.

"How do these categories compare?" -- Bar chart. Vertical for few categories, horizontal for many or long names.

"Is there a relationship between X and Y?" -- Scatter plot. Add a trend line to quantify the relationship.

"How is this data distributed?" -- Histogram or box plot. Histogram for one variable, box plot for comparing distributions across categories.

"What's the composition?" -- Stacked bar chart or (carefully) a pie chart. Pie charts only work with a few slices. Stacked bars scale better.

"What's the correlation between variables?" -- Heatmap of a correlation matrix.
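For the composition case, a stacked bar chart is built by drawing each series on top of the running total through ax.bar's bottom parameter. The channel shares below are made-up numbers that sum to 100% per quarter, used only to show the technique.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical composition data: share of traffic per channel each quarter (%)
quarters = ['Q1', 'Q2', 'Q3', 'Q4']
shares = {
    'Web': [55, 52, 48, 45],
    'Mobile': [35, 37, 40, 42],
    'API': [10, 11, 12, 13],
}
colors = ['#2563eb', '#f97316', '#94a3b8']

fig, ax = plt.subplots(figsize=(9, 6))

# Each segment starts where the segments below it ended
bottom = np.zeros(len(quarters))
for (channel, values), color in zip(shares.items(), colors):
    ax.bar(quarters, values, bottom=bottom, label=channel, color=color, width=0.6)
    bottom += values

ax.set_title('Traffic Mix by Quarter', fontsize=16, fontweight='bold', pad=15)
ax.set_ylabel('Share of Traffic (%)', fontsize=12)
ax.set_ylim(0, 100)
ax.legend(loc='upper left', bbox_to_anchor=(1, 1))  # legend outside the bars
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.show()
```

Unlike a pie, every quarter shares a common baseline for the bottom segment, so changes in that segment stay easy to compare across bars.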

Common Mistakes

Using pie charts for more than 5 categories. Humans are bad at comparing angles. Beyond 5 slices, use a bar chart.

Rainbow color schemes. They look busy and are hard to read for colorblind users. Use a sequential palette for ordered data and a categorical palette for distinct groups.

3D charts. They almost never improve understanding. They add perspective distortion that makes values harder to compare. Stick to 2D.

Missing axis labels. A chart without labels is a puzzle, not a visualization. Always label axes with units.

Truncated y-axis. Starting the y-axis at 90 instead of 0 can make a 2% change look like a 50% change. Be honest with your scales, or clearly mark that the axis is truncated.

Too much data in one chart. If you need a legend with 12 entries, split it into multiple charts. Each chart should communicate one clear idea.
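The truncated-axis numbers can be checked with simple arithmetic: a bar is drawn from the axis baseline, so its visible height is value minus baseline. In this sketch, apparent_change is a hypothetical helper, and 94 → 96 is an illustrative pair of values chosen to match the 2% example.

```python
def apparent_change(old, new, baseline):
    """Relative change in drawn bar height when the axis starts at `baseline`.

    A bar for value v is drawn with height v - baseline.
    """
    return (new - baseline) / (old - baseline) - 1


old, new = 94, 96
actual = new / old - 1                          # true relative change
honest = apparent_change(old, new, baseline=0)   # axis starts at 0
truncated = apparent_change(old, new, baseline=90)

print(f'actual {actual:.1%}, zero baseline {honest:.1%}, baseline 90 {truncated:.1%}')
# actual 2.1%, zero baseline 2.1%, baseline 90 50.0%
```

With a zero baseline the drawn heights change by exactly the real percentage; starting at 90 shrinks the bars to 4 and 6 units, so a roughly 2% change is drawn as a 50% jump.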

What's Next

You now know how to create the most common chart types, customize them for clarity, and use both Matplotlib and Seaborn. The next steps are learning interactive visualization with Plotly, building dashboards with Streamlit or Dash, working with geospatial data using Folium, and animating charts for presentations.

For data visualization projects and hands-on practice with real datasets, check out CodeUp.