Logging and Monitoring for Web Apps — A Practical Guide
How to implement logging and monitoring that actually helps you debug production issues. Covers structured logging, log levels, monitoring tools, alerting, and observability patterns.
Your app is going to break in production. Not because you're a bad developer -- because distributed systems are inherently unpredictable. A database connection times out. A third-party API starts returning 500s. Memory usage creeps up until the OOM killer strikes. The question isn't whether it'll happen, but whether you'll know about it before your users tell you.
Most developers add logging as an afterthought. They sprinkle console.log("here") during debugging, delete most of it, and ship. Then something breaks at 2 AM and they're left reading through raw stdout trying to reconstruct what happened.
Good logging and monitoring isn't hard. It just requires thinking about it before things go wrong.
Structured Logging — Stop Using console.log
This is useless in production:
Error processing request
Something went wrong
User not found
You can't search it, filter it, or aggregate it. You don't know when it happened, which endpoint was hit, or which user was affected.
Structured logging outputs JSON instead of free-text:
{
"timestamp": "2026-03-26T14:32:01.234Z",
"level": "error",
"message": "User not found",
"service": "user-api",
"requestId": "req_abc123",
"userId": "usr_456",
"endpoint": "GET /api/users/456",
"duration": 12,
"statusCode": 404
}
Now you can filter: "show me all errors from the user-api service in the last hour where duration exceeded 1000ms." That's the difference between debugging for 5 minutes and debugging for 5 hours.
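To make that claim concrete, here is a tiny sketch of filtering newline-delimited JSON logs in plain Node.js. The field names mirror the example entry above; in practice this query would run in your log aggregator, not in application code:

```javascript
// Parse newline-delimited JSON log lines and filter them like a query.
// Sample entries use the same fields as the example log line above.
const lines = [
  '{"level":"error","service":"user-api","duration":1500,"message":"User not found"}',
  '{"level":"info","service":"user-api","duration":12,"message":"Request completed"}',
  '{"level":"error","service":"billing","duration":2200,"message":"Charge failed"}',
];

// "All errors from user-api where duration exceeded 1000ms":
const slowUserApiErrors = lines
  .map((line) => JSON.parse(line))
  .filter((e) => e.level === "error" && e.service === "user-api" && e.duration > 1000);

console.log(slowUserApiErrors.length); // 1
```

Try doing that with free-text log lines and you're back to fragile regexes.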
Node.js: Pino
Pino is one of the fastest structured loggers for Node.js:
npm install pino pino-pretty
import pino from "pino";
const logger = pino({
level: process.env.LOG_LEVEL || "info",
transport: process.env.NODE_ENV === "development"
? { target: "pino-pretty", options: { colorize: true } }
: undefined,
});
// Basic logging
logger.info("Server started on port 3000");
logger.error({ err: error }, "Failed to connect to database");
// Add context
logger.info({
userId: user.id,
action: "login",
ip: request.ip,
}, "User logged in");
In development, pino-pretty gives you readable colored output. In production, it outputs raw JSON for your log aggregator.
Python: structlog
import structlog
structlog.configure(
processors=[
structlog.processors.TimeStamper(fmt="iso"),
structlog.processors.add_log_level,
structlog.processors.JSONRenderer(),
]
)
logger = structlog.get_logger()
logger.info("user_login", user_id="usr_456", ip="192.168.1.1")
logger.error("payment_failed", order_id="ord_789", amount=49.99, reason="card_declined")
Express Middleware for Request Logging
import { randomUUID } from "crypto";
function requestLogger(req, res, next) {
const requestId = randomUUID();
const start = Date.now();
// Attach to request for use in handlers
req.requestId = requestId;
req.log = logger.child({ requestId });
res.on("finish", () => {
const duration = Date.now() - start;
const logData = {
method: req.method,
url: req.originalUrl,
statusCode: res.statusCode,
duration,
userAgent: req.get("user-agent"),
ip: req.ip,
};
if (res.statusCode >= 500) {
req.log.error(logData, "Request failed");
} else if (res.statusCode >= 400) {
req.log.warn(logData, "Client error");
} else {
req.log.info(logData, "Request completed");
}
});
next();
}
app.use(requestLogger);
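Conceptually, logger.child just binds fields that get merged into every subsequent entry, which is what makes all log lines for one request correlatable. A dependency-free sketch of that idea (childLogger here is a toy stand-in, not Pino's API):

```javascript
// A toy "child logger": merges bound context into every entry it emits,
// mimicking what pino's logger.child({ requestId }) does.
function childLogger(bound, sink = (entry) => console.log(JSON.stringify(entry))) {
  return {
    info: (fields, message) => sink({ level: "info", ...bound, ...fields, message }),
    error: (fields, message) => sink({ level: "error", ...bound, ...fields, message }),
  };
}

const entries = [];
const reqLog = childLogger({ requestId: "req_abc123" }, (e) => entries.push(e));

reqLog.info({ userId: "usr_456" }, "Fetching user");
reqLog.error({ userId: "usr_456" }, "User not found");

console.log(entries[1].requestId); // req_abc123 -- bound onto every entry
```

With the real middleware above, any handler can call req.log.info(...) and the requestId shows up automatically, letting you pull every line for one request out of the aggregator.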
Log Levels — Use Them Correctly
| Level | When to Use | Example |
|---|---|---|
| fatal | Application cannot continue | Database connection pool exhausted |
| error | Operation failed, needs attention | Payment processing failed |
| warn | Something unexpected but handled | Retry succeeded after timeout |
| info | Normal operations worth recording | User signed up, order placed |
| debug | Detailed diagnostic information | Cache miss for key xyz, query took 45ms |
| trace | Very verbose, step-by-step | Entering function X with args Y |
Default to the info level in production. Bump to debug when investigating issues. Never run trace in production unless you enjoy paying for log storage.
The most common mistake: logging errors at warn level. If something failed and needs fixing, it's an error. Warnings are for "this worked, but something's off."
// Wrong: this failed, it's an error not a warning
logger.warn("Failed to send email to user");
// Right
logger.error({ userId, emailType, err }, "Failed to send email");
// This is a warning: the request succeeded, but via a slow fallback path
logger.warn({ cacheKey, latency: 2300 }, "Cache miss, fell back to database");
What to Log
Always log:
- Incoming requests (method, path, status code, duration)
- Authentication events (login, logout, failed attempts)
- Business-critical operations (orders, payments, signups)
- Errors with full context (stack trace, request data, user ID)
- External service calls (API requests to third parties, with duration)
Never log:
- Passwords, tokens, API keys
- Full credit card numbers
- Personal health information
- Anything covered by GDPR/PII regulations without proper handling
// Scrub sensitive data
function sanitize(body) {
const clean = { ...body };
if (clean.password) clean.password = "[REDACTED]";
if (clean.creditCard) clean.creditCard = "**" + clean.creditCard.slice(-4);
if (clean.ssn) clean.ssn = "[REDACTED]";
return clean;
}
logger.info({ body: sanitize(req.body) }, "Processing payment");
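The sanitize helper above only checks top-level keys, which misses nested payloads. A recursive variant catches those too (the key list is illustrative; Pino users can also look at its built-in redact option):

```javascript
// Recursively redact sensitive keys anywhere in a nested payload.
// The key names here are examples -- adjust to your own schema.
const SENSITIVE = new Set(["password", "token", "apiKey", "ssn"]);

function deepSanitize(value) {
  if (Array.isArray(value)) return value.map(deepSanitize);
  if (value && typeof value === "object") {
    const clean = {};
    for (const [key, v] of Object.entries(value)) {
      if (SENSITIVE.has(key)) {
        clean[key] = "[REDACTED]";
      } else if (key === "creditCard" && typeof v === "string") {
        clean[key] = "****" + v.slice(-4); // keep last 4 digits only
      } else {
        clean[key] = deepSanitize(v);
      }
    }
    return clean;
  }
  return value;
}

const safe = deepSanitize({
  user: { email: "a@b.com", password: "hunter2" },
  payment: { creditCard: "4242424242424242" },
});
console.log(safe.user.password); // [REDACTED]
```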
Log Aggregation
Raw log files on a server are fine for a single-instance hobby project. For anything else, you need a log aggregation service.
| Service | Free Tier | Best For |
|---|---|---|
| Grafana Loki | Self-hosted, unlimited | Teams already using Grafana |
| Datadog | 1 day retention | Enterprise, full observability |
| Axiom | 500GB/month ingest | Generous free tier, simple |
| Better Stack (Logtail) | 1GB/month | Clean UI, easy setup |
| AWS CloudWatch | 5GB ingest + 5GB storage | AWS-native apps |
| ELK Stack | Self-hosted, unlimited | Full control, complex setup |
// Axiom with Pino
import pino from "pino";
const logger = pino({
transport: {
targets: [
{ target: "pino-pretty", level: "debug" },
{
target: "@axiomhq/pino",
options: {
dataset: "my-app",
token: process.env.AXIOM_TOKEN,
},
},
],
},
});
Monitoring — Knowing Before Users Complain
Logging tells you _what happened_. Monitoring tells you _what's happening right now_.
The Four Golden Signals
Google's SRE book defines four metrics that matter most:
- Latency — How long requests take (track p50, p95, p99)
- Traffic — Requests per second
- Errors — Error rate as a percentage of total requests
- Saturation — How full your resources are (CPU, memory, disk, connections)
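The latency percentiles mentioned above (p50, p95, p99) are straightforward to compute over a window of samples. A rough sketch using the nearest-rank method (monitoring systems typically do this for you from histogram buckets):

```javascript
// Nearest-rank percentile over a window of request durations (ms).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const durations = [12, 15, 18, 22, 30, 45, 60, 120, 300, 2500];
console.log(percentile(durations, 50)); // 30
console.log(percentile(durations, 95)); // 2500 -- one slow request dominates the tail
```

This is why averages lie: the mean of those samples is ~312ms, but half your users saw 30ms and the unlucky tail waited 2.5 seconds.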
Health Check Endpoint
Every service needs one:
app.get("/health", async (req, res) => {
const checks = {
uptime: process.uptime(),
timestamp: Date.now(),
database: "unknown",
redis: "unknown",
};
try {
await db.query("SELECT 1");
checks.database = "healthy";
} catch (err) {
checks.database = "unhealthy";
}
try {
await redis.ping();
checks.redis = "healthy";
} catch (err) {
checks.redis = "unhealthy";
}
const isHealthy = checks.database === "healthy" && checks.redis === "healthy";
res.status(isHealthy ? 200 : 503).json(checks);
});
Hit this endpoint from an uptime monitor (UptimeRobot, Better Uptime, Pingdom). Get alerted when it returns non-200.
Prometheus Metrics
For custom metrics, Prometheus is the standard:
import { Counter, Histogram, Registry } from "prom-client";
const register = new Registry();
const httpRequestDuration = new Histogram({
name: "http_request_duration_seconds",
help: "Duration of HTTP requests in seconds",
labelNames: ["method", "route", "status_code"],
buckets: [0.01, 0.05, 0.1, 0.5, 1, 5],
registers: [register],
});
const httpRequestTotal = new Counter({
name: "http_requests_total",
help: "Total number of HTTP requests",
labelNames: ["method", "route", "status_code"],
registers: [register],
});
// Middleware
app.use((req, res, next) => {
const end = httpRequestDuration.startTimer();
res.on("finish", () => {
const route = req.route?.path || req.path;
const labels = { method: req.method, route, status_code: res.statusCode };
end(labels);
httpRequestTotal.inc(labels);
});
next();
});
// Expose metrics endpoint
app.get("/metrics", async (req, res) => {
res.set("Content-Type", register.contentType);
res.end(await register.metrics());
});
Grafana reads from Prometheus and renders dashboards. The combo is free, self-hosted, and used by most of the industry.
Alerting — Signal vs Noise
Bad alerting is worse than no alerting. If your team gets 50 alerts a day and most are false positives, everyone starts ignoring them. Then the real incident gets lost in the noise.
Rules for useful alerts:
Alert on symptoms, not causes. Don't alert when CPU hits 80% -- alert when response time exceeds your SLA. High CPU might be fine during a traffic spike. Slow responses are never fine.

Include context in the alert:

🔴 High error rate on user-api
Error rate: 12.3% (threshold: 5%)
Affected endpoint: POST /api/payments
Started: 14:32 UTC
Dashboard: https://grafana.internal/d/user-api
Tier your alerts:
| Severity | Response Time | Channel | Example |
|---|---|---|---|
| Critical | Immediate | PagerDuty/phone | Service down, data loss |
| Warning | Within 1 hour | Slack channel | Error rate elevated |
| Info | Next business day | Email/dashboard | Disk usage above 70% |
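Tiering like this can be encoded directly in an alert router. A minimal sketch (the channel names and response windows below just mirror the table; wire them to your real integrations):

```javascript
// Route alerts to a channel based on severity, mirroring the tier table.
const ROUTES = {
  critical: { channel: "pagerduty", respondWithin: "immediate" },
  warning:  { channel: "slack",     respondWithin: "1h" },
  info:     { channel: "email",     respondWithin: "next-business-day" },
};

function routeAlert(alert) {
  // Unknown severities fall through to the lowest tier rather than paging.
  const route = ROUTES[alert.severity] || ROUTES.info;
  return { ...alert, ...route };
}

const routed = routeAlert({
  severity: "critical",
  summary: "High error rate on user-api",
  errorRate: 0.123,
});
console.log(routed.channel); // pagerduty
```

Keeping the mapping in one place makes it easy to audit which alerts can actually wake someone up.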
Error Tracking
Generic logging catches errors, but a dedicated error tracker gives you stack traces, deduplication, and release tracking:
import * as Sentry from "@sentry/node";
Sentry.init({
dsn: process.env.SENTRY_DSN,
environment: process.env.NODE_ENV,
release: process.env.GIT_SHA,
tracesSampleRate: 0.1, // 10% of transactions for performance monitoring
});
// Automatic Express error capture
app.use(Sentry.Handlers.errorHandler());
// Manual capture with context
try {
await processPayment(order);
} catch (err) {
Sentry.withScope((scope) => {
scope.setUser({ id: user.id, email: user.email });
scope.setExtra("orderId", order.id);
scope.setExtra("amount", order.amount);
Sentry.captureException(err);
});
throw err;
}
Sentry groups identical errors, shows which release introduced a bug, and tracks whether errors are increasing or decreasing. The free tier handles 5K errors/month, which is plenty for most projects.
Distributed Tracing
When a single request touches multiple services, logs from each service are disconnected. Distributed tracing connects them with a trace ID:
Request → API Gateway → User Service → Database
→ Payment Service → Stripe API
→ Email Service → SendGrid
OpenTelemetry is the standard:
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
const sdk = new NodeSDK({
traceExporter: new OTLPTraceExporter({
url: "https://otel-collector.internal/v1/traces",
}),
serviceName: "user-api",
});
sdk.start();
With tracing, you can see that a slow checkout request spent 200ms in the user service, 50ms in inventory, and 3000ms waiting for the payment provider. Without it, you'd be guessing which service was the bottleneck.
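Beyond the SDK setup, services add manual spans around interesting operations (with @opentelemetry/api, via tracer.startActiveSpan). The core idea, timing named sub-operations that share one trace ID, can be sketched without the library:

```javascript
// A toy tracer: records named spans sharing one trace ID so the
// per-service timings described above can be reconstructed afterwards.
// Real code would use @opentelemetry/api's tracer.startActiveSpan instead.
function makeTracer(traceId) {
  const spans = [];
  return {
    spans,
    span(name, fn) {
      const start = Date.now();
      try {
        return fn();
      } finally {
        spans.push({ traceId, name, durationMs: Date.now() - start });
      }
    },
  };
}

const tracer = makeTracer("trace_abc123");
tracer.span("user-service", () => { /* look up user */ });
tracer.span("payment-service", () => { /* call payment provider */ });
console.log(tracer.spans.map((s) => s.name)); // [ 'user-service', 'payment-service' ]
```

The trace ID is what a backend like Jaeger or Tempo uses to stitch spans from different services into one waterfall view.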
A Practical Stack for Most Projects
For a team of 1-10 developers:
- Structured logging with Pino (Node.js) or structlog (Python)
- Error tracking with Sentry (free tier)
- Uptime monitoring with UptimeRobot or Better Uptime (free tier)
- Log aggregation with Axiom or Better Stack (free tier)
- Metrics with Prometheus + Grafana if you need dashboards
The goal is always the same: when something breaks at 2 AM, you should be able to open your dashboard, see exactly what went wrong, and fix it before most users even notice. At CodeUp, we've found that even basic structured logging cuts incident resolution time by more than half compared to grepping through raw log files.