March 26, 2026 · 9 min read

Logging and Monitoring for Web Apps — A Practical Guide

How to implement logging and monitoring that actually helps you debug production issues. Covers structured logging, log levels, monitoring tools, alerting, and observability patterns.

Tags: logging, monitoring, observability, devops, debugging

Your app is going to break in production. Not because you're a bad developer -- because distributed systems are inherently unpredictable. A database connection times out. A third-party API starts returning 500s. Memory usage creeps up until the OOM killer strikes. The question isn't whether it'll happen, but whether you'll know about it before your users tell you.

Most developers add logging as an afterthought. They sprinkle console.log("here") during debugging, delete most of it, and ship. Then something breaks at 2 AM and they're left reading through raw stdout trying to reconstruct what happened.

Good logging and monitoring isn't hard. It just requires thinking about it before things go wrong.

Structured Logging — Stop Using console.log

This is useless in production:

Error processing request
Something went wrong
User not found

You can't search it, filter it, or aggregate it. You don't know when it happened, which endpoint was hit, or which user was affected.

Structured logging outputs JSON instead of free-text:

{
  "timestamp": "2026-03-26T14:32:01.234Z",
  "level": "error",
  "message": "User not found",
  "service": "user-api",
  "requestId": "req_abc123",
  "userId": "usr_456",
  "endpoint": "GET /api/users/456",
  "duration": 12,
  "statusCode": 404
}

Now you can filter: "show me all errors from the user-api service in the last hour where duration exceeded 1000ms." That's the difference between debugging for 5 minutes and debugging for 5 hours.
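That filter is nothing exotic once logs are JSON: it's a parse and a predicate. A toy sketch of the kind of query a log aggregator runs over NDJSON output (field names match the example entry above; the data is made up):

```javascript
// Each log line is one JSON object — "NDJSON". Filtering is just
// parse + predicate, which is why structured logs are searchable.
const lines = [
  '{"level":"error","service":"user-api","duration":1450,"message":"User not found"}',
  '{"level":"info","service":"user-api","duration":12,"message":"Request completed"}',
  '{"level":"error","service":"billing","duration":2100,"message":"Card declined"}',
];

const slowUserApiErrors = lines
  .map((line) => JSON.parse(line))
  .filter((e) => e.level === "error" && e.service === "user-api" && e.duration > 1000);

console.log(slowUserApiErrors.map((e) => e.message)); // → [ 'User not found' ]
```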

Node.js: Pino

Pino is the fastest structured logger for Node.js:

npm install pino pino-pretty

import pino from "pino";

const logger = pino({
  level: process.env.LOG_LEVEL || "info",
  transport:
    process.env.NODE_ENV === "development"
      ? { target: "pino-pretty", options: { colorize: true } }
      : undefined,
});

// Basic logging
logger.info("Server started on port 3000");
logger.error({ err: error }, "Failed to connect to database");

// Add context
logger.info({
  userId: user.id,
  action: "login",
  ip: request.ip,
}, "User logged in");

In development, pino-pretty gives you readable colored output. In production, it outputs raw JSON for your log aggregator.
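Under the hood, a structured logger is mostly merging context objects into JSON lines. A stripped-down sketch of the idea behind `logger.child()` — not pino's actual implementation, which is heavily optimized:

```javascript
// Toy structured logger: child() bakes fixed bindings into every
// subsequent line, which is how per-request context gets attached once
// instead of on every call.
function createLogger(bindings = {}) {
  return {
    info(fields, message) {
      const entry = {
        level: "info",
        time: new Date().toISOString(),
        ...bindings,
        ...fields,
        message,
      };
      console.log(JSON.stringify(entry));
      return entry; // returned only so the sketch is easy to inspect
    },
    child(extra) {
      return createLogger({ ...bindings, ...extra });
    },
  };
}

const base = createLogger({ service: "user-api" });
const reqLog = base.child({ requestId: "req_abc123" });
const entry = reqLog.info({ userId: "usr_456" }, "User logged in");
// entry carries service, requestId, and userId without repeating them
```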

Python: structlog

import structlog

structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.add_log_level,
        structlog.processors.JSONRenderer(),
    ]
)

logger = structlog.get_logger()

logger.info("user_login", user_id="usr_456", ip="192.168.1.1")
logger.error("payment_failed", order_id="ord_789", amount=49.99, reason="card_declined")

Express Middleware for Request Logging

import { randomUUID } from "crypto";

function requestLogger(req, res, next) {
  const requestId = randomUUID();
  const start = Date.now();

  // Attach to request for use in handlers
  req.requestId = requestId;
  req.log = logger.child({ requestId });

  res.on("finish", () => {
    const duration = Date.now() - start;
    const logData = {
      method: req.method,
      url: req.originalUrl,
      statusCode: res.statusCode,
      duration,
      userAgent: req.get("user-agent"),
      ip: req.ip,
    };

    if (res.statusCode >= 500) {
      req.log.error(logData, "Request failed");
    } else if (res.statusCode >= 400) {
      req.log.warn(logData, "Client error");
    } else {
      req.log.info(logData, "Request completed");
    }
  });

  next();
}

app.use(requestLogger);

Log Levels — Use Them Correctly

| Level | When to Use | Example |
|-------|-------------|---------|
| fatal | Application cannot continue | Database connection pool exhausted |
| error | Operation failed, needs attention | Payment processing failed |
| warn | Something unexpected but handled | Retry succeeded after timeout |
| info | Normal operations worth recording | User signed up, order placed |
| debug | Detailed diagnostic information | Cache miss for key xyz, query took 45ms |
| trace | Very verbose, step-by-step | Entering function X with args Y |

In production, run at info level. Bump to debug when investigating issues. Never run trace in production unless you enjoy paying for log storage.
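Level filtering is just a numeric comparison: each name maps to a number, and a line is emitted only if its level meets the configured threshold. A sketch using pino's default level values:

```javascript
// Level names map to numbers (these are pino's defaults). A log call is
// emitted only when its level is at or above the configured minimum.
const LEVELS = { trace: 10, debug: 20, info: 30, warn: 40, error: 50, fatal: 60 };

function shouldLog(lineLevel, configuredLevel) {
  return LEVELS[lineLevel] >= LEVELS[configuredLevel];
}

shouldLog("debug", "info"); // false — filtered out at the info threshold
shouldLog("error", "info"); // true
```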

The most common mistake: logging errors at warn level. If something failed and needs fixing, it's an error. Warnings are for "this worked, but something's off."

// Wrong: this failed, it's an error not a warning
logger.warn("Failed to send email to user");

// Right
logger.error({ userId, emailType, err }, "Failed to send email");

// This is a warning: it worked, but the fallback path is suspicious
logger.warn({ cacheKey, latency: 2300 }, "Cache miss, fell back to database");

What to Log

Always log:
  • Incoming requests (method, path, status code, duration)
  • Authentication events (login, logout, failed attempts)
  • Business-critical operations (orders, payments, signups)
  • Errors with full context (stack trace, request data, user ID)
  • External service calls (API requests to third parties, with duration)
Never log:
  • Passwords, tokens, API keys
  • Full credit card numbers
  • Personal health information
  • Anything covered by GDPR/PII regulations without proper handling

// Scrub sensitive data
function sanitize(body: any) {
  const clean = { ...body };
  if (clean.password) clean.password = "[REDACTED]";
  if (clean.creditCard) clean.creditCard = "**" + clean.creditCard.slice(-4);
  if (clean.ssn) clean.ssn = "[REDACTED]";
  return clean;
}

logger.info({ body: sanitize(req.body) }, "Processing payment");
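The sanitize() above only handles top-level keys, and real payloads nest. A recursive deny-list pass is safer; a sketch, with example field names — extend the set for your own data:

```javascript
// Recursive variant of the shallow sanitize(): walks nested objects and
// arrays, redacting any key in the deny-list. Key names are examples.
const SENSITIVE = new Set(["password", "token", "apiKey", "ssn"]);

function deepSanitize(value) {
  if (Array.isArray(value)) return value.map(deepSanitize);
  if (value !== null && typeof value === "object") {
    const clean = {};
    for (const [key, v] of Object.entries(value)) {
      clean[key] = SENSITIVE.has(key) ? "[REDACTED]" : deepSanitize(v);
    }
    return clean;
  }
  return value;
}

deepSanitize({ user: { password: "hunter2", profile: { name: "Ada" } } });
// → { user: { password: '[REDACTED]', profile: { name: 'Ada' } } }
```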

Log Aggregation

Raw log files on a server are fine for a single-instance hobby project. For anything else, you need a log aggregation service.

| Service | Free Tier | Best For |
|---------|-----------|----------|
| Grafana Loki | Self-hosted, unlimited | Teams already using Grafana |
| Datadog | 1 day retention | Enterprise, full observability |
| Axiom | 500GB/month ingest | Generous free tier, simple |
| Better Stack (Logtail) | 1GB/month | Clean UI, easy setup |
| AWS CloudWatch | 5GB ingest + 5GB storage | AWS-native apps |
| ELK Stack | Self-hosted, unlimited | Full control, complex setup |

For most projects starting out, Axiom or Better Stack are the least painful. You add a transport to your logger and logs flow automatically:

// Axiom with Pino
import pino from "pino";

const logger = pino({
  transport: {
    targets: [
      { target: "pino-pretty", level: "debug" },
      {
        target: "@axiomhq/pino",
        options: {
          dataset: "my-app",
          token: process.env.AXIOM_TOKEN,
        },
      },
    ],
  },
});

Monitoring — Knowing Before Users Complain

Logging tells you _what happened_. Monitoring tells you _what's happening right now_.

The Four Golden Signals

Google's SRE book defines four metrics that matter most:

  1. Latency — How long requests take (track p50, p95, p99)
  2. Traffic — Requests per second
  3. Errors — Error rate as a percentage of total requests
  4. Saturation — How full your resources are (CPU, memory, disk, connections)
If you monitor nothing else, monitor these four.
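Percentiles are worth understanding rather than treating as magic. A minimal nearest-rank computation over raw samples — monitoring systems usually estimate percentiles from histogram buckets instead, but the idea is the same:

```javascript
// Nearest-rank percentile over raw latency samples: sort, then pick the
// sample at rank ceil(p/100 * n).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const latenciesMs = [12, 15, 18, 22, 30, 45, 60, 120, 400, 2500];
percentile(latenciesMs, 50); // 30
percentile(latenciesMs, 95); // 2500 — with only 10 samples, p95 is the worst one
```

This is also why p99 matters: the mean of those samples looks fine while one user waited 2.5 seconds.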

Health Check Endpoint

Every service needs one:

app.get("/health", async (req, res) => {
  const checks = {
    uptime: process.uptime(),
    timestamp: Date.now(),
    database: "unknown",
    redis: "unknown",
  };

  try {
    await db.query("SELECT 1");
    checks.database = "healthy";
  } catch (err) {
    checks.database = "unhealthy";
  }

  try {
    await redis.ping();
    checks.redis = "healthy";
  } catch (err) {
    checks.redis = "unhealthy";
  }

  const isHealthy = checks.database === "healthy" && checks.redis === "healthy";
  res.status(isHealthy ? 200 : 503).json(checks);
});

Hit this endpoint from an uptime monitor (UptimeRobot, Better Uptime, Pingdom). Get alerted when it returns non-200.

Prometheus Metrics

For custom metrics, Prometheus is the standard:

import { Counter, Histogram, Registry } from "prom-client";

const register = new Registry();

const httpRequestDuration = new Histogram({
  name: "http_request_duration_seconds",
  help: "Duration of HTTP requests in seconds",
  labelNames: ["method", "route", "status_code"],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 5],
  registers: [register],
});

const httpRequestTotal = new Counter({
  name: "http_requests_total",
  help: "Total number of HTTP requests",
  labelNames: ["method", "route", "status_code"],
  registers: [register],
});

// Middleware
app.use((req, res, next) => {
  const end = httpRequestDuration.startTimer();
  res.on("finish", () => {
    const route = req.route?.path || req.path;
    const labels = { method: req.method, route, status_code: res.statusCode };
    end(labels);
    httpRequestTotal.inc(labels);
  });
  next();
});

// Expose metrics endpoint
app.get("/metrics", async (req, res) => {
  res.set("Content-Type", register.contentType);
  res.end(await register.metrics());
});

Grafana reads from Prometheus and renders dashboards. The combo is free, self-hosted, and used by most of the industry.
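It helps to know what those Histogram buckets actually record: Prometheus buckets are cumulative, each counting every observation at or below its upper bound, plus an implicit +Inf bucket. A sketch using the same bounds as the config above:

```javascript
// Cumulative histogram buckets, Prometheus-style: an observation
// increments every bucket whose upper bound it fits under.
const bounds = [0.01, 0.05, 0.1, 0.5, 1, 5]; // seconds, as in the config above

function observeAll(durations) {
  const counts = bounds.map(() => 0);
  let inf = 0; // the implicit +Inf bucket counts everything
  for (const d of durations) {
    for (let i = 0; i < bounds.length; i++) {
      if (d <= bounds[i]) counts[i]++;
    }
    inf++;
  }
  return { counts, inf };
}

observeAll([0.004, 0.03, 0.03, 0.2, 3, 12]);
// counts: [1, 3, 3, 4, 4, 5], +Inf: 6
```

Cumulative counts are what let Prometheus estimate percentiles cheaply with `histogram_quantile` instead of storing every sample.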

Alerting — Signal vs Noise

Bad alerting is worse than no alerting. If your team gets 50 alerts a day and most are false positives, everyone starts ignoring them. Then the real incident gets lost in the noise.

Rules for useful alerts:

Alert on symptoms, not causes. Don't alert when CPU hits 80% -- alert when response time exceeds your SLA. High CPU might be fine during a traffic spike. Slow responses are never fine.

Include context in the alert:

🔴 High error rate on user-api
Error rate: 12.3% (threshold: 5%)
Affected endpoint: POST /api/payments
Started: 14:32 UTC
Dashboard: https://grafana.internal/d/user-api

Tier your alerts:

| Severity | Response Time | Channel | Example |
|----------|---------------|---------|---------|
| Critical | Immediate | PagerDuty/phone | Service down, data loss |
| Warning | Within 1 hour | Slack channel | Error rate elevated |
| Info | Next business day | Email/dashboard | Disk usage above 70% |
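The tiers map naturally to a routing function. A sketch — the channel names and the shape of the alert object are illustrative, not any particular paging tool's API:

```javascript
// Route alerts by severity following the tiers above. Unknown severities
// fall back to the warning channel rather than being dropped silently.
const ROUTES = {
  critical: { channel: "pagerduty", responseTime: "immediate" },
  warning: { channel: "slack", responseTime: "1h" },
  info: { channel: "email", responseTime: "next business day" },
};

function routeAlert(alert) {
  const route = ROUTES[alert.severity] ?? ROUTES.warning;
  return { ...alert, ...route };
}

routeAlert({ severity: "critical", message: "user-api error rate 12.3% (threshold 5%)" });
// → routed to pagerduty, responseTime "immediate"
```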

Error Tracking

Generic logging catches errors, but a dedicated error tracker gives you stack traces, deduplication, and release tracking:

import * as Sentry from "@sentry/node";

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.NODE_ENV,
  release: process.env.GIT_SHA,
  tracesSampleRate: 0.1, // 10% of transactions for performance monitoring
});

// Automatic Express error capture
app.use(Sentry.Handlers.errorHandler());

// Manual capture with context
try {
  await processPayment(order);
} catch (err) {
  Sentry.withScope((scope) => {
    scope.setUser({ id: user.id, email: user.email });
    scope.setExtra("orderId", order.id);
    scope.setExtra("amount", order.amount);
    Sentry.captureException(err);
  });
  throw err;
}

Sentry groups identical errors, shows which release introduced a bug, and tracks whether errors are increasing or decreasing. The free tier handles 5K errors/month, which is plenty for most projects.

Distributed Tracing

When a single request touches multiple services, logs from each service are disconnected. Distributed tracing connects them with a trace ID:

Request → API Gateway → User Service → Database
                     → Payment Service → Stripe API
                     → Email Service → SendGrid

OpenTelemetry is the standard:

import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: "https://otel-collector.internal/v1/traces",
  }),
  serviceName: "user-api",
});

sdk.start();

With tracing, you can see that a slow checkout request spent 200ms in the user service, 50ms in inventory, and 3000ms waiting for the payment provider. Without it, you'd be guessing which service was the bottleneck.

A Practical Stack for Most Projects

For a team of 1-10 developers:

  1. Structured logging with Pino (Node.js) or structlog (Python)
  2. Error tracking with Sentry (free tier)
  3. Uptime monitoring with UptimeRobot or Better Uptime (free tier)
  4. Log aggregation with Axiom or Better Stack (free tier)
  5. Metrics with Prometheus + Grafana if you need dashboards
Skip distributed tracing until you actually have multiple services. Skip Datadog until you have the budget. Start simple, add complexity when you have specific problems to solve.

The goal is always the same: when something breaks at 2 AM, you should be able to open your dashboard, see exactly what went wrong, and fix it before most users even notice. At CodeUp, we've found that even basic structured logging cuts incident resolution time by more than half compared to grepping through raw log files.
