Model Training Time Calculator
Estimate how long your ML model will take to train based on dataset size, batch size, GPU specs, and epochs. Plan your compute budget wisely.
Nothing quite humbles you like kicking off a training run, going to bed, and waking up to find it's 12% done with 47 hours remaining. Estimating training time before committing to a long run can save real money on cloud GPU rentals and help you schedule compute more intelligently.
The Basic Math Behind Training Time
Training time comes down to a few factors multiplied together:
Time = Steps per epoch × Epochs × Time per step, where Steps per epoch = Dataset size / Batch size
Time per step depends on your hardware throughput, model size, and whether you're using mixed precision. The CalcHub Training Time Calculator handles all of this — you plug in your setup and it gives you an estimate in minutes or hours, broken down per epoch.
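The formula is simple enough to sketch in a few lines of Python. This is a minimal illustration of the arithmetic, not CalcHub's actual implementation; the function name and the mixed-precision speedup parameter are ours:

```python
import math

def estimate_training_time(dataset_size, batch_size, epochs,
                           time_per_step_s, mixed_precision_speedup=1.0):
    """Estimate total wall-clock training time in seconds.

    time_per_step_s: measured or benchmarked seconds per gradient step
    mixed_precision_speedup: e.g. 1.5-2.0 for fp16/bf16 vs fp32
    """
    steps_per_epoch = math.ceil(dataset_size / batch_size)
    return steps_per_epoch * epochs * time_per_step_s / mixed_precision_speedup

# 60k samples, batch 128, 10 epochs at 50 ms/step: roughly 234 seconds
print(estimate_training_time(60_000, 128, 10, 0.05))
```

The `math.ceil` matters: a partial final batch still costs a full step.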
What You'll Need to Input
- Dataset size — total number of samples
- Batch size — samples processed per gradient step
- Number of epochs
- GPU model — the calculator has throughput benchmarks for A100, V100, T4, RTX 3090, RTX 4090, and others
- Model FLOPs — if you know them from the FLOPs Calculator, or you can use model presets
- Mixed precision — fp16/bf16 typically gives a 1.5–2× speedup over fp32
Training Time Estimates for Common Setups
| Task | Dataset | Model | Hardware | Est. Time |
|---|---|---|---|---|
| MNIST classification | 60,000 images | Small CNN (500K params) | RTX 3090 | ~2 min / epoch |
| ImageNet fine-tune | 1.2M images | ResNet-50 | A100 40GB | ~25 min / epoch |
| Text classification | 100K samples | BERT-base fine-tune | T4 | ~18 min / epoch |
| GPT-2 Small pre-train | 1B tokens | 117M params | 8× A100 | ~2 days |
| Custom transformer | 500K samples | 10M params, fp16 | RTX 4090 | ~8 min / epoch |
Real-World Scenario: Fine-Tuning a BERT Classifier
Say you have 80,000 labeled support tickets and you want to fine-tune BERT-base for 5 epochs with batch size 32 on a single T4 GPU.
- Steps per epoch: 80,000 / 32 = 2,500
- Time per step on T4 (BERT-base, fp16): roughly 90ms
- Time per epoch: 2,500 × 0.09s = 225 seconds (~3.75 min)
- Total 5 epochs: ~19 minutes
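The arithmetic above can be reproduced in a couple of lines (the 90 ms/step figure is the assumed T4 throughput from the scenario, not a measured value):

```python
# Reproducing the BERT fine-tune estimate: 80k tickets, batch 32,
# 5 epochs, assumed 90 ms per step on a T4 with fp16.
dataset_size, batch_size, epochs = 80_000, 32, 5
time_per_step_s = 0.09

steps_per_epoch = dataset_size // batch_size        # 2,500
epoch_seconds = steps_per_epoch * time_per_step_s   # 225 s, about 3.75 min
total_minutes = epoch_seconds * epochs / 60         # about 18.75 min

print(f"{steps_per_epoch} steps/epoch, {epoch_seconds:.0f}s/epoch, "
      f"{total_minutes:.1f} min total")
```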
Tips for Cutting Training Time
Use mixed precision. Switching from fp32 to bf16 on an A100 is nearly free and delivers up to 2× speedup with no quality loss on most tasks.
Profile your data loader. If DataLoader workers are the bottleneck, setting num_workers=4 or higher and pin_memory=True can halve effective step time; otherwise the GPU sits idle waiting for batches.
Gradient accumulation as a workaround. If you can't increase physical batch size due to VRAM limits, accumulate gradients over N steps to simulate a larger batch — but note this doesn't reduce wall-clock time; it only changes the effective batch size.
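A toy illustration in pure Python (not framework code) of why accumulation is equivalent to a bigger batch: averaging per-micro-batch gradients of a simple least-squares loss reproduces the full-batch gradient exactly.

```python
# Gradient of mean((w*x - y)^2) over a batch, for a 1-D linear model.
def grad(w, xs, ys):
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5
accum_steps = 2
micro = len(xs) // accum_steps  # micro-batch size of 2

# Accumulate: average the micro-batch gradients before stepping.
g_accum = sum(
    grad(w, xs[i:i + micro], ys[i:i + micro])
    for i in range(0, len(xs), micro)
) / accum_steps

g_full = grad(w, xs, ys)  # one big batch of 4
assert abs(g_accum - g_full) < 1e-9
```

In a real framework you'd call the backward pass once per micro-batch and only step the optimizer every N micro-batches; the total number of samples processed per second is unchanged.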
Early stopping. Train to convergence, not to a fixed epoch count. Checkpoint validation loss every N steps and stop when it plateaus.
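A minimal patience-based version of that check might look like the following; the validation losses are hard-coded here just to show the mechanics:

```python
def train_with_early_stopping(val_losses, patience=3):
    """Stop once validation loss hasn't improved for `patience` checks.
    Returns the number of checks actually run before stopping."""
    best, since_best = float("inf"), 0
    for step, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                return step  # plateau detected: stop here
    return len(val_losses)

# Loss improves through check 4, then plateaus: stops at check 7 of 10.
losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60, 0.61]
print(train_with_early_stopping(losses, patience=3))
```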
Warmup your estimates. Run 50 steps, note the per-step time, then multiply out. More reliable than theoretical benchmarks.
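A sketch of that workflow, with a dummy step function standing in for your real forward/backward pass:

```python
import time

def dummy_step():
    sum(i * i for i in range(10_000))  # placeholder for a real training step

def extrapolate(total_steps, warmup_steps=50, step_fn=dummy_step):
    """Time `warmup_steps` real steps, then project the full run."""
    for _ in range(5):  # discard a few steps so caches/kernels warm up
        step_fn()
    start = time.perf_counter()
    for _ in range(warmup_steps):
        step_fn()
    per_step = (time.perf_counter() - start) / warmup_steps
    return per_step * total_steps  # projected total seconds

print(f"projected: {extrapolate(total_steps=12_500):.1f}s")
```

Discarding the first few steps matters in practice: initial iterations often pay one-time costs (CUDA kernel compilation, allocator warmup) that would skew the average.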
How accurate are training time estimates?
Within 20–30% in most cases. The biggest sources of variance are data loading speed, CPU preprocessing overhead, and whether your batch size fits cleanly into GPU memory without spills.
Why does training slow down after the first few epochs sometimes?
Often it's learning rate warmup completing and the optimizer switching behavior, or gradient checkpointing kicking in. It can also be cache effects — if your dataset doesn't fit in RAM, disk reads slow later epochs.
Does multi-GPU training cut time proportionally?
Mostly, but not perfectly. With 4 GPUs you might get 3.2–3.6× speedup rather than 4× due to communication overhead (gradient all-reduce). DDP is more efficient than model parallelism for this reason.
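One way to fold that into your planning is to apply an assumed scaling efficiency to the single-GPU estimate. The 0.85 default below is our assumption, consistent with the 3.2–3.6× range above:

```python
def multi_gpu_epoch_time(single_gpu_seconds, n_gpus, efficiency=0.85):
    """Project multi-GPU epoch time from a measured single-GPU time.

    efficiency: fraction of ideal linear scaling actually achieved
    (roughly 0.8-0.9 is common for DDP data parallelism).
    """
    return single_gpu_seconds / (n_gpus * efficiency)

# A 1-hour epoch on 1 GPU at 85% efficiency on 4 GPUs: about 1059 s.
print(multi_gpu_epoch_time(3600, 4))
```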
Related Calculators
- GPU Memory Calculator — check if your model fits before training
- Batch Size Calculator — find the optimal batch for your hardware
- FLOPs Calculator — compute cost per forward/backward pass