What is an epoch in machine learning?
- Arnab Mondal · 3 min read
Table of contents
- Overview
- Why do we need multiple epochs?
- Epoch vs. batch vs. step
- How many epochs should I use?
- Practical tips
- Conclusion
Overview
An epoch in machine learning is one complete pass over the entire training dataset. In each epoch, the model sees every training example at least once, computes the loss, and updates its internal parameters to improve its predictions.
Before we get into the details, try the interactive below to see how epochs, batches, and steps relate.
Animation: An interactive grid of samples with sliders for epochs, batch size, and learning rate; highlights the current batch each step and shows overall progress.
Why do we need multiple epochs?
Training is iterative. One pass over the data rarely gives a model enough signal to generalize well. By repeating the process for multiple epochs, the model gradually reduces its loss and improves accuracy. However, training for too many epochs can cause overfitting—when the model memorizes the training set and fails to generalize.
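To see what "multiple epochs" looks like in code, here is a minimal PyTorch sketch of a training loop. The toy dataset, model architecture, and hyperparameters are made up for illustration, not prescribed by this article:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 1,000 samples, 20 features, binary labels.
X = torch.randn(1000, 20)
y = torch.randint(0, 2, (1000,)).float()
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(10):           # each outer iteration is one epoch
    epoch_loss = 0.0
    for xb, yb in loader:         # each inner iteration is one step (one batch)
        logits = model(xb).squeeze(1)
        loss = loss_fn(logits, yb)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()          # one parameter update per batch
        epoch_loss += loss.item() * xb.size(0)
    print(f"epoch {epoch + 1}: mean loss {epoch_loss / len(X):.4f}")
```

Running a loop like this, you would typically see the mean loss fall across the early epochs and then flatten out; that flattening is where the overfitting risk begins.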
Epoch vs. batch vs. step
- Epoch: One full pass over the entire dataset.
- Batch: A small subset of the dataset processed together in one update.
- Step (iteration): One parameter update computed from a single batch. If you have N samples and a batch size of B, then steps per epoch = ceil(N/B), or floor(N/B) if the final partial batch is dropped (made concrete in the sketch below).
Analogy: Imagine studying for an exam by reviewing your notes. Each time you go through all your notes once, that's an epoch. You might study in chunks—say 10 pages at a time—that's your batch size. Each chunk you review is a step. Multiple full reviews (epochs) typically lead to better retention, but reviewing endlessly without new problems can lead to memorization, not understanding—overfitting.
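To make the arithmetic concrete, here is a tiny sketch. The sample count, batch size, and epoch count are hypothetical:

```python
import math

n_samples = 1000   # N: size of the training set
batch_size = 32    # B: samples per parameter update
n_epochs = 10

steps_per_epoch = math.ceil(n_samples / batch_size)  # ceil(1000 / 32) = 32 steps
total_steps = steps_per_epoch * n_epochs             # 320 parameter updates overall
print(f"{steps_per_epoch} steps/epoch, {total_steps} steps total")
```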
How many epochs should I use?
There isn't a one-size-fits-all number. Common strategies include:
- Early stopping: Monitor validation loss and stop when it stops improving (sketched after this section).
- Learning rate schedules: Reduce the learning rate as training progresses.
- Cross-validation: Evaluate across folds to pick a stable epoch count.
For small datasets, you might need more epochs; for large datasets or powerful models, fewer may suffice. Always validate on unseen data.
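As a sketch of early stopping with a patience counter: here, train_one_epoch and evaluate are hypothetical placeholders for your own training and validation code, and the patience value of 5 is just an example, not a recommendation:

```python
best_val_loss = float("inf")
patience = 5      # epochs to wait for an improvement before giving up
bad_epochs = 0

for epoch in range(100):  # generous upper bound; early stopping usually ends sooner
    train_one_epoch(model, train_loader)    # hypothetical helper: one epoch of training
    val_loss = evaluate(model, val_loader)  # hypothetical helper: mean validation loss
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        bad_epochs = 0
        # checkpoint the best model here
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"Stopping early at epoch {epoch + 1}")
            break
```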
Practical tips
- Start with a reasonable default (e.g., 10–50 epochs) and use early stopping.
- Keep an eye on the gap between training and validation metrics; a widening gap suggests overfitting (see the sketch after this list).
- Combine with batch size and learning rate tuning; these three hyperparameters are tightly coupled.
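One way to watch that train/validation gap is to log both metrics every epoch. This sketch reuses the same hypothetical train_one_epoch and evaluate helpers, assumed here to return mean losses:

```python
for epoch in range(n_epochs):
    train_loss = train_one_epoch(model, train_loader)  # hypothetical helper
    val_loss = evaluate(model, val_loader)             # hypothetical helper
    gap = val_loss - train_loss
    print(f"epoch {epoch + 1}: train {train_loss:.4f}, val {val_loss:.4f}, gap {gap:.4f}")
    # A gap that keeps widening while train loss falls is the classic overfitting signal.
```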
Conclusion
An epoch is a simple but foundational concept: a single full pass through your training data. Understanding how it interacts with batch size, steps, and learning rate helps you design more efficient—and better generalizing—training runs.
Available for hire - If you're looking for a skilled full-stack developer with AI integration experience, feel free to reach out at hire@codewarnab.in