
Optimizer

Optimization algorithms for neural networks.

This module implements various optimization algorithms commonly used in deep learning, including Stochastic Gradient Descent (SGD) and Adam. These optimizers are used to update model parameters during training to minimize the loss function.

The module provides a base Optimizer class that defines the common interface for all optimizers, as well as concrete implementations of specific optimization algorithms.

Classes:

  • Optimizer

    Base class that provides common optimizer functionality.

  • SGD

    Stochastic Gradient Descent optimizer with optional momentum.

  • Adam

    Adaptive Moment Estimation optimizer.

See Also

tensor.Tensor : The Tensor class used for parameter optimization.

nn : Neural network modules whose parameters are optimized.

Adam

Bases: Optimizer

Adaptive Moment Estimation optimizer.

Adam is an optimization algorithm that combines the benefits of two other extensions of stochastic gradient descent:

  • Adaptive Gradient Algorithm (AdaGrad)

  • Root Mean Square Propagation (RMSProp)

It computes individual adaptive learning rates for different parameters from estimates of first and second moments of the gradients.

The update rule for parameter p with gradient g at timestep t is:

m = β₁ * m + (1 - β₁) * g         # First moment estimate

v = β₂ * v + (1 - β₂) * g²        # Second moment estimate

m̂ = m / (1 - β₁ᵗ)                 # Bias-corrected first moment

v̂ = v / (1 - β₂ᵗ)                 # Bias-corrected second moment

p = p - lr * m̂ / (√v̂ + ε)         # Parameter update
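
To make the update concrete, here is a minimal NumPy sketch of a single Adam step for one parameter array. It mirrors the formulas above and is purely illustrative; the standalone function and array arguments are assumptions for the sketch, not this module's implementation.

```python
import numpy as np

def adam_step(p, g, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Illustrative Adam update for one parameter array.

    p: parameter values, g: gradient, m/v: running moment estimates,
    t: 1-based timestep. Returns the updated (p, m, v).
    """
    m = beta1 * m + (1 - beta1) * g           # first moment estimate
    v = beta2 * v + (1 - beta2) * g ** 2      # second moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
    p = p - lr * m_hat / (np.sqrt(v_hat) + eps)  # parameter update
    return p, m, v
```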
Notes

The optimizer implements the Adam algorithm as described in "Adam: A Method for Stochastic Optimization" by Kingma and Ba (2014).

See Also

SGD : Stochastic Gradient Descent optimizer

__init__(params, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-08, weight_decay=0.0)

Parameters:

  • params (list) –

    List of parameters to optimize.

  • lr (float, default: 0.01 ) –

    Learning rate, by default 0.01

  • beta1 (float, default: 0.9 ) –

    Exponential decay rate for first moment estimates, by default 0.9

  • beta2 (float, default: 0.999 ) –

    Exponential decay rate for second moment estimates, by default 0.999

  • eps (float, default: 1e-08 ) –

    Small constant for numerical stability, by default 1e-8

  • weight_decay (float, default: 0.0 ) –

    Weight decay (L2 penalty), by default 0.0
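
A brief usage sketch follows. The import path, model.parameters(), the data iterator, and the loss computation are placeholders assumed for illustration; only the Adam constructor arguments and the step()/reset_grad() methods come from this documentation.

```python
from optim import Adam  # import path assumed

# Assumes model.parameters() returns a list of Tensor objects with requires_grad=True.
optimizer = Adam(model.parameters(), lr=0.001, beta1=0.9, beta2=0.999,
                 eps=1e-8, weight_decay=0.0)

for batch in data_loader:               # hypothetical data iterator
    optimizer.reset_grad()              # clear gradients from the previous step
    loss = compute_loss(model, batch)   # hypothetical forward pass + loss
    loss.backward()                     # populate gradients on the parameters
    optimizer.step()                    # apply the Adam update
```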

Optimizer

Base class for all optimizers.

This class defines the basic interface and functionality that all optimizer implementations should follow. It provides common methods such as step() for parameter updates and reset_grad() for resetting gradients.

Notes

All optimizers should inherit from this base class and implement the step() method according to their specific optimization algorithm.

See Also

SGD : Stochastic Gradient Descent optimizer

Adam : Adaptive Moment Estimation optimizer

__init__(params)

Parameters:

  • params (list) –

    List of parameters to optimize. Each parameter should be an instance of Tensor with requires_grad=True.
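
As a sketch of the subclassing contract described in the Notes above, the example below outlines a custom optimizer that inherits from Optimizer and overrides step(). The attribute names (self.params, p.data, p.grad) are assumptions about the base class and Tensor interface, not documented here.

```python
class ScaledSGD(Optimizer):
    """Illustrative subclass: plain gradient descent with a fixed learning rate."""

    def __init__(self, params, lr=0.01):
        super().__init__(params)   # assumed: base class stores the parameter list
        self.lr = lr

    def step(self):
        # Assumed Tensor interface: .data holds values, .grad holds the gradient.
        for p in self.params:
            if p.grad is not None:
                p.data = p.data - self.lr * p.grad
```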

SGD

Bases: Optimizer

Stochastic Gradient Descent optimizer.

Implements stochastic gradient descent (optionally with momentum).

Notes

The update rule for parameter p with gradient g is:

With momentum:

u = momentum * u + (1 - momentum) * g

p = p * (1 - lr * weight_decay) - lr * u

Without momentum:

p = p * (1 - lr * weight_decay) - lr * g
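
A short NumPy sketch of the update above, for a single parameter array; it mirrors the formulas and is illustrative rather than the module's actual implementation.

```python
import numpy as np

def sgd_step(p, g, u, lr=0.01, momentum=0.0, weight_decay=0.0):
    """Illustrative SGD update. p: parameter, g: gradient, u: momentum buffer."""
    if momentum > 0.0:
        u = momentum * u + (1 - momentum) * g        # moving average of gradients
        p = p * (1 - lr * weight_decay) - lr * u     # decayed parameters, momentum step
    else:
        p = p * (1 - lr * weight_decay) - lr * g     # decayed parameters, plain gradient step
    return p, u
```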

See Also

Adam : Adaptive Moment Estimation optimizer

__init__(params, lr=0.01, momentum=0.0, weight_decay=0.0)

Parameters:

  • params (list) –

    List of parameters to optimize. Each parameter should be an instance of Tensor with requires_grad=True.

  • lr (float, default: 0.01 ) –

    Learning rate. Default: 0.01

  • momentum (float, default: 0.0 ) –

    Momentum factor. Default: 0.0

  • weight_decay (float, default: 0.0 ) –

    Weight decay (L2 penalty). Default: 0.0

Notes

When momentum is 0, this is equivalent to standard stochastic gradient descent. When momentum > 0, this implements momentum-based gradient descent, which helps accelerate updates along consistent gradient directions and dampens oscillations.
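
A brief usage sketch, with the same caveats as the Adam example above (the import path, model.parameters(), the data iterator, and the loss computation are placeholders assumed for illustration):

```python
from optim import SGD  # import path assumed

# momentum and weight_decay values are arbitrary example settings
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)

for batch in data_loader:               # hypothetical data iterator
    optimizer.reset_grad()              # zero out gradients
    loss = compute_loss(model, batch)   # hypothetical forward pass + loss
    loss.backward()
    optimizer.step()                    # momentum-based SGD update
```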