NN

Neural network module for tiny-pytorch implementation.

This module provides the classes needed to build neural networks: layers, activation functions, normalization modules, loss functions, containers, and components for convolutional and recurrent architectures.

The module is designed to work seamlessly with the automatic differentiation system, allowing for easy construction and training of complex neural networks. All modules inherit from the base Module class, providing consistent interfaces for parameter management, training mode control, and forward pass computation.

Key Features
  • Automatic parameter management and gradient tracking
  • Training/evaluation mode switching
  • Modular design for easy network construction
  • Support for various neural network architectures
  • Built-in activation functions and loss functions
  • Normalization layers (BatchNorm, LayerNorm)
  • Recurrent neural network components (RNN, LSTM)
  • Convolutional neural network layers and composite blocks
  • Embedding layers for sequence processing

Classes:

  • Module

    Base class for all neural network modules. Provides common functionality for parameter management, training mode control, and forward pass computation.

  • Parameter

    A special kind of tensor that represents learnable parameters. Acts as a marker so modules can identify trainable parameters. All Parameter tensors have require_grad set to True.

  • ReLU

    Rectified Linear Unit activation function.

  • Tanh

    Hyperbolic tangent activation function.

  • Sigmoid

    Sigmoid activation function.

  • Linear

    Linear transformation layer (fully connected layer).

  • Flatten

    Flattens the input tensor into a 2D tensor.

  • BatchNorm1d

    1D batch normalization layer for fully connected networks.

  • BatchNorm2d

    2D batch normalization layer for convolutional networks.

  • LayerNorm1d

    1D layer normalization layer.

  • Dropout

    Dropout layer for regularization during training.

  • Sequential

    Sequential container that applies modules in order.

  • Residual

    Residual connection that adds input to the output of a module.

  • SoftmaxLoss

    Softmax cross-entropy loss function.

  • Conv

    2D convolutional layer with support for padding and stride.

  • ConvBN

    Composite module combining convolution, batch normalization, and ReLU activation. Common building block in modern CNN architectures like ResNet.

  • RNNCell

    Single RNN cell with tanh or ReLU nonlinearity.

  • RNN

    Multi-layer RNN with tanh or ReLU nonlinearity.

  • LSTMCell

    Single LSTM cell with forget, input, and output gates.

  • LSTM

    Multi-layer LSTM network.

  • Embedding

    Embedding layer for converting indices to dense vectors.

Notes

All modules support automatic differentiation through the tensor system. Parameters are automatically tracked and gradients are computed during backward passes. The training mode affects the behavior of certain modules like Dropout and BatchNorm, which behave differently during training and evaluation.

The module system is designed to be composable, allowing complex networks to be built from simple building blocks. The Sequential and Residual containers provide convenient ways to combine multiple modules.

Examples:

>>> import tiny_pytorch as tp
>>>
>>> # Create a simple feedforward network
>>> model = tp.nn.Sequential(
...     tp.nn.Linear(784, 128),
...     tp.nn.ReLU(),
...     tp.nn.Dropout(0.5),
...     tp.nn.Linear(128, 10)
... )
>>>
>>> # Create a convolutional network
>>> conv_model = tp.nn.Sequential(
...     tp.nn.Conv(3, 64, kernel_size=3),
...     tp.nn.BatchNorm2d(64),
...     tp.nn.ReLU(),
...     tp.nn.Flatten(),
...     tp.nn.Linear(64 * 28 * 28, 10)
... )
>>>
>>> # Create an RNN for sequence processing
>>> rnn = tp.nn.RNN(input_size=100, hidden_size=64, num_layers=2)
>>>
>>> # Use the model
>>> x = tp.Tensor.randn(32, 784)  # batch_size=32, features=784
>>> output = model(x)  # Forward pass

BatchNorm1d

Bases: Module

Applies batch normalization to the input tensor.

Parameters:

  • dim (int) –

    Number of features (size of the feature dimension) in the input tensor.

  • eps (float, default: 1e-05 ) –

    Value added to the denominator for numerical stability. Default is 1e-5.

  • momentum (float, default: 0.1 ) –

    Momentum for the moving average. Default is 0.1.

  • device (Device, default: None ) –

    Device on which to place the tensor. Default is CPU.

  • dtype (str, default: 'float32' ) –

    Data type of the tensor. Default is "float32".

Attributes:

  • dim (int) –

    Number of features (size of the feature dimension) in the input tensor.

  • eps (float) –

    Value added to the denominator for numerical stability.

  • momentum (float) –

    Momentum for the moving average.

  • weight (Parameter) –

    Learnable weight parameter.

  • bias (Parameter) –

    Learnable bias parameter.

  • running_mean (Tensor) –

    Running mean of the input tensor.

  • running_var (Tensor) –

    Running variance of the input tensor.

Methods:

  • forward

    Applies batch normalization to the input tensor x.
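
The difference between training and evaluation mode can be sketched in plain Python. This is an illustration only, not the library's implementation: the function name batch_norm_1d is hypothetical, the batch variance is assumed to be the biased (population) variance, and the running statistics are assumed to be updated as (1 - momentum) * old + momentum * batch_statistic.

```python
import math

def batch_norm_1d(x, weight, bias, running_mean, running_var,
                  training=True, eps=1e-5, momentum=0.1):
    """Normalize a batch given as a list of rows (batch_size x dim)."""
    n, dim = len(x), len(x[0])
    if training:
        # Per-feature statistics over the batch dimension.
        mean = [sum(row[j] for row in x) / n for j in range(dim)]
        var = [sum((row[j] - mean[j]) ** 2 for row in x) / n for j in range(dim)]
        # Assumed update rule for the running estimates (modified in place).
        for j in range(dim):
            running_mean[j] = (1 - momentum) * running_mean[j] + momentum * mean[j]
            running_var[j] = (1 - momentum) * running_var[j] + momentum * var[j]
    else:
        # Evaluation mode reuses the running estimates instead of batch statistics.
        mean, var = running_mean, running_var
    return [[weight[j] * (row[j] - mean[j]) / math.sqrt(var[j] + eps) + bias[j]
             for j in range(dim)] for row in x]

x = [[1.0, 10.0], [3.0, 30.0]]
running_mean, running_var = [0.0, 0.0], [1.0, 1.0]
ones, zeros = [1.0, 1.0], [0.0, 0.0]
y_train = batch_norm_1d(x, ones, zeros, running_mean, running_var, training=True)
y_eval = batch_norm_1d(x, ones, zeros, running_mean, running_var, training=False)
```

In this sketch the training call normalizes with the batch statistics and nudges the running estimates toward them, while the evaluation call leaves the running estimates untouched.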

BatchNorm2d

Bases: BatchNorm1d

Applies batch normalization to 2D input tensors.

This module applies batch normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift".

The input is expected to be in NCHW format (batch, channels, height, width). For each channel, this layer computes the mean and variance over the batch and spatial dimensions, then normalizes the input and applies learnable scale and shift parameters.

Parameters:

  • num_features (int) –

    Number of features/channels in the input tensor.

  • eps (float, default: 1e-05 ) –

    Value added to the denominator for numerical stability. Default is 1e-5.

  • momentum (float, default: 0.1 ) –

    Momentum for the moving average. Default is 0.1.

  • device (Device, default: None ) –

    Device on which to place the parameters. Default is None (uses default device).

  • dtype (str, default: 'float32' ) –

    Data type of the parameters. Default is "float32".

Attributes:

  • num_features (int) –

    Number of features/channels in the input tensor.

  • eps (float) –

    Value added to the denominator for numerical stability.

  • momentum (float) –

    Momentum for the moving average.

  • weight (Parameter) –

    Learnable weight parameter of shape (num_features,).

  • bias (Parameter) –

    Learnable bias parameter of shape (num_features,).

  • running_mean (Tensor) –

    Running mean of the input tensor of shape (num_features,).

  • running_var (Tensor) –

    Running variance of the input tensor of shape (num_features,).

Notes
  • Input is expected to be in NCHW format (batch, channels, height, width).
  • During training, this layer keeps a running estimate of its computed mean and variance, which is then used for normalization during evaluation.
  • The running estimates are kept with a default momentum of 0.1.
  • Internally converts to channel-last format for efficient computation, similar to PyTorch's implementation.

Examples:

>>> bn = BatchNorm2d(64)
>>> x = Tensor.randn(32, 64, 28, 28)  # batch_size=32, channels=64, height=28, width=28
>>> output = bn(x)  # shape: (32, 64, 28, 28)

__init__(num_features, eps=1e-05, momentum=0.1, device=None, dtype='float32')

Initialize the BatchNorm2d module.

Parameters:

  • num_features (int) –

    Number of features/channels in the input tensor.

  • eps (float, default: 1e-05 ) –

    Value added to the denominator for numerical stability. Default is 1e-5.

  • momentum (float, default: 0.1 ) –

    Momentum for the moving average. Default is 0.1.

  • device (Device, default: None ) –

    Device on which to place the parameters. Default is None (uses default device).

  • dtype (str, default: 'float32' ) –

    Data type of the parameters. Default is "float32".

forward(x)

Forward pass of the 2D batch normalization.

Parameters:

  • x (Tensor) –

    Input tensor of shape (batch_size, num_features, height, width) in NCHW format.

Returns:

  • Tensor

    Normalized tensor of shape (batch_size, num_features, height, width) in NCHW format.
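
To make "per channel, over the batch and spatial dimensions" concrete, the following sketch computes the statistics for a small NCHW tensor stored as nested lists. The helper channel_stats is hypothetical and the variance is assumed to be the biased (population) variance; it is not how the library computes these values.

```python
def channel_stats(x):
    """Per-channel mean and biased variance of a nested-list NCHW tensor."""
    n_channels = len(x[0])
    means, variances = [], []
    for c in range(n_channels):
        # Gather every value of channel c across batch, height, and width.
        values = [v for sample in x for row in sample[c] for v in row]
        mean = sum(values) / len(values)
        means.append(mean)
        variances.append(sum((v - mean) ** 2 for v in values) / len(values))
    return means, variances

# Shape (N=2, C=2, H=1, W=2).
x = [
    [[[1.0, 3.0]], [[10.0, 10.0]]],
    [[[5.0, 7.0]], [[20.0, 20.0]]],
]
means, variances = channel_stats(x)
```

Each channel yields a single mean and variance, which is why weight, bias, running_mean, and running_var all have shape (num_features,).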

Conv

Bases: Module

Multi-channel 2D convolutional layer.

This module applies a 2D convolution over an input signal composed of several input planes. The input is expected to be in NCHW format (batch, channels, height, width) and the output will also be in NCHW format.

Parameters:

  • in_channels (int) –

    Number of channels in the input image.

  • out_channels (int) –

    Number of channels produced by the convolution.

  • kernel_size (int or tuple of int) –

    Size of the convolving kernel. If a single int is provided, it is used for both height and width dimensions. Only square kernels are supported.

  • stride (int or tuple of int, default: 1 ) –

    Stride of the convolution. If a single int is provided, it is used for both height and width dimensions. Default is 1.

  • bias (bool, default: True ) –

    If True, adds a learnable bias to the output. Default is True.

  • device (Device, default: None ) –

    Device on which to place the weights. Default is None (uses default device).

  • dtype (str, default: 'float32' ) –

    Data type of the weights. Default is "float32".

Attributes:

  • in_channels (int) –

    Number of channels in the input image.

  • out_channels (int) –

    Number of channels produced by the convolution.

  • kernel_size (int) –

    Size of the convolving kernel (square kernel).

  • stride (int) –

    Stride of the convolution.

  • padding (int) –

    Padding added to both sides of the input. Automatically calculated as (kernel_size - 1) // 2, which preserves the spatial size when stride is 1 and kernel_size is odd.

  • weight (Parameter) –

    The learnable weights of the module of shape (kernel_size, kernel_size, in_channels, out_channels).

  • bias (Parameter or None) –

    The learnable bias of the module of shape (out_channels,). None if bias is False.

Notes
  • Only supports padding='same' (automatic padding to maintain output size).
  • No grouped convolution or dilation support.
  • Only supports square kernels.
  • Input and output are in NCHW format.

Examples:

>>> conv = Conv(3, 64, kernel_size=3, stride=1)
>>> x = Tensor.randn(1, 3, 32, 32)  # batch_size=1, channels=3, height=32, width=32
>>> output = conv(x)  # shape: (1, 64, 32, 32)

forward(x)

Forward pass of the 2D convolution.

Parameters:

  • x (Tensor) –

    Input tensor of shape (batch_size, in_channels, height, width) in NCHW format.

Returns:

  • Tensor

    Output tensor of shape (batch_size, out_channels, height, width) in NCHW format.
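
The effect of the automatic padding on the spatial size can be checked with the standard convolution size formula. The helper conv_output_size is hypothetical, and the sketch assumes the usual floor-division arithmetic; the library's behavior for strides other than 1 is not confirmed by this page.

```python
def conv_output_size(size, kernel_size, stride=1):
    """Spatial output size when padding = (kernel_size - 1) // 2."""
    padding = (kernel_size - 1) // 2
    # Standard formula: floor((size + 2 * padding - kernel_size) / stride) + 1.
    return (size + 2 * padding - kernel_size) // stride + 1

same = conv_output_size(32, kernel_size=3, stride=1)
strided = conv_output_size(32, kernel_size=3, stride=2)
even_kernel = conv_output_size(32, kernel_size=4, stride=1)
```

Under these assumptions an odd kernel with stride 1 keeps a 32-pixel side at 32, stride 2 halves it to 16, and an even kernel loses one pixel.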

ConvBN

Bases: Module

A composite module that combines convolution, batch normalization, and ReLU activation.

This module is a common building block in convolutional neural networks, particularly in architectures like ResNet. It applies a 2D convolution followed by batch normalization and ReLU activation in sequence. This combination helps with training stability and convergence speed.

The module consists of three components applied in order:

  1. Conv: 2D convolutional layer
  2. BatchNorm2d: 2D batch normalization layer
  3. ReLU: Rectified Linear Unit activation function

Parameters:

  • in_channels (int) –

    Number of channels in the input image.

  • out_channels (int) –

    Number of channels produced by the convolution.

  • kernel_size (int or tuple[int, int]) –

    Size of the convolving kernel. If a single int is provided, it is used for both height and width dimensions. Only square kernels are supported.

  • stride (int or tuple[int, int], default: 1 ) –

    Stride of the convolution. If a single int is provided, it is used for both height and width dimensions. Default is 1.

  • device (Device, default: None ) –

    Device on which to place the parameters. Default is None (uses default device).

Attributes:

  • conv (Conv) –

    The 2D convolutional layer.

  • bn (BatchNorm2d) –

    The 2D batch normalization layer.

  • relu (ReLU) –

    The ReLU activation function.

Notes
  • Input is expected to be in NCHW format (batch, channels, height, width).
  • Output maintains the same format as input.
  • The convolution uses padding='same' to maintain spatial dimensions.
  • Batch normalization is applied per-channel across the batch and spatial dimensions.
  • ReLU activation is applied element-wise after batch normalization.

Examples:

>>> convbn = ConvBN(3, 64, kernel_size=3, stride=1)
>>> x = Tensor.randn(32, 3, 28, 28)  # batch_size=32, channels=3, height=28, width=28
>>> output = convbn(x)  # shape: (32, 64, 28, 28)

__init__(in_channels, out_channels, kernel_size, stride=1, device=None)

Initialize the ConvBN module.

Parameters:

  • in_channels (int) –

    Number of channels in the input image.

  • out_channels (int) –

    Number of channels produced by the convolution.

  • kernel_size (int or tuple[int, int]) –

    Size of the convolving kernel. If a single int is provided, it is used for both height and width dimensions.

  • stride (int or tuple[int, int], default: 1 ) –

    Stride of the convolution. If a single int is provided, it is used for both height and width dimensions. Default is 1.

  • device (Device, default: None ) –

    Device on which to place the parameters. Default is None (uses default device).

forward(x)

Forward pass of the ConvBN module.

Applies convolution, batch normalization, and ReLU activation in sequence.

Parameters:

  • x (Tensor) –

    Input tensor of shape (batch_size, in_channels, height, width) in NCHW format.

Returns:

  • Tensor

    Output tensor of shape (batch_size, out_channels, height, width) in NCHW format. The output has been processed through convolution, batch normalization, and ReLU.

Dropout

Bases: Module

Applies dropout to the input tensor.

Parameters:

  • p (float, default: 0.5 ) –

    Probability of an element to be dropped. Default is 0.5.

Attributes:

  • p (float) –

    Probability of an element to be dropped.

Methods:

  • forward

    Applies dropout to the input tensor x.
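
A minimal sketch of dropout on a flat list of floats follows. It assumes the common "inverted dropout" convention, in which surviving elements are scaled by 1 / (1 - p) during training so that evaluation needs no rescaling; this page does not state which convention the library uses, and the function name and rng argument are hypothetical.

```python
import random

def dropout(x, p=0.5, training=True, rng=random):
    """Inverted dropout on a flat list of floats."""
    if not training or p == 0.0:
        # Evaluation mode (or p = 0) returns the input unchanged.
        return list(x)
    keep = 1.0 - p
    # Keep each element with probability (1 - p) and rescale the survivors.
    return [v / keep if rng.random() < keep else 0.0 for v in x]

rng = random.Random(0)
x = [1.0] * 8
train_out = dropout(x, p=0.5, training=True, rng=rng)
eval_out = dropout(x, p=0.5, training=False)
```

With p = 0.5 every training output is either 0.0 or 2.0, and the evaluation output equals the input.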

Embedding

Bases: Module

A lookup table that stores embeddings of a fixed dictionary and size.

This module is often used to store word embeddings and retrieve them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings.

Parameters:

  • vocab_sz (int) –

    Size of the dictionary of embeddings (number of unique tokens).

  • embedding_dim (int) –

    The size of each embedding vector.

  • device (Device, default: None ) –

    Device on which to place the embedding weights. Default is None (uses default device).

  • dtype (str, default: 'float32' ) –

    Data type of the embedding weights. Default is "float32".

Attributes:

  • vocab_sz (int) –

    Size of the dictionary of embeddings.

  • embedding_dim (int) –

    The size of each embedding vector.

  • weight (Parameter) –

    The learnable embedding weights of shape (vocab_sz, embedding_dim). Initialized from N(0, 1) distribution.

Methods:

  • forward

    Maps word indices to embedding vectors.

Examples:

>>> embedding = Embedding(1000, 128)
>>> input_indices = Tensor([[1, 2, 3], [4, 5, 6]])  # shape: (seq_len, batch_size)
>>> output = embedding(input_indices)  # shape: (seq_len, batch_size, 128)

forward(x)

Maps word indices to embedding vectors.

This method converts input indices to one-hot vectors and then projects them to embedding vectors using the learned embedding weights.

Parameters:

  • x (Tensor) –

    Input tensor containing indices of shape (seq_len, batch_size). Each element should be an integer index in the range [0, vocab_sz).

Returns:

  • Tensor

    Output tensor of shape (seq_len, batch_size, embedding_dim) containing the corresponding embedding vectors for each input index.

Notes

The input indices are converted to one-hot vectors internally, then multiplied with the embedding weight matrix to produce the final embeddings.
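
The one-hot formulation described above can be sketched with nested lists. The helpers one_hot and embed are hypothetical names used for illustration; the point is only that multiplying a one-hot row by the weight matrix selects the matching row of that matrix.

```python
def one_hot(index, vocab_sz):
    """Return a list of length vocab_sz with 1.0 at position index."""
    return [1.0 if i == index else 0.0 for i in range(vocab_sz)]

def embed(indices, weight):
    """Look up embeddings for a (seq_len, batch_size) nested list of indices."""
    vocab_sz, dim = len(weight), len(weight[0])
    out = []
    for step in indices:
        rows = []
        for idx in step:
            hot = one_hot(idx, vocab_sz)
            # (1 x vocab_sz) one-hot times (vocab_sz x dim) weight selects row idx.
            rows.append([sum(hot[v] * weight[v][d] for v in range(vocab_sz))
                         for d in range(dim)])
        out.append(rows)
    return out

weight = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]  # vocab_sz=3, embedding_dim=2
indices = [[0, 2], [1, 1]]                      # seq_len=2, batch_size=2
embedded = embed(indices, weight)               # nested shape (2, 2, 2)
```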

Flatten

Bases: Module

Flattens the input tensor into a 2D tensor.

Parameters:

  • X (Tensor) –

    Input tensor to be flattened.

Returns:

  • Tensor

    Flattened tensor.
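
As a rough illustration, flattening to 2D can be sketched on nested lists. The sketch assumes that the first (batch) dimension is kept and all remaining dimensions are merged, which this page does not state explicitly; the function flatten below is a hypothetical stand-in.

```python
def flatten(x):
    """Flatten each sample of a nested-list tensor, keeping the batch dimension."""
    def leaves(node):
        # Walk the nested lists depth-first and yield the scalar entries in order.
        if isinstance(node, list):
            for child in node:
                yield from leaves(child)
        else:
            yield node
    return [list(leaves(sample)) for sample in x]

x = [[[[1, 2], [3, 4]]], [[[5, 6], [7, 8]]]]  # shape (2, 1, 2, 2)
flat = flatten(x)                              # shape (2, 4)
```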

LSTM

Bases: Module

Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.

Parameters:

  • input_size (int) –

    The number of expected features in the input x.

  • hidden_size (int) –

    The number of features in the hidden state h.

  • num_layers (int, default: 1 ) –

    Number of recurrent layers. Default is 1.

  • bias (bool, default: True ) –

    If False, then the layer does not use bias weights. Default is True.

  • device (Device, default: None ) –

    Device on which to place the weights. Default is None (uses default device).

  • dtype (str, default: 'float32' ) –

    Data type of the weights. Default is "float32".

Attributes:

  • lstm_cells (list of LSTMCell) –

    List of LSTMCell modules for each layer.

  • hidden_size (int) –

    The number of features in the hidden state h.

  • num_layers (int) –

    Number of recurrent layers.

  • device (Device or None) –

    Device on which the parameters are allocated.

  • dtype (str) –

    Data type of the parameters.

Methods:

  • forward

    Compute the output and final hidden and cell states for a batch of input sequences.

forward(X, h=None)

Compute the output and final hidden and cell states for a batch of input sequences.

Parameters:

  • X (Tensor) –

    Input tensor of shape (seq_len, batch_size, input_size) containing the features of the input sequence.

  • h (tuple of (Tensor, Tensor) or None, default: None ) –

    Tuple of (h0, c0), where each is a tensor of shape (num_layers, batch_size, hidden_size). If None, both default to zeros.

Returns:

  • output ( Tensor ) –

    Output tensor of shape (seq_len, batch_size, hidden_size) containing the output features (h_t) from the last layer of the LSTM, for each t.

  • (h_n, c_n) ( tuple of Tensor ) –

    Tuple of (h_n, c_n), each of shape (num_layers, batch_size, hidden_size) containing the final hidden and cell states for each element in the batch.

LSTMCell

Bases: Module

A long short-term memory (LSTM) cell.

Parameters:

  • input_size (int) –

    The number of expected features in the input X.

  • hidden_size (int) –

    The number of features in the hidden state h.

  • bias (bool, default: True ) –

    If False, then the layer does not use bias weights. Default is True.

  • device (Device, default: None ) –

    Device on which to place the weights. Default is None (uses default device).

  • dtype (str, default: 'float32' ) –

    Data type of the weights. Default is "float32".

Attributes:

  • W_ih (Parameter) –

    The learnable input-hidden weights, of shape (input_size, 4 * hidden_size).

  • W_hh (Parameter) –

    The learnable hidden-hidden weights, of shape (hidden_size, 4 * hidden_size).

  • bias_ih (Parameter or None) –

    The learnable input-hidden bias, of shape (4 * hidden_size,). None if bias is False.

  • bias_hh (Parameter or None) –

    The learnable hidden-hidden bias, of shape (4 * hidden_size,). None if bias is False.

  • hidden_size (int) –

    The number of features in the hidden state h.

  • device (Device or None) –

    Device on which the parameters are allocated.

  • dtype (str) –

    Data type of the parameters.

Methods:

  • forward

    Compute the next hidden and cell state given input X and previous states.

__init__(input_size, hidden_size, bias=True, device=None, dtype='float32')

A long short-term memory (LSTM) cell.

Parameters:

  • input_size (int) –

    The number of expected features in the input X.

  • hidden_size (int) –

    The number of features in the hidden state h.

  • bias (bool, default: True ) –

    If False, then the layer does not use bias weights. Default is True.

  • device (Device, default: None ) –

    Device on which to place the weights. Default is None (uses default device).

  • dtype (str, default: 'float32' ) –

    Data type of the weights. Default is "float32".

Attributes:

  • W_ih (Parameter) –

    The learnable input-hidden weights, of shape (input_size, 4 * hidden_size).

  • W_hh (Parameter) –

    The learnable hidden-hidden weights, of shape (hidden_size, 4 * hidden_size).

  • bias_ih (Parameter or None) –

    The learnable input-hidden bias, of shape (4 * hidden_size,). None if bias is False.

  • bias_hh (Parameter or None) –

    The learnable hidden-hidden bias, of shape (4 * hidden_size,). None if bias is False.

  • hidden_size (int) –

    The number of features in the hidden state h.

  • device (Device or None) –

    Device on which the parameters are allocated.

  • dtype (str) –

    Data type of the parameters.

Methods:

  • forward

    Compute the next hidden and cell state given input X and previous states.

forward(X, h=None)

Compute the next hidden and cell state for a batch of inputs.

Parameters:

  • X (Tensor) –

    Input tensor of shape (batch_size, input_size).

  • h (tuple of (Tensor, Tensor) or None, default: None ) –

    Tuple of (h0, c0), where each is a tensor of shape (batch_size, hidden_size). If None, both default to zeros.

Returns:

  • h_out ( Tensor ) –

    Next hidden state tensor of shape (batch_size, hidden_size).

  • c_out ( Tensor ) –

    Next cell state tensor of shape (batch_size, hidden_size).
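
The gate computation behind a single LSTM step can be sketched for one sample in plain Python. This is an illustration under stated assumptions: the function lstm_cell is hypothetical, and the four blocks of the 4 * hidden_size axis are assumed to be ordered input, forget, candidate, output, which this page does not specify.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def lstm_cell(x, h, c, W_ih, W_hh, bias_ih, bias_hh):
    """One LSTM step for a single sample (x, h, c are flat lists)."""
    hidden = len(h)
    # Pre-activations for all four gates: x @ W_ih + bias_ih + h @ W_hh + bias_hh.
    z = [sum(x[i] * W_ih[i][j] for i in range(len(x))) + bias_ih[j]
         + sum(h[i] * W_hh[i][j] for i in range(hidden)) + bias_hh[j]
         for j in range(4 * hidden)]
    # Assumed gate order: input, forget, candidate, output.
    i_gate = [sigmoid(v) for v in z[0:hidden]]
    f_gate = [sigmoid(v) for v in z[hidden:2 * hidden]]
    g_gate = [math.tanh(v) for v in z[2 * hidden:3 * hidden]]
    o_gate = [sigmoid(v) for v in z[3 * hidden:4 * hidden]]
    c_out = [f_gate[k] * c[k] + i_gate[k] * g_gate[k] for k in range(hidden)]
    h_out = [o_gate[k] * math.tanh(c_out[k]) for k in range(hidden)]
    return h_out, c_out

# input_size=2, hidden_size=1, all-zero weights so every gate pre-activation is 0.
W_ih = [[0.0] * 4, [0.0] * 4]
W_hh = [[0.0] * 4]
zero_bias = [0.0] * 4
h_out, c_out = lstm_cell([1.0, -1.0], [0.0], [2.0], W_ih, W_hh, zero_bias, zero_bias)
```

With zero weights every sigmoid gate equals 0.5 and the candidate is 0, so the new cell state is half the previous one and the hidden state is 0.5 * tanh(c_out).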

LayerNorm1d

Bases: Module

Applies layer normalization to the input tensor.

Parameters:

  • x (Tensor) –

    Input tensor to apply layer normalization.

  • dim (int) –

    Dimension to normalize.

  • eps (float, default: 1e-05 ) –

    Epsilon for numerical stability. Default is 1e-5.

  • device (Device, default: None ) –

    Device on which to place the tensor. Default is CPU.

  • dtype (str, default: 'float32' ) –

    Data type of the tensor. Default is "float32".

Returns:

  • Tensor

    Normalized tensor.
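
In contrast to batch normalization, layer normalization computes its statistics per sample, over the feature dimension. A minimal sketch, assuming a (batch_size, dim) input, a biased variance, and learnable weight and bias vectors of length dim (the function name layer_norm_1d is hypothetical):

```python
import math

def layer_norm_1d(x, weight, bias, eps=1e-5):
    """Normalize each row of a (batch_size x dim) nested list over its features."""
    out = []
    for row in x:
        # Statistics are computed independently for every sample.
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([w * (v - mean) / math.sqrt(var + eps) + b
                    for v, w, b in zip(row, weight, bias)])
    return out

normalized = layer_norm_1d([[1.0, 2.0, 3.0], [10.0, 10.0, 10.0]],
                           weight=[1.0, 1.0, 1.0], bias=[0.0, 0.0, 0.0])
```

The constant second row shows why eps is needed: its variance is zero, and eps keeps the division well defined.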

Linear

Bases: Module

Applies a linear transformation to the input data.

Attributes:

  • weight (Tensor) –

    The learnable weights of the module of shape (in_features, out_features).

  • bias (Tensor, optional) –

    The learnable bias of the module of shape (1, out_features).

__init__(in_features, out_features, bias=True, device=None, dtype='float32')

Parameters:

  • in_features (int) –

    Size of each input sample.

  • out_features (int) –

    Size of each output sample.

  • bias (bool, default: True ) –

    If set to False, the layer will not learn an additive bias. Default is True.

  • device (Device, default: None ) –

    Device on which to place the tensor. Default is CPU.

  • dtype (str, default: 'float32' ) –

    Data type of the tensor. Default is "float32".
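
Given the documented shapes, weight of shape (in_features, out_features) and bias of shape (1, out_features), the transformation amounts to x @ weight + bias with the bias broadcast over the batch. A plain-Python sketch (the function linear is a hypothetical stand-in, not the library class):

```python
def linear(x, weight, bias=None):
    """Compute x @ weight + bias for x of shape (batch_size, in_features)."""
    out_features = len(weight[0])
    out = [[sum(row[i] * weight[i][j] for i in range(len(row)))
            for j in range(out_features)] for row in x]
    if bias is not None:
        # bias has shape (1, out_features) and is broadcast over the batch.
        out = [[v + bias[0][j] for j, v in enumerate(row)] for row in out]
    return out

weight = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]  # (in_features=2, out_features=3)
bias = [[0.5, 0.5, 0.5]]                      # (1, out_features)
y = linear([[1.0, 2.0], [3.0, 4.0]], weight, bias)
```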

Module

Base class for all neural network modules. Custom modules should subclass it.

Attributes:

  • training (bool) –

    Whether the module is in training mode or not.

__call__(*args, **kwargs)

Forward pass of the module.

Returns:

  • Tensor

    The output tensor of the forward pass.

children()

Return the list of child modules in the module.

Returns:

  • list[Module]

    List of child modules in the module.

eval()

Sets the module in evaluation mode.

This method sets the training attribute to False, which affects the behavior of certain modules like dropout and batch normalization. It also recursively sets the training attribute of all child modules.

Notes

This method is a no-op if the module is already in evaluation mode.

parameters()

Return the list of parameters of the module.

Returns:

  • list[Tensor]

    A list of tensors representing the parameters of the module.

train()

Sets the module in training mode.

This method sets the training attribute to True, which affects the behavior of certain modules like dropout and batch normalization. It also recursively sets the training attribute of all child modules.

Notes

This method is a no-op if the module is already in training mode.
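
The recursive mode switch can be sketched with a small stand-in class. TinyModule and Block are hypothetical names, and the way children are discovered here (scanning instance attributes and lists) is an assumption rather than the library's actual mechanism.

```python
class TinyModule:
    """Hypothetical stand-in for Module; not the library class."""

    def __init__(self):
        self.training = True

    def children(self):
        # Collect attributes that are modules, including modules stored in lists.
        found = []
        for value in self.__dict__.values():
            if isinstance(value, TinyModule):
                found.append(value)
            elif isinstance(value, (list, tuple)):
                found.extend(v for v in value if isinstance(v, TinyModule))
        return found

    def train(self):
        self.training = True
        for child in self.children():
            child.train()

    def eval(self):
        self.training = False
        for child in self.children():
            child.eval()

class Block(TinyModule):
    def __init__(self):
        super().__init__()
        self.inner = TinyModule()
        self.layers = [TinyModule(), TinyModule()]

block = Block()
block.eval()
flags_after_eval = [block.training, block.inner.training] + [m.training for m in block.layers]
block.train()
flags_after_train = [block.training, block.inner.training] + [m.training for m in block.layers]
```

Calling eval() or train() on the outermost module is enough; every nested module ends up with the same training flag.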

Parameter

Bases: Tensor

A special kind of tensor that represents parameters. It acts as a marker so that modules can identify learnable parameters. All Parameter tensors have require_grad set to True.

RNN

Bases: Module

Applies a multi-layer RNN with tanh or ReLU non-linearity to an input sequence.

Parameters:

  • input_size (int) –

    The number of expected features in the input x.

  • hidden_size (int) –

    The number of features in the hidden state h.

  • num_layers (int, default: 1 ) –

    Number of recurrent layers. Default is 1.

  • bias (bool, default: True ) –

    If False, then the layer does not use bias weights. Default is True.

  • nonlinearity (str, default: 'tanh' ) –

    The non-linearity to use. Can be either 'tanh' or 'relu'. Default is 'tanh'.

  • device (Device, default: None ) –

    Device on which to place the weights. Default is None (uses default device).

  • dtype (str, default: 'float32' ) –

    Data type of the weights. Default is "float32".

Attributes:

  • rnn_cells (list of RNNCell) –

    List of RNNCell modules for each layer.

  • hidden_size (int) –

    The number of features in the hidden state h.

  • num_layers (int) –

    Number of recurrent layers.

  • device (Device or None) –

    Device on which the parameters are allocated.

  • dtype (str) –

    Data type of the parameters.

Methods:

  • forward

    Compute the output and final hidden state for a batch of input sequences.

__init__(input_size, hidden_size, num_layers=1, bias=True, nonlinearity='tanh', device=None, dtype='float32')

Initialize a multi-layer RNN with tanh or ReLU nonlinearity.

Parameters:

  • input_size (int) –

    The number of expected features in the input X.

  • hidden_size (int) –

    The number of features in the hidden state h.

  • bias (bool) –

    If False, then the layer does not use bias weights.

  • nonlinearity (str) –

    The non-linearity to use. Can be either 'tanh' or 'relu'.

Variables:

  • W_ih (Parameter) –

    The learnable input-hidden weights of shape (input_size, hidden_size).

  • W_hh (Parameter) –

    The learnable hidden-hidden weights of shape (hidden_size, hidden_size).

  • bias_ih (Parameter) –

    The learnable input-hidden bias of shape (hidden_size,).

  • bias_hh (Parameter) –

    The learnable hidden-hidden bias of shape (hidden_size,).

Weights and biases are initialized from U(-sqrt(k), sqrt(k)), where k = 1/hidden_size.

forward(X, h0=None)

Compute the output and final hidden state for a batch of input sequences.

Parameters:

  • X (Tensor) –

    Input tensor of shape (seq_len, batch_size, input_size) containing the features of the input sequence.

  • h0 (Tensor or None, default: None ) –

    Initial hidden state for each element in the batch, of shape (num_layers, batch_size, hidden_size). If None, defaults to zeros.

Returns:

  • output ( Tensor ) –

    Output tensor of shape (seq_len, batch_size, hidden_size) containing the output features (h_t) from the last layer of the RNN, for each t.

  • h_n ( Tensor ) –

    Tensor of shape (num_layers, batch_size, hidden_size) containing the final hidden state for each element in the batch.

RNNCell

Bases: Module

Applies a single RNN cell with a specified nonlinearity (tanh or ReLU).

Parameters:

  • input_size (int) –

    The number of expected features in the input X.

  • hidden_size (int) –

    The number of features in the hidden state h.

  • bias (bool, default: True ) –

    If False, then the layer does not use bias weights. Default is True.

  • nonlinearity (str, default: 'tanh' ) –

    The non-linearity to use. Can be either 'tanh' or 'relu'. Default is 'tanh'.

  • device (Device, default: None ) –

    Device on which to place the weights. Default is None (uses default device).

  • dtype (str, default: 'float32' ) –

    Data type of the weights. Default is "float32".

Attributes:

  • W_ih (Parameter) –

    The learnable input-hidden weights of shape (input_size, hidden_size).

  • W_hh (Parameter) –

    The learnable hidden-hidden weights of shape (hidden_size, hidden_size).

  • bias_ih (Parameter or None) –

    The learnable input-hidden bias of shape (hidden_size,). None if bias is False.

  • bias_hh (Parameter or None) –

    The learnable hidden-hidden bias of shape (hidden_size,). None if bias is False.

  • nonlinearity (Module) –

    The nonlinearity module (Tanh or ReLU).

  • device (Device or None) –

    Device on which the parameters are allocated.

  • dtype (str) –

    Data type of the parameters.

  • hidden_size (int) –

    The number of features in the hidden state h.

Methods:

  • forward

    Compute the next hidden state given input X and previous hidden state h.

__init__(input_size, hidden_size, bias=True, nonlinearity='tanh', device=None, dtype='float32')

Applies an RNN cell with tanh or ReLU nonlinearity.

Parameters:

  • input_size (int) –

    The number of expected features in the input X.

  • hidden_size (int) –

    The number of features in the hidden state h.

  • bias (bool) –

    If False, then the layer does not use bias weights.

  • nonlinearity (str) –

    The non-linearity to use. Can be either 'tanh' or 'relu'.

Variables:

  • W_ih (Parameter) –

    The learnable input-hidden weights of shape (input_size, hidden_size).

  • W_hh (Parameter) –

    The learnable hidden-hidden weights of shape (hidden_size, hidden_size).

  • bias_ih (Parameter) –

    The learnable input-hidden bias of shape (hidden_size,).

  • bias_hh (Parameter) –

    The learnable hidden-hidden bias of shape (hidden_size,).

Weights and biases are initialized from U(-sqrt(k), sqrt(k)), where k = 1/hidden_size.

forward(X, h=None)

Compute the next hidden state for a batch of inputs.

Parameters:

  • X (Tensor) –

    Input tensor of shape (batch_size, input_size).

  • h (Tensor or None, default: None ) –

    Initial hidden state for each element in the batch, of shape (batch_size, hidden_size). If None, defaults to zeros.

Returns:

  • Tensor

    Next hidden state tensor of shape (batch_size, hidden_size).
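
Using the documented weight shapes, one RNN step can be sketched for a single sample as h' = f(X W_ih + bias_ih + h W_hh + bias_hh), where f is tanh or ReLU. The function rnn_cell is a hypothetical illustration, not the library's code.

```python
import math

def rnn_cell(x, h, W_ih, W_hh, bias_ih, bias_hh, nonlinearity=math.tanh):
    """One RNN step for a single sample (x and h are flat lists)."""
    hidden = len(h)
    return [nonlinearity(
                sum(x[i] * W_ih[i][j] for i in range(len(x))) + bias_ih[j]
                + sum(h[i] * W_hh[i][j] for i in range(hidden)) + bias_hh[j])
            for j in range(hidden)]

W_ih = [[1.0, 0.0], [0.0, 1.0]]   # (input_size=2, hidden_size=2)
W_hh = [[0.5, 0.0], [0.0, 0.5]]   # (hidden_size, hidden_size)
zeros = [0.0, 0.0]

# Two consecutive steps: the second step feeds the first hidden state back in.
h1 = rnn_cell([1.0, -1.0], zeros, W_ih, W_hh, zeros, zeros)
h2 = rnn_cell([0.0, 0.0], h1, W_ih, W_hh, zeros, zeros)

# The same step with a ReLU nonlinearity instead of tanh.
relu_h = rnn_cell([1.0, -1.0], zeros, W_ih, W_hh, zeros, zeros,
                  nonlinearity=lambda v: max(v, 0.0))
```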

ReLU

Bases: Module

Applies the rectified linear unit (ReLU) activation function element-wise.

Parameters:

  • x (Tensor) –

    Input tensor.

Returns:

  • Tensor

    Output tensor with ReLU activation applied element-wise.

Residual

Bases: Module

Applies a residual connection to the input tensor.

Parameters:

  • fn (Module) –

    The module to apply before adding the residual connection.

Attributes:

  • fn (Module) –

    The module to apply before adding the residual connection.

Methods:

  • forward

    Applies the residual connection to the input tensor x.

Sequential

Bases: Module

Applies a sequence of modules to the input.

Parameters:

  • *modules (Module, default: () ) –

    A sequence of modules to apply to the input.

Returns:

  • Tensor

    The output tensor after applying all modules in sequence.
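
The behavior of the Sequential and Residual containers can be sketched with plain callables. The two classes below are hypothetical stand-ins that operate on floats rather than tensors; they only illustrate the composition rules "apply in order" and "fn(x) + x".

```python
class Sequential:
    """Hypothetical stand-in: applies callables in order."""

    def __init__(self, *modules):
        self.modules = modules

    def __call__(self, x):
        for module in self.modules:
            x = module(x)
        return x

class Residual:
    """Hypothetical stand-in: returns fn(x) + x."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, x):
        return self.fn(x) + x

def double(v):
    return 2.0 * v

def add_one(v):
    return v + 1.0

seq_out = Sequential(double, add_one)(3.0)            # (2 * 3) + 1
res_out = Residual(Sequential(double, add_one))(3.0)  # ((2 * 3) + 1) + 3
```

Because Residual adds its input to the output of fn, the wrapped module must return a value with the same shape as its input.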

Sigmoid

Bases: Module

Applies the sigmoid activation function element-wise.

The sigmoid function maps any real-valued number to the range (0, 1). It is defined as: sigmoid(x) = 1 / (1 + e^(-x))

The sigmoid function is commonly used in binary classification problems and as a gating mechanism in neural networks.

Attributes:

  • None

    This module has no learnable parameters.

Examples:

>>> sigmoid = Sigmoid()
>>> x = Tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
>>> output = sigmoid(x)
>>> print(output)
Tensor([0.1192, 0.2689, 0.5000, 0.7311, 0.8808], device=cpu_numpy())

__init__()

Initialize the Sigmoid module.

This module has no learnable parameters and requires no initialization.

forward(x)

Forward pass of the sigmoid activation function.

Parameters:

  • x (Tensor) –

    Input tensor of any shape.

Returns:

  • Tensor

    Output tensor with the same shape as input, with sigmoid activation applied element-wise. Values are in the range (0, 1).

SoftmaxLoss

Bases: Module

Computes the softmax cross-entropy loss between logits and labels.

Parameters:

  • logits (Tensor) –

    Input logits tensor.

  • y (Tensor) –

    Ground truth labels tensor.

Returns:

  • Tensor

    The softmax loss between logits and labels.
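
Softmax cross-entropy for one sample is log(sum_j exp(z_j)) - z_y, where z is the logit row and y the true class. The sketch below assumes integer class labels and a mean over the batch, neither of which is stated on this page; the function softmax_loss is hypothetical.

```python
import math

def softmax_loss(logits, y):
    """Mean softmax cross-entropy over a batch of logit rows and integer labels."""
    total = 0.0
    for row, label in zip(logits, y):
        # Subtract the row maximum before exponentiating for numerical stability.
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[label]
    return total / len(logits)

uniform = softmax_loss([[0.0, 0.0, 0.0]], [1])
confident = softmax_loss([[10.0, 0.0, 0.0]], [0])
```

Uniform logits over three classes give a loss of log(3), while a strongly correct prediction gives a loss close to zero.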

Tanh

Bases: Module

Applies the hyperbolic tangent (tanh) activation function element-wise.

The tanh function maps any real-valued number to the range (-1, 1). It is defined as: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

Attributes:

  • None

    This module has no learnable parameters.

Examples:

>>> tanh = Tanh()
>>> x = Tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
>>> output = tanh(x)
>>> print(output)
Tensor([-0.9640, -0.7616, 0.0000, 0.7616, 0.9640], device=cpu_numpy())

forward(x)

Forward pass of the tanh activation function.

Parameters:

  • x (Tensor) –

    Input tensor of any shape.

Returns:

  • Tensor

    Output tensor with the same shape as input, with tanh activation applied element-wise. Values are in the range (-1, 1).