NN
Neural network module for tiny-pytorch implementation.
This module provides a comprehensive set of classes and functions for building neural networks. It includes fundamental building blocks like layers, activation functions, normalization modules, and specialized components for various types of neural network architectures.
The module is designed to work seamlessly with the automatic differentiation
system, allowing for easy construction and training of complex neural networks.
All modules inherit from the base Module class, providing consistent interfaces
for parameter management, training mode control, and forward pass computation.
Key Features
- Automatic parameter management and gradient tracking
- Training/evaluation mode switching
- Modular design for easy network construction
- Support for various neural network architectures
- Built-in activation functions and loss functions
- Normalization layers (BatchNorm, LayerNorm)
- Recurrent neural network components (RNN, LSTM)
- Convolutional neural network layers and composite blocks
- Embedding layers for sequence processing
Classes:
-
Module–Base class for all neural network modules. Provides common functionality for parameter management, training mode control, and forward pass computation.
-
Parameter–A special kind of tensor that represents learnable parameters. Acts as a marker so modules can identify trainable parameters. All Parameter tensors have require_grad set to True.
-
ReLU–Rectified Linear Unit activation function.
-
Tanh–Hyperbolic tangent activation function.
-
Sigmoid–Sigmoid activation function.
-
Linear–Linear transformation layer (fully connected layer).
-
Flatten–Flattens the input tensor into a 2D tensor.
-
BatchNorm1d–1D batch normalization layer for fully connected networks.
-
BatchNorm2d–2D batch normalization layer for convolutional networks.
-
LayerNorm1d–1D layer normalization layer.
-
Dropout–Dropout layer for regularization during training.
-
Sequential–Sequential container that applies modules in order.
-
Residual–Residual connection that adds input to the output of a module.
-
SoftmaxLoss–Softmax cross-entropy loss function.
-
Conv–2D convolutional layer with support for padding and stride.
-
ConvBN–Composite module combining convolution, batch normalization, and ReLU activation. Common building block in modern CNN architectures like ResNet.
-
RNNCell–Single RNN cell with tanh or ReLU nonlinearity.
-
RNN–Multi-layer RNN with tanh or ReLU nonlinearity.
-
LSTMCell–Single LSTM cell with forget, input, and output gates.
-
LSTM–Multi-layer LSTM network.
-
Embedding–Embedding layer for converting indices to dense vectors.
Notes
All modules support automatic differentiation through the tensor system. Parameters are automatically tracked and gradients are computed during backward passes. The training mode affects the behavior of certain modules like Dropout and BatchNorm, which behave differently during training and evaluation.
The module system is designed to be composable, allowing complex networks to be built from simple building blocks. The Sequential and Residual containers provide convenient ways to combine multiple modules.
Examples:
>>> import tiny_pytorch as tp
>>>
>>> # Create a simple feedforward network
>>> model = tp.nn.Sequential(
... tp.nn.Linear(784, 128),
... tp.nn.ReLU(),
... tp.nn.Dropout(0.5),
... tp.nn.Linear(128, 10)
... )
>>>
>>> # Create a convolutional network
>>> conv_model = tp.nn.Sequential(
... tp.nn.Conv(3, 64, kernel_size=3),
... tp.nn.BatchNorm2d(64),
... tp.nn.ReLU(),
... tp.nn.Flatten(),
... tp.nn.Linear(64 * 28 * 28, 10)
... )
>>>
>>> # Create an RNN for sequence processing
>>> rnn = tp.nn.RNN(input_size=100, hidden_size=64, num_layers=2)
>>>
>>> # Use the model
>>> x = tp.Tensor.randn(32, 784) # batch_size=32, features=784
>>> output = model(x) # Forward pass
BatchNorm1d
Bases: Module
Applies batch normalization to the input tensor.
Parameters:
-
dim(int) –Number of features in the input tensor (the size of the dimension being normalized).
-
eps(float, default:1e-05) –Value added to the denominator for numerical stability. Default is 1e-5.
-
momentum(float, default:0.1) –Momentum for the moving average. Default is 0.1.
-
device(Device, default:None) –Device on which to place the tensor. Default is CPU.
-
dtype(str, default:'float32') –Data type of the tensor. Default is "float32".
Attributes:
-
dim(int) –Number of features in the input tensor (the size of the dimension being normalized).
-
eps(float) –Value added to the denominator for numerical stability.
-
momentum(float) –Momentum for the moving average.
-
weight(Parameter) –Learnable weight parameter.
-
bias(Parameter) –Learnable bias parameter.
-
running_mean(Tensor) –Running mean of the input tensor.
-
running_var(Tensor) –Running variance of the input tensor.
Methods:
-
forward–Applies batch normalization to the input tensor x.
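The training-mode computation can be sketched in plain Python. This is an illustration rather than the library's code: the helper name batch_norm_1d is hypothetical, the learnable weight and bias are left at their identity values (1 and 0), and both the biased batch variance and the update rule running = (1 - momentum) * running + momentum * batch are assumptions based on the common convention.

```python
import math


def batch_norm_1d(batch, running_mean, running_var, eps=1e-5, momentum=0.1):
    """Normalize each feature column of `batch` (a list of rows) with batch statistics."""
    n, dim = len(batch), len(batch[0])
    # Per-feature mean and (biased) variance over the batch dimension.
    mean = [sum(row[j] for row in batch) / n for j in range(dim)]
    var = [sum((row[j] - mean[j]) ** 2 for row in batch) / n for j in range(dim)]
    # Normalize; eps keeps the denominator away from zero.
    out = [
        [(row[j] - mean[j]) / math.sqrt(var[j] + eps) for j in range(dim)]
        for row in batch
    ]
    # Assumed exponential moving average of the statistics, used at evaluation time.
    new_mean = [(1 - momentum) * r + momentum * m for r, m in zip(running_mean, mean)]
    new_var = [(1 - momentum) * r + momentum * v for r, v in zip(running_var, var)]
    return out, new_mean, new_var


batch = [[1.0, 2.0], [3.0, 6.0]]  # batch_size=2, dim=2
out, running_mean, running_var = batch_norm_1d(batch, [0.0, 0.0], [1.0, 1.0])
```

In evaluation mode the same normalization would use running_mean and running_var in place of the batch statistics.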
BatchNorm2d
Bases: BatchNorm1d
Applies batch normalization to 2D input tensors.
This module applies batch normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift".
The input is expected to be in NCHW format (batch, channels, height, width). For each channel, this layer computes the mean and variance over the batch and spatial dimensions, then normalizes the input and applies learnable scale and shift parameters.
Parameters:
-
num_features(int) –Number of features/channels in the input tensor.
-
eps(float, default:1e-05) –Value added to the denominator for numerical stability. Default is 1e-5.
-
momentum(float, default:0.1) –Momentum for the moving average. Default is 0.1.
-
device(Device, default:None) –Device on which to place the parameters. Default is None (uses default device).
-
dtype(str, default:'float32') –Data type of the parameters. Default is "float32".
Attributes:
-
num_features(int) –Number of features/channels in the input tensor.
-
eps(float) –Value added to the denominator for numerical stability.
-
momentum(float) –Momentum for the moving average.
-
weight(Parameter) –Learnable weight parameter of shape (num_features,).
-
bias(Parameter) –Learnable bias parameter of shape (num_features,).
-
running_mean(Tensor) –Running mean of the input tensor of shape (num_features,).
-
running_var(Tensor) –Running variance of the input tensor of shape (num_features,).
Notes
- Input is expected to be in NCHW format (batch, channels, height, width).
- During training, this layer keeps a running estimate of its computed mean and variance, which is then used for normalization during evaluation.
- The running estimates are kept with a default momentum of 0.1.
- Internally converts to channel-last format for efficient computation, similar to PyTorch's implementation.
Examples:
>>> bn = BatchNorm2d(64)
>>> x = Tensor.randn(32, 64, 28, 28) # batch_size=32, channels=64, height=28, width=28
>>> output = bn(x) # shape: (32, 64, 28, 28)
__init__(num_features, eps=1e-05, momentum=0.1, device=None, dtype='float32')
Initialize the BatchNorm2d module.
Parameters:
-
num_features(int) –Number of features/channels in the input tensor.
-
eps(float, default:1e-05) –Value added to the denominator for numerical stability. Default is 1e-5.
-
momentum(float, default:0.1) –Momentum for the moving average. Default is 0.1.
-
device(Device, default:None) –Device on which to place the parameters. Default is None (uses default device).
-
dtype(str, default:'float32') –Data type of the parameters. Default is "float32".
Conv
Bases: Module
Multi-channel 2D convolutional layer.
This module applies a 2D convolution over an input signal composed of several input planes. The input is expected to be in NCHW format (batch, channels, height, width) and the output will also be in NCHW format.
Parameters:
-
in_channels(int) –Number of channels in the input image.
-
out_channels(int) –Number of channels produced by the convolution.
-
kernel_size(int or tuple of int) –Size of the convolving kernel. If a single int is provided, it is used for both height and width dimensions. Only square kernels are supported.
-
stride(int or tuple of int, default:1) –Stride of the convolution. If a single int is provided, it is used for both height and width dimensions. Default is 1.
-
bias(bool, default:True) –If True, adds a learnable bias to the output. Default is True.
-
device(Device, default:None) –Device on which to place the weights. Default is None (uses default device).
-
dtype(str, default:'float32') –Data type of the weights. Default is "float32".
Attributes:
-
in_channels(int) –Number of channels in the input image.
-
out_channels(int) –Number of channels produced by the convolution.
-
kernel_size(int) –Size of the convolving kernel (square kernel).
-
stride(int) –Stride of the convolution.
-
padding(int) –Padding added to both sides of the input. Automatically calculated as (kernel_size - 1) // 2, which preserves the spatial size when stride is 1 and kernel_size is odd.
-
weight(Parameter) –The learnable weights of the module of shape (kernel_size, kernel_size, in_channels, out_channels).
-
bias(Parameter or None) –The learnable bias of the module of shape (out_channels,). None if bias is False.
Notes
- Only supports padding='same' (automatic padding to maintain output size).
- No grouped convolution or dilation support.
- Only supports square kernels.
- Input and output are in NCHW format.
Examples:
>>> conv = Conv(3, 64, kernel_size=3, stride=1)
>>> x = Tensor.randn(1, 3, 32, 32) # batch_size=1, channels=3, height=32, width=32
>>> output = conv(x) # shape: (1, 64, 32, 32)
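The effect of the automatic padding on the output size follows from standard convolution arithmetic. The sketch below uses hypothetical helper names (same_padding, conv_output_size) that are not part of the library; it only reproduces the (kernel_size - 1) // 2 rule stated above.

```python
def same_padding(kernel_size):
    """Padding rule documented for Conv: (kernel_size - 1) // 2."""
    return (kernel_size - 1) // 2


def conv_output_size(size, kernel_size, stride=1):
    """Output height or width of a convolution without dilation."""
    padding = same_padding(kernel_size)
    return (size + 2 * padding - kernel_size) // stride + 1


same_size = conv_output_size(32, kernel_size=3, stride=1)   # spatial size preserved
halved_size = conv_output_size(32, kernel_size=3, stride=2)  # stride 2 halves the size
```

With an even kernel size the rule no longer preserves the input size exactly (for example, kernel_size=4 on a 32-pixel side gives 31), so odd kernels are the natural choice.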
ConvBN
Bases: Module
A composite module that combines convolution, batch normalization, and ReLU activation.
This module is a common building block in convolutional neural networks, particularly in architectures like ResNet. It applies a 2D convolution followed by batch normalization and ReLU activation in sequence. This combination helps with training stability and convergence speed.
The module consists of three components applied in order:
1. Conv: 2D convolutional layer
2. BatchNorm2d: 2D batch normalization layer
3. ReLU: Rectified Linear Unit activation function
Parameters:
-
in_channels(int) –Number of channels in the input image.
-
out_channels(int) –Number of channels produced by the convolution.
-
kernel_size(int or tuple[int, int]) –Size of the convolving kernel. If a single int is provided, it is used for both height and width dimensions. Only square kernels are supported.
-
stride(int or tuple[int, int], default:1) –Stride of the convolution. If a single int is provided, it is used for both height and width dimensions. Default is 1.
-
device(Device, default:None) –Device on which to place the parameters. Default is None (uses default device).
Attributes:
-
conv(Conv) –The 2D convolutional layer.
-
bn(BatchNorm2d) –The 2D batch normalization layer.
-
relu(ReLU) –The ReLU activation function.
Notes
- Input is expected to be in NCHW format (batch, channels, height, width).
- Output maintains the same format as input.
- The convolution uses padding='same' to maintain spatial dimensions.
- Batch normalization is applied per-channel across the batch and spatial dimensions.
- ReLU activation is applied element-wise after batch normalization.
Examples:
>>> convbn = ConvBN(3, 64, kernel_size=3, stride=1)
>>> x = Tensor.randn(32, 3, 28, 28) # batch_size=32, channels=3, height=28, width=28
>>> output = convbn(x) # shape: (32, 64, 28, 28)
__init__(in_channels, out_channels, kernel_size, stride=1, device=None)
Initialize the ConvBN module.
Parameters:
-
in_channels(int) –Number of channels in the input image.
-
out_channels(int) –Number of channels produced by the convolution.
-
kernel_size(int or tuple[int, int]) –Size of the convolving kernel. If a single int is provided, it is used for both height and width dimensions.
-
stride(int or tuple[int, int], default:1) –Stride of the convolution. If a single int is provided, it is used for both height and width dimensions. Default is 1.
-
device(Device, default:None) –Device on which to place the parameters. Default is None (uses default device).
forward(x)
Forward pass of the ConvBN module.
Applies convolution, batch normalization, and ReLU activation in sequence.
Parameters:
-
x(Tensor) –Input tensor of shape (batch_size, in_channels, height, width) in NCHW format.
Returns:
-
Tensor–Output tensor of shape (batch_size, out_channels, height, width) in NCHW format. The output has been processed through convolution, batch normalization, and ReLU.
Dropout
Bases: Module
Applies dropout to the input tensor.
Parameters:
-
p(float, default:0.5) –Probability of an element to be dropped. Default is 0.5.
Attributes:
-
p(float) –Probability of an element to be dropped.
Methods:
-
forward–Applies dropout to the input tensor x.
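A plain-Python sketch of the behavior in both modes is shown below. It assumes the usual "inverted dropout" convention, where surviving elements are scaled by 1 / (1 - p) during training so that evaluation needs no rescaling; whether the library scales in exactly this way is an assumption, and the function name dropout and its rng argument are hypothetical.

```python
import random


def dropout(values, p=0.5, training=True, rng=None):
    """Zero each element with probability p; scale survivors by 1 / (1 - p)."""
    if not training or p == 0.0:
        # Evaluation mode (or p == 0): the input passes through unchanged.
        return list(values)
    rng = rng or random.Random()
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


x = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(x, p=0.5, training=True, rng=random.Random(0))
eval_out = dropout(x, p=0.5, training=False)
```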
Embedding
Bases: Module
A lookup table that stores embeddings of a fixed dictionary and size.
This module is often used to store word embeddings and retrieve them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings.
Parameters:
-
vocab_sz(int) –Size of the dictionary of embeddings (number of unique tokens).
-
embedding_dim(int) –The size of each embedding vector.
-
device(Device, default:None) –Device on which to place the embedding weights. Default is None (uses default device).
-
dtype(str, default:'float32') –Data type of the embedding weights. Default is "float32".
Attributes:
-
vocab_sz(int) –Size of the dictionary of embeddings.
-
embedding_dim(int) –The size of each embedding vector.
-
weight(Parameter) –The learnable embedding weights of shape (vocab_sz, embedding_dim). Initialized from N(0, 1) distribution.
Methods:
-
forward–Maps word indices to embedding vectors.
Examples:
>>> embedding = Embedding(1000, 128)
>>> input_indices = Tensor([[1, 2, 3], [4, 5, 6]]) # shape: (seq_len, batch_size)
>>> output = embedding(input_indices) # shape: (seq_len, batch_size, 128)
forward(x)
Maps word indices to embedding vectors.
This method converts input indices to one-hot vectors and then projects them to embedding vectors using the learned embedding weights.
Parameters:
-
x(Tensor) –Input tensor containing indices of shape (seq_len, batch_size). Each element should be an integer index in the range [0, vocab_sz).
Returns:
-
Tensor–Output tensor of shape (seq_len, batch_size, embedding_dim) containing the corresponding embedding vectors for each input index.
Notes
The input indices are converted to one-hot vectors internally, then multiplied with the embedding weight matrix to produce the final embeddings.
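The one-hot formulation described in the note is equivalent to selecting a row of the weight matrix. The following sketch shows this for a single index using plain lists; the helper names one_hot and embed are hypothetical and the weight values are made up for illustration.

```python
def one_hot(index, vocab_sz):
    """Vector of length vocab_sz with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(vocab_sz)]


def embed(index, weight):
    """Multiply the one-hot row vector by the (vocab_sz, embedding_dim) weight matrix."""
    row = one_hot(index, len(weight))
    dim = len(weight[0])
    return [sum(row[v] * weight[v][d] for v in range(len(weight))) for d in range(dim)]


weight = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # vocab_sz=3, embedding_dim=2
vector = embed(2, weight)  # same values as weight[2]
```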
Flatten
LSTM
Bases: Module
Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.
Parameters:
-
input_size(int) –The number of expected features in the input x.
-
hidden_size(int) –The number of features in the hidden state h.
-
num_layers(int, default:1) –Number of recurrent layers. Default is 1.
-
bias(bool, default:True) –If False, then the layer does not use bias weights. Default is True.
-
device(Device, default:None) –Device on which to place the weights. Default is None (uses default device).
-
dtype(str, default:'float32') –Data type of the weights. Default is "float32".
Attributes:
-
lstm_cells(list of LSTMCell) –List of LSTMCell modules for each layer.
-
hidden_size(int) –The number of features in the hidden state h.
-
num_layers(int) –Number of recurrent layers.
-
device(Device or None) –Device on which the parameters are allocated.
-
dtype(str) –Data type of the parameters.
Methods:
-
forward–Compute the output and final hidden and cell states for a batch of input sequences.
forward(X, h=None)
Compute the output and final hidden and cell states for a batch of input sequences.
Parameters:
-
X(Tensor) –Input tensor of shape (seq_len, batch_size, input_size) containing the features of the input sequence.
-
h(tuple of (Tensor, Tensor) or None, default:None) –Tuple of (h0, c0), where each is a tensor of shape (num_layers, batch_size, hidden_size). If None, both default to zeros.
Returns:
-
output(Tensor) –Output tensor of shape (seq_len, batch_size, hidden_size) containing the output features (h_t) from the last layer of the LSTM, for each t.
-
(h_n, c_n) : tuple of Tensor–Tuple of (h_n, c_n), each of shape (num_layers, batch_size, hidden_size) containing the final hidden and cell states for each element in the batch.
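How the per-step outputs and the final per-layer states relate can be sketched with a toy unrolling loop. This is an assumption about the general structure of a multi-layer recurrent network, not the library's implementation: unroll and toy_cell are hypothetical names, and the toy cell works on scalars instead of tensors so that the data flow is easy to follow.

```python
def unroll(cells, X, states):
    """Run a stack of cells over a sequence.

    cells: one step function per layer, called as cell(input, (h, c)) -> (h, c)
    X: list of inputs, one per time step
    states: list of initial (h, c) pairs, one per layer
    """
    states = list(states)
    outputs = []
    for x_t in X:  # iterate over time steps
        layer_input = x_t
        for layer, cell in enumerate(cells):
            h, c = cell(layer_input, states[layer])
            states[layer] = (h, c)
            layer_input = h  # the hidden state feeds the next layer
        outputs.append(layer_input)  # h_t of the last layer
    h_n = [h for h, _ in states]
    c_n = [c for _, c in states]
    return outputs, (h_n, c_n)


def toy_cell(x, state):
    """Scalar stand-in for an LSTM cell: accumulate into c, emit half of it as h."""
    _, c = state
    c = c + x
    return 0.5 * c, c


output, (h_n, c_n) = unroll([toy_cell, toy_cell], [1.0, 2.0], [(0.0, 0.0), (0.0, 0.0)])
```

Here output has one entry per time step (from the last layer only), while h_n and c_n have one entry per layer, mirroring the shapes documented above.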
LSTMCell
Bases: Module
A long short-term memory (LSTM) cell.
Parameters:
-
input_size(int) –The number of expected features in the input X.
-
hidden_size(int) –The number of features in the hidden state h.
-
bias(bool, default:True) –If False, then the layer does not use bias weights. Default is True.
-
device(Device, default:None) –Device on which to place the weights. Default is None (uses default device).
-
dtype(str, default:'float32') –Data type of the weights. Default is "float32".
Attributes:
-
W_ih(Parameter) –The learnable input-hidden weights, of shape (input_size, 4 * hidden_size).
-
W_hh(Parameter) –The learnable hidden-hidden weights, of shape (hidden_size, 4 * hidden_size).
-
bias_ih(Parameter or None) –The learnable input-hidden bias, of shape (4 * hidden_size,). None if bias is False.
-
bias_hh(Parameter or None) –The learnable hidden-hidden bias, of shape (4 * hidden_size,). None if bias is False.
-
hidden_size(int) –The number of features in the hidden state h.
-
device(Device or None) –Device on which the parameters are allocated.
-
dtype(str) –Data type of the parameters.
Methods:
-
forward–Compute the next hidden and cell state given input X and previous states.
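A single LSTM step for one sample can be sketched in plain Python using the weight shapes listed above. The gate ordering inside the 4 * hidden_size axis (input, forget, cell candidate, output) is an assumption borrowed from PyTorch's convention and may differ in this library; lstm_cell_step is a hypothetical helper name.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_cell_step(x, h, c, W_ih, W_hh, b_ih, b_hh):
    """One LSTM step: W_ih is (input_size, 4*hidden), W_hh is (hidden, 4*hidden)."""
    hidden = len(h)
    # Pre-activations for all four gates: x @ W_ih + h @ W_hh + b_ih + b_hh.
    pre = [
        sum(x[k] * W_ih[k][j] for k in range(len(x)))
        + sum(h[k] * W_hh[k][j] for k in range(hidden))
        + b_ih[j]
        + b_hh[j]
        for j in range(4 * hidden)
    ]
    # Assumed gate order: input, forget, cell candidate, output.
    i = [sigmoid(v) for v in pre[0:hidden]]
    f = [sigmoid(v) for v in pre[hidden:2 * hidden]]
    g = [math.tanh(v) for v in pre[2 * hidden:3 * hidden]]
    o = [sigmoid(v) for v in pre[3 * hidden:4 * hidden]]
    c_next = [f[j] * c[j] + i[j] * g[j] for j in range(hidden)]
    h_next = [o[j] * math.tanh(c_next[j]) for j in range(hidden)]
    return h_next, c_next


# input_size=2, hidden_size=1, all weights and biases zero.
zeros = [0.0, 0.0, 0.0, 0.0]
h_next, c_next = lstm_cell_step(
    x=[1.0, 2.0], h=[0.0], c=[1.0],
    W_ih=[zeros, zeros], W_hh=[zeros], b_ih=zeros, b_hh=zeros,
)
```

With zero weights every sigmoid gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved by the forget gate.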
__init__(input_size, hidden_size, bias=True, device=None, dtype='float32')
A long short-term memory (LSTM) cell.
Parameters:
-
input_size(int) –The number of expected features in the input X.
-
hidden_size(int) –The number of features in the hidden state h.
-
bias(bool, default:True) –If False, then the layer does not use bias weights. Default is True.
-
device(Device, default:None) –Device on which to place the weights. Default is None (uses default device).
-
dtype(str, default:'float32') –Data type of the weights. Default is "float32".
forward(X, h=None)
Compute the next hidden and cell state for a batch of inputs.
Parameters:
-
X(Tensor) –Input tensor of shape (batch_size, input_size).
-
h(tuple of (Tensor, Tensor) or None, default:None) –Tuple of (h0, c0), where each is a tensor of shape (batch_size, hidden_size). If None, both default to zeros.
Returns:
LayerNorm1d
Bases: Module
Applies layer normalization to the input tensor.
Parameters:
-
x(Tensor) –Input tensor to apply layer normalization.
-
dim(int) –Dimension to normalize.
-
eps(float, default:1e-05) –Epsilon for numerical stability. Default is 1e-5.
-
device(Device, default:None) –Device on which to place the tensor. Default is CPU.
-
dtype(str, default:'float32') –Data type of the tensor. Default is "float32".
Returns:
-
Tensor–Normalized tensor.
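Layer normalization differs from batch normalization in that statistics are computed per sample over the feature dimension. A plain-Python sketch for one sample follows; it assumes the biased variance and leaves any learnable scale and shift at their identity values, and the helper name layer_norm is hypothetical.

```python
import math


def layer_norm(row, eps=1e-5):
    """Normalize one sample over its feature dimension."""
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]


normalized = layer_norm([1.0, 2.0, 3.0])
```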
Linear
Bases: Module
Applies a linear transformation to the input data.
Attributes:
-
weight(Tensor) –The learnable weights of the module of shape (in_features, out_features).
-
bias(Tensor, optional) –The learnable bias of the module of shape (1, out_features).
__init__(in_features, out_features, bias=True, device=None, dtype='float32')
Parameters:
-
in_features(int) –Size of each input sample.
-
out_features(int) –Size of each output sample.
-
bias(bool, default:True) –If set to False, the layer will not learn an additive bias. Default is True.
-
device(Device, default:None) –Device on which to place the tensor. Default is CPU.
-
dtype(str, default:'float32') –Data type of the tensor. Default is "float32".
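Given the documented shapes, the transformation is presumably y = x @ weight + bias, with the (1, out_features) bias broadcast over the batch. The sketch below spells this out with nested lists; the function name linear and the sample values are hypothetical.

```python
def linear(x, weight, bias=None):
    """x: (batch, in_features); weight: (in_features, out_features); bias: (1, out_features)."""
    out_features = len(weight[0])
    y = [
        [sum(row[i] * weight[i][j] for i in range(len(row))) for j in range(out_features)]
        for row in x
    ]
    if bias is not None:
        # Broadcast the single bias row over every sample in the batch.
        y = [[v + bias[0][j] for j, v in enumerate(r)] for r in y]
    return y


x = [[1.0, 2.0]]                             # batch=1, in_features=2
weight = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]  # in_features=2, out_features=3
bias = [[0.5, 0.5, 0.5]]
y = linear(x, weight, bias)
```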
Module
Base class for all neural network modules. Custom modules should subclass this class.
Attributes:
-
training(bool) –Whether the module is in training mode or not.
__call__(*args, **kwargs)
children()
Return the list of child modules in the module.
Returns:
-
list[Module]–List of child modules in the module.
eval()
Sets the module in evaluation mode.
This method sets the training attribute to False, which affects the behavior of certain modules like dropout and batch normalization. It also recursively sets the training attribute of all child modules.
Notes
This method is a no-op if the module is already in evaluation mode.
parameters()
Returns:
-
list[Tensor]–A list of tensors representing the parameters of the module.
train()
Sets the module in training mode.
This method sets the training attribute to True, which affects the behavior of certain modules like dropout and batch normalization. It also recursively sets the training attribute of all child modules.
Notes
This method is a no-op if the module is already in training mode.
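The recursive mode switch described for train() and eval() can be illustrated with a small stand-in class. TinyModule is hypothetical and is not the library's Module; in particular, walking direct children recursively is an assumption about how the propagation is implemented.

```python
class TinyModule:
    """Hypothetical stand-in for the base Module class."""

    def __init__(self, *children):
        self.training = True  # modules start in training mode
        self._children = list(children)

    def children(self):
        return self._children

    def train(self):
        self.training = True
        for child in self.children():
            child.train()

    def eval(self):
        self.training = False
        for child in self.children():
            child.eval()


leaf = TinyModule()
model = TinyModule(TinyModule(leaf))  # two levels of nesting
model.eval()  # leaf.training is now False as well
```

Calling model.train() afterwards flips every nested module back to training mode.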
Parameter
Bases: Tensor
A special kind of tensor that represents learnable parameters. It acts as a marker
so that modules can identify learnable parameters. All Parameter
tensors have require_grad set to True.
RNN
Bases: Module
Applies a multi-layer RNN with tanh or ReLU non-linearity to an input sequence.
Parameters:
-
input_size(int) –The number of expected features in the input x.
-
hidden_size(int) –The number of features in the hidden state h.
-
num_layers(int, default:1) –Number of recurrent layers. Default is 1.
-
bias(bool, default:True) –If False, then the layer does not use bias weights. Default is True.
-
nonlinearity(str, default:'tanh') –The non-linearity to use. Can be either 'tanh' or 'relu'. Default is 'tanh'.
-
device(Device, default:None) –Device on which to place the weights. Default is None (uses default device).
-
dtype(str, default:'float32') –Data type of the weights. Default is "float32".
Attributes:
-
rnn_cells(list of RNNCell) –List of RNNCell modules for each layer.
-
hidden_size(int) –The number of features in the hidden state h.
-
num_layers(int) –Number of recurrent layers.
-
device(Device or None) –Device on which the parameters are allocated.
-
dtype(str) –Data type of the parameters.
Methods:
-
forward–Compute the output and final hidden state for a batch of input sequences.
__init__(input_size, hidden_size, num_layers=1, bias=True, nonlinearity='tanh', device=None, dtype='float32')
Applies an RNN cell with tanh or ReLU nonlinearity.
Parameters:
-
input_size –The number of expected features in the input X.
-
hidden_size –The number of features in the hidden state h.
-
bias –If False, then the layer does not use bias weights.
-
nonlinearity –The non-linearity to use. Can be either 'tanh' or 'relu'.
Variables:
-
W_ih –The learnable input-hidden weights of shape (input_size, hidden_size).
-
W_hh –The learnable hidden-hidden weights of shape (hidden_size, hidden_size).
-
bias_ih –The learnable input-hidden bias of shape (hidden_size,).
-
bias_hh –The learnable hidden-hidden bias of shape (hidden_size,).
Weights and biases are initialized from U(-sqrt(k), sqrt(k)), where k = 1/hidden_size.
forward(X, h0=None)
Compute the output and final hidden state for a batch of input sequences.
Parameters:
-
X(Tensor) –Input tensor of shape (seq_len, batch_size, input_size) containing the features of the input sequence.
-
h0(Tensor or None, default:None) –Initial hidden state for each element in the batch, of shape (num_layers, batch_size, hidden_size). If None, defaults to zeros.
Returns:
RNNCell
Bases: Module
Applies a single RNN cell with a specified nonlinearity (tanh or ReLU).
Parameters:
-
input_size(int) –The number of expected features in the input X.
-
hidden_size(int) –The number of features in the hidden state h.
-
bias(bool, default:True) –If False, then the layer does not use bias weights. Default is True.
-
nonlinearity(str, default:'tanh') –The non-linearity to use. Can be either 'tanh' or 'relu'. Default is 'tanh'.
-
device(Device, default:None) –Device on which to place the weights. Default is None (uses default device).
-
dtype(str, default:'float32') –Data type of the weights. Default is "float32".
Attributes:
-
W_ih(Parameter) –The learnable input-hidden weights of shape (input_size, hidden_size).
-
W_hh(Parameter) –The learnable hidden-hidden weights of shape (hidden_size, hidden_size).
-
bias_ih(Parameter or None) –The learnable input-hidden bias of shape (hidden_size,). None if bias is False.
-
bias_hh(Parameter or None) –The learnable hidden-hidden bias of shape (hidden_size,). None if bias is False.
-
nonlinearity(Module) –The nonlinearity module (Tanh or ReLU).
-
device(Device or None) –Device on which the parameters are allocated.
-
dtype(str) –Data type of the parameters.
-
hidden_size(int) –The number of features in the hidden state h.
Methods:
-
forward–Compute the next hidden state given input X and previous hidden state h.
__init__(input_size, hidden_size, bias=True, nonlinearity='tanh', device=None, dtype='float32')
Applies an RNN cell with tanh or ReLU nonlinearity.
Parameters:
-
input_size –The number of expected features in the input X.
-
hidden_size –The number of features in the hidden state h.
-
bias –If False, then the layer does not use bias weights.
-
nonlinearity –The non-linearity to use. Can be either 'tanh' or 'relu'.
Variables:
-
W_ih –The learnable input-hidden weights of shape (input_size, hidden_size).
-
W_hh –The learnable hidden-hidden weights of shape (hidden_size, hidden_size).
-
bias_ih –The learnable input-hidden bias of shape (hidden_size,).
-
bias_hh –The learnable hidden-hidden bias of shape (hidden_size,).
Weights and biases are initialized from U(-sqrt(k), sqrt(k)), where k = 1/hidden_size.
forward(X, h=None)
Compute the next hidden state for a batch of inputs.
Parameters:
-
X(Tensor) –Input tensor of shape (batch_size, input_size).
-
h(Tensor or None, default:None) –Initial hidden state for each element in the batch, of shape (batch_size, hidden_size). If None, defaults to zeros.
Returns:
-
Tensor–Next hidden state tensor of shape (batch_size, hidden_size).
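With the weight shapes listed for RNNCell, one step presumably computes h' = nonlinearity(X @ W_ih + bias_ih + h @ W_hh + bias_hh). The sketch below works through this for a single sample with plain lists; rnn_cell_step is a hypothetical helper and the numbers are made up for illustration.

```python
import math


def rnn_cell_step(x, h, W_ih, W_hh, b_ih, b_hh, nonlinearity="tanh"):
    """One RNN step: W_ih is (input_size, hidden_size), W_hh is (hidden_size, hidden_size)."""
    act = math.tanh if nonlinearity == "tanh" else (lambda v: max(0.0, v))
    hidden = len(h)
    return [
        act(
            sum(x[k] * W_ih[k][j] for k in range(len(x)))
            + b_ih[j]
            + sum(h[k] * W_hh[k][j] for k in range(hidden))
            + b_hh[j]
        )
        for j in range(hidden)
    ]


# input_size=2, hidden_size=1; the pre-activation works out to 0.5.
params = dict(W_ih=[[1.0], [2.0]], W_hh=[[2.0]], b_ih=[0.0], b_hh=[0.5])
h_tanh = rnn_cell_step([1.0, -1.0], [0.5], **params)
h_relu = rnn_cell_step([1.0, -1.0], [0.5], nonlinearity="relu", **params)
```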
ReLU
Residual
Bases: Module
Applies a residual connection to the input tensor.
Parameters:
-
fn(Module) –The module to apply before adding the residual connection.
Attributes:
-
fn(Module) –The module to apply before adding the residual connection.
Methods:
-
forward–Applies the residual connection to the input tensor x.
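The residual connection amounts to computing fn(x) + x. A minimal stand-in using plain lists is sketched below; ResidualSketch and double are hypothetical names, and the real module operates on tensors, not lists.

```python
class ResidualSketch:
    """Hypothetical stand-in: adds the input to the output of the wrapped function."""

    def __init__(self, fn):
        self.fn = fn

    def forward(self, x):
        # fn(x) must have the same shape as x for the element-wise sum to make sense.
        return [f + v for f, v in zip(self.fn(x), x)]

    __call__ = forward


def double(x):
    return [2.0 * v for v in x]


block = ResidualSketch(double)
y = block([1.0, 2.0])  # 2x + x for each element
```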
Sequential
Sigmoid
Bases: Module
Applies the sigmoid activation function element-wise.
The sigmoid function maps any real-valued number to the range (0, 1). It is defined as: sigmoid(x) = 1 / (1 + e^(-x))
The sigmoid function is commonly used in binary classification problems and as a gating mechanism in neural networks.
Attributes:
-
None–This module has no learnable parameters.
Examples:
>>> sigmoid = Sigmoid()
>>> x = Tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
>>> output = sigmoid(x)
>>> print(output)
Tensor([0.1192, 0.2689, 0.5000, 0.7311, 0.8808], device=cpu_numpy())
__init__()
Initialize the Sigmoid module.
This module has no learnable parameters and requires no initialization.
SoftmaxLoss
Tanh
Bases: Module
Applies the hyperbolic tangent (tanh) activation function element-wise.
The tanh function maps any real-valued number to the range (-1, 1). It is defined as: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
Attributes:
-
None–This module has no learnable parameters.
Examples:
>>> tanh = Tanh()
>>> x = Tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
>>> output = tanh(x)
>>> print(output)
Tensor([-0.9640, -0.7616, 0.0000, 0.7616, 0.9640], device=cpu_numpy())