Initialization
This module provides functions for initializing tensors. It covers tensors filled with ones, zeros, or an arbitrary constant; uniform, normal, and binary random tensors; and one-hot encoded tensors. It also provides weight initializers: Xavier/Glorot and Kaiming/He, each in uniform and normal variants.
Functions:
- ones – Generate a tensor filled with ones.
- zeros – Generate a tensor filled with zeros.
- constant – Generate a tensor filled with a constant value.
- rand – Generate a tensor filled with random numbers from a uniform distribution.
- randn – Generate a tensor filled with random numbers from a normal distribution.
- randb – Generate a binary random tensor.
- one_hot – Generate a one-hot encoded tensor.
- xavier_uniform – Initialize weights using Xavier/Glorot uniform initialization.
- xavier_normal – Initialize weights using Xavier/Glorot normal initialization.
- kaiming_uniform – Initialize weights using Kaiming/He uniform initialization.
- kaiming_normal – Initialize weights using Kaiming/He normal initialization.
constant(*shape, c=1.0, device=None, dtype='float32', requires_grad=False)
Generate a tensor filled with a constant value.
Parameters:
- *shape (int, default: ()) – Shape of the output tensor.
- c (float, default: 1.0) – Constant value to fill the tensor with.
- device (Device, default: None) – Device on which to place the tensor. If None, the tensor is placed on the CPU.
- dtype (str, default: 'float32') – Data type of the tensor.
- requires_grad (bool, default: False) – If True, the tensor will track gradients.
Returns:
- Tensor – Tensor of the specified shape filled with the constant value c.
Examples:
>>> constant(2, 3, c=5) # 2x3 tensor filled with 5
>>> constant(4, c=3.14) # 4-element tensor filled with 3.14
>>> constant(2, 2, device=gpu(), c=2.0) # 2x2 tensor on GPU filled with 2
kaiming_normal(fan_in, fan_out, nonlinearity='relu', **kwargs)
Initialize weights using Kaiming/He normal initialization.
Parameters:
- fan_in (int) – Number of input features.
- fan_out (int) – Number of output features.
- nonlinearity (str, default: 'relu') – The nonlinearity (activation function) used in the model.
- **kwargs – Additional arguments passed to randn().
Returns:
- Tensor – Tensor initialized with values from the normal distribution N(0, std^2), where std = sqrt(2 / fan_in).
Notes
This initialization is designed to work with the rectified linear unit (ReLU) activation function. It sets the weights to be zero-mean and have a standard deviation of sqrt(2 / fan_in).
References
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In ICCV.
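The standard deviation stated above can be checked numerically. The following is a minimal NumPy sketch, not this module's implementation: `kaiming_normal_sketch` is a hypothetical stand-in, and it assumes the result has shape (fan_in, fan_out) and covers only the ReLU case.

```python
import numpy as np

def kaiming_normal_sketch(fan_in, fan_out, seed=0):
    """Hypothetical NumPy stand-in: draw a (fan_in, fan_out) array from N(0, 2 / fan_in)."""
    rng = np.random.default_rng(seed)
    std = np.sqrt(2.0 / fan_in)  # ReLU case only
    return rng.normal(loc=0.0, scale=std, size=(fan_in, fan_out))

w = kaiming_normal_sketch(512, 256)
expected_std = np.sqrt(2.0 / 512)  # exactly 0.0625
print(w.shape, float(w.std()), expected_std)
```

With this many samples, the empirical standard deviation should land close to `expected_std`.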
kaiming_uniform(fan_in, fan_out, nonlinearity='relu', **kwargs)
Initialize weights using Kaiming/He uniform initialization.
Parameters:
- fan_in (int) – Number of input features.
- fan_out (int) – Number of output features.
- nonlinearity (str, default: 'relu') – The nonlinearity (activation function) used in the model.
- **kwargs – Additional arguments passed to rand().
Returns:
- Tensor – Tensor initialized with values from the uniform distribution U(-bound, bound), where bound = sqrt(6 / fan_in).
Notes
This initialization is designed to work with the rectified linear unit (ReLU) activation function. The weights are zero-mean, and because U(-bound, bound) has a standard deviation of bound / sqrt(3), they have a standard deviation of sqrt(2 / fan_in).
References
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In ICCV.
Examples:
>>> kaiming_uniform(10, 5) # 10x5 tensor with Kaiming initialization
>>> kaiming_uniform(20, 10, nonlinearity="tanh") # Initialize for tanh
>>> kaiming_uniform(5, 5, device=gpu()) # Initialize on GPU
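The relationship between the uniform bound and the resulting standard deviation can be illustrated with NumPy. This is only a sketch under stated assumptions: `kaiming_uniform_sketch` is a hypothetical stand-in for the ReLU case, not the function documented here, and the (fan_in, fan_out) output shape is taken from the examples above.

```python
import numpy as np

def kaiming_uniform_sketch(fan_in, fan_out, seed=0):
    """Hypothetical NumPy stand-in: draw from U(-bound, bound), bound = sqrt(6 / fan_in)."""
    rng = np.random.default_rng(seed)
    bound = np.sqrt(6.0 / fan_in)  # ReLU case only
    return rng.uniform(-bound, bound, size=(fan_in, fan_out))

fan_in, fan_out = 600, 300
w = kaiming_uniform_sketch(fan_in, fan_out)

bound = np.sqrt(6.0 / fan_in)          # 0.1 for fan_in = 600
target_std = np.sqrt(2.0 / fan_in)     # equals bound / sqrt(3)
print(float(np.abs(w).max()) <= bound, float(w.std()), target_std)
```

Every sample stays inside the bound, and the empirical standard deviation should be close to sqrt(2 / fan_in).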
one_hot(k, n, device=None, dtype='float32', requires_grad=False)
Generate a one-hot encoded tensor.
Parameters:
- k (int) – Number of classes (width of the one-hot encoding).
- n (int or Iterable[int]) – Number of samples (rows), or the leading dimensions of the output tensor.
- device (Device, default: None) – Device on which to place the tensor. If None, the tensor is placed on the CPU.
- dtype (str, default: 'float32') – Data type of the tensor.
- requires_grad (bool, default: False) – If True, the tensor will track gradients.
Returns:
- Tensor – One-hot encoded tensor with shape (n, k) if n is an int, or (*n, k) if n is an Iterable.
Examples:
>>> one_hot(3, 2) # 2x3 tensor with one-hot rows
>>> one_hot(4, [2,3]) # 2x3x4 tensor with one-hot encodings
>>> one_hot(2, 5, device=gpu()) # 5x2 one-hot tensor on GPU
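To illustrate the output shapes described above, here is a small NumPy sketch. It is not this module's implementation. The documentation above does not say which entry is set in each row, so the explicit `indices` argument of the hypothetical `one_hot_sketch` helper is an assumption made purely for illustration.

```python
import numpy as np

def one_hot_sketch(k, indices):
    """Hypothetical helper: row i of the k x k identity is the one-hot vector for class i.

    `indices` is an assumption of this sketch; it is not a parameter of one_hot().
    """
    indices = np.asarray(indices, dtype=np.int64)
    return np.eye(k, dtype=np.float32)[indices]

a = one_hot_sketch(3, [0, 2])                             # shape (2, 3)
b = one_hot_sketch(4, np.zeros((2, 3), dtype=np.int64))   # shape (2, 3, 4)
print(a.shape, b.shape)
```

The trailing axis always has width k, matching the (n, k) and (*n, k) shapes in the Returns entry.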
ones(*shape, device=None, dtype='float32', requires_grad=False)
Generate a tensor filled with ones.
Parameters:
- *shape (int, default: ()) – Shape of the output tensor.
- device (Device, default: None) – Device on which to place the tensor. If None, the tensor is placed on the CPU.
- dtype (str, default: 'float32') – Data type of the tensor.
- requires_grad (bool, default: False) – If True, the tensor will track gradients.
Returns:
- Tensor – Tensor of the specified shape filled with ones.
Examples:
>>> ones(2, 3) # 2x3 tensor filled with ones
>>> ones(4, device=gpu()) # 4-element tensor of ones on GPU
>>> ones(2, 2, dtype="float64") # 2x2 tensor of ones with float64 dtype
rand(*shape, low=0.0, high=1.0, device=None, dtype='float32', requires_grad=False)
Generate a tensor filled with random numbers from a uniform distribution.
Parameters:
- *shape (int, default: ()) – Shape of the output tensor.
- low (float, default: 0.0) – Lower bound of the uniform distribution.
- high (float, default: 1.0) – Upper bound of the uniform distribution.
- device (Device, default: None) – Device on which to place the tensor. If None, the tensor is placed on the CPU.
- dtype (str, default: 'float32') – Data type of the tensor.
- requires_grad (bool, default: False) – If True, the tensor will track gradients.
Returns:
- Tensor – Tensor of the specified shape filled with random values from U(low, high).
Examples:
>>> rand(2, 3) # 2x3 tensor with values in [0,1]
>>> rand(4, low=-1, high=1) # 4-element tensor with values in [-1,1]
>>> rand(2, 2, device=gpu(), dtype="float64") # 2x2 tensor on GPU
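One common way to obtain U(low, high) samples is to rescale unit-interval samples. The NumPy sketch below shows that idea; `rand_sketch` is a hypothetical stand-in, and whether rand() is implemented this way is an assumption.

```python
import numpy as np

def rand_sketch(*shape, low=0.0, high=1.0, seed=0):
    """Hypothetical NumPy stand-in: rescale samples from [0, 1) to [low, high)."""
    rng = np.random.default_rng(seed)
    u = rng.random(size=shape)        # uniform samples in [0, 1)
    return low + (high - low) * u

x = rand_sketch(1000, low=-1.0, high=1.0)
print(x.shape, float(x.min()), float(x.max()))
```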
randb(*shape, p=0.5, device=None, dtype='bool', requires_grad=False)
Generate a binary random tensor.
Parameters:
- *shape (int, default: ()) – Shape of the output tensor.
- p (float, default: 0.5) – Probability of generating 1.
- device (Device, default: None) – Device on which to place the tensor. If None, the tensor is placed on the CPU.
- dtype (str, default: 'bool') – Data type of the tensor.
- requires_grad (bool, default: False) – If True, the tensor will track gradients.
Returns:
- Tensor – Binary tensor of the specified shape where each element is 1 with probability p and 0 with probability 1 - p.
Examples:
>>> randb(2, 3) # 2x3 binary tensor with p=0.5
>>> randb(4, p=0.8) # 4-element tensor, 80% chance of 1s
>>> randb(2, 2, device=gpu()) # 2x2 binary tensor on GPU
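A Bernoulli(p) tensor can be produced by thresholding uniform samples at p. The following NumPy sketch illustrates this; `randb_sketch` is a hypothetical stand-in, and the thresholding approach is an assumption rather than a description of randb() itself.

```python
import numpy as np

def randb_sketch(*shape, p=0.5, seed=0):
    """Hypothetical NumPy stand-in: each element is True with probability p."""
    rng = np.random.default_rng(seed)
    return rng.random(size=shape) < p   # boolean array

b = randb_sketch(100_000, p=0.8)
print(b.dtype, float(b.mean()))  # the fraction of True values should be near p
```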
randn(*shape, mean=0.0, std=1.0, device=None, dtype='float32', requires_grad=False)
Generate a tensor filled with random numbers from a normal distribution.
Parameters:
- *shape (int, default: ()) – Shape of the output tensor.
- mean (float, default: 0.0) – Mean of the normal distribution.
- std (float, default: 1.0) – Standard deviation of the normal distribution.
- device (Device, default: None) – Device on which to place the tensor. If None, the tensor is placed on the CPU.
- dtype (str, default: 'float32') – Data type of the tensor.
- requires_grad (bool, default: False) – If True, the tensor will track gradients.
Returns:
- Tensor – Tensor of the specified shape filled with random values from N(mean, std^2).
Examples:
>>> randn(2, 3) # 2x3 tensor from standard normal
>>> randn(4, mean=5, std=0.1) # 4-element tensor with mean 5, std 0.1
>>> randn(2, 2, device=gpu(), dtype="float64") # 2x2 tensor on GPU
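Samples from N(mean, std^2) can be obtained by shifting and scaling standard normal samples. The NumPy sketch below shows that transformation; `randn_sketch` is a hypothetical stand-in, and it is an assumption that randn() works the same way internally.

```python
import numpy as np

def randn_sketch(*shape, mean=0.0, std=1.0, seed=0):
    """Hypothetical NumPy stand-in: shift and scale standard normal samples."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(size=shape)   # samples from N(0, 1)
    return mean + std * z

x = randn_sketch(100_000, mean=5.0, std=0.1)
print(float(x.mean()), float(x.std()))  # should be near 5.0 and 0.1
```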
xavier_normal(fan_in, fan_out, gain=1.0, **kwargs)
Initialize weights using Xavier/Glorot normal initialization.
Parameters:
- fan_in (int) – Number of input features.
- fan_out (int) – Number of output features.
- gain (float, default: 1.0) – Scaling factor for the standard deviation of the normal distribution.
- **kwargs – Additional arguments passed to randn().
Returns:
- Tensor – Tensor initialized with values from the normal distribution N(0, std^2), where std = gain * sqrt(2 / (fan_in + fan_out)).
Notes
This initialization helps maintain variance of activations and gradients across layers in deep networks. The gain parameter can be adjusted for different activation functions.
References
Glorot, X. & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In AISTATS.
Examples:
>>> xavier_normal(10, 5) # 10x5 tensor with Xavier initialization
>>> xavier_normal(20, 10, gain=2.0) # Scaled initialization
>>> xavier_normal(5, 5, device=gpu()) # Initialize on GPU
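The effect of gain on the standard deviation can be seen in a short NumPy sketch. `xavier_normal_sketch` is a hypothetical stand-in, not the function documented here, and it assumes the (fan_in, fan_out) output shape shown in the examples above.

```python
import numpy as np

def xavier_normal_sketch(fan_in, fan_out, gain=1.0, seed=0):
    """Hypothetical NumPy stand-in: draw from N(0, std^2), std = gain * sqrt(2 / (fan_in + fan_out))."""
    rng = np.random.default_rng(seed)
    std = gain * np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(loc=0.0, scale=std, size=(fan_in, fan_out))

w = xavier_normal_sketch(500, 300, gain=2.0)
expected_std = 2.0 * np.sqrt(2.0 / 800)   # 0.1
print(w.shape, float(w.std()), expected_std)
```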
xavier_uniform(fan_in, fan_out, gain=1.0, **kwargs)
Initialize weights using Xavier/Glorot uniform initialization.
Parameters:
- fan_in (int) – Number of input features.
- fan_out (int) – Number of output features.
- gain (float, default: 1.0) – Scaling factor for the bounds of the uniform distribution.
- **kwargs – Additional arguments passed to rand().
Returns:
- Tensor – Tensor initialized with values from the uniform distribution U(-a, a), where a = gain * sqrt(6 / (fan_in + fan_out)).
Notes
This initialization helps maintain variance of activations and gradients across layers in deep networks. The gain parameter can be adjusted for different activation functions.
References
Glorot, X. & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In AISTATS.
Examples:
>>> xavier_uniform(10, 5) # 10x5 tensor with Xavier initialization
>>> xavier_uniform(20, 10, gain=2.0) # Scaled initialization
>>> xavier_uniform(5, 5, device=gpu()) # Initialize on GPU
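Since U(-a, a) has a standard deviation of a / sqrt(3), the bound above gives the same standard deviation as the normal variant, gain * sqrt(2 / (fan_in + fan_out)). The NumPy sketch below checks this numerically; `xavier_uniform_sketch` is a hypothetical stand-in, not this module's implementation.

```python
import numpy as np

def xavier_uniform_sketch(fan_in, fan_out, gain=1.0, seed=0):
    """Hypothetical NumPy stand-in: draw from U(-a, a), a = gain * sqrt(6 / (fan_in + fan_out))."""
    rng = np.random.default_rng(seed)
    a = gain * np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-a, a, size=(fan_in, fan_out))

fan_in, fan_out = 400, 200
w = xavier_uniform_sketch(fan_in, fan_out)

a = np.sqrt(6.0 / (fan_in + fan_out))            # 0.1 for gain = 1.0
normal_std = np.sqrt(2.0 / (fan_in + fan_out))   # std of the normal variant
print(float(np.abs(w).max()) <= a, float(w.std()), normal_std)
```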
zeros(*shape, device=None, dtype='float32', requires_grad=False)
Generate a tensor filled with zeros.
Parameters:
- *shape (int, default: ()) – Shape of the output tensor.
- device (Device, default: None) – Device on which to place the tensor. If None, the tensor is placed on the CPU.
- dtype (str, default: 'float32') – Data type of the tensor.
- requires_grad (bool, default: False) – If True, the tensor will track gradients.
Returns:
- Tensor – Tensor of the specified shape filled with zeros.
Examples:
>>> zeros(2, 3) # 2x3 tensor filled with zeros
>>> zeros(4, device=gpu()) # 4-element tensor of zeros on GPU
>>> zeros(2, 2, dtype="float64") # 2x2 tensor of zeros with float64 dtype