Initialization
Functions for creating and initializing tensors. Includes standard fills (zeros, ones, rand) and weight initialization schemes (Xavier, Kaiming) used when constructing neural network layers.
import tiny_pytorch.init as init
w = init.kaiming_uniform(128, 64) # (128, 64) tensor
b = init.zeros(64) # (64,) tensor
x = init.randn(32, 128) # (32, 128) tensor from N(0, 1)
This module provides functions for initializing tensors with different kinds of values. It includes functions for generating tensors filled with ones, zeros, random numbers, binary random numbers, and one-hot encodings, as well as weight-initialization helpers for the Xavier/Glorot and Kaiming/He schemes in both uniform and normal variants; a short usage sketch follows the function list below.
Functions:
- ones – Generate a tensor filled with ones.
- zeros – Generate a tensor filled with zeros.
- rand – Generate a tensor filled with random numbers from a uniform distribution.
- randb – Generate a binary random tensor.
- one_hot – Generate a one-hot encoded tensor.
- zeros_like – Generate a tensor of zeros with the same shape and dtype as the input tensor.
- ones_like – Generate a tensor of ones with the same shape and dtype as the input tensor.
- xavier_uniform – Initialize weights using Xavier/Glorot uniform initialization.
- xavier_normal – Initialize weights using Xavier/Glorot normal initialization.
- kaiming_uniform – Initialize weights using Kaiming/He uniform initialization.
- kaiming_normal – Initialize weights using Kaiming/He normal initialization.
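A minimal sketch of how these functions compose when building a small two-layer network; the layer sizes are illustrative, and how the resulting Tensors are wired into layers depends on the rest of tiny_pytorch:
import tiny_pytorch.init as init

W1 = init.kaiming_uniform(784, 256) # hidden-layer weights, shape (784, 256)
b1 = init.zeros(256)                # hidden-layer bias
W2 = init.xavier_uniform(256, 10)   # output-layer weights, shape (256, 10)
b2 = init.zeros(10)                 # output-layer bias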
constant(*shape, c=1.0, device=None, dtype='float32', requires_grad=False)
Generate a tensor filled with a constant value.
Parameters:
- *shape (int, default: ()) – Shape of the output tensor.
- c (float, default: 1.0) – Constant value to fill the tensor with.
- device (Device, default: None) – Device on which to place the tensor; None means CPU.
- dtype (str, default: "float32") – Data type of the tensor.
- requires_grad (bool, default: False) – If True, the tensor will track gradients.
Returns:
- Tensor – Tensor of the specified shape filled with the constant value c.
Examples:
>>> constant(2, 3, c=5) # 2x3 tensor filled with 5
>>> constant(4, c=3.14) # 4-element tensor filled with pi
>>> constant(2, 2, device=gpu(), c=2.0) # 2x2 tensor on GPU filled with 2
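A common use case is a bias filled with a nonzero value, for example the forget-gate bias trick for LSTMs; the line below is an illustrative sketch, not taken from the source docstring:
>>> constant(256, c=1.0) # (256,) bias filled with 1.0, e.g. an LSTM forget-gate bias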
kaiming_normal(fan_in, fan_out, nonlinearity='relu', **kwargs)
Initialize weights using Kaiming/He normal initialization.
Parameters:
- fan_in (int) – Number of input features.
- fan_out (int) – Number of output features.
- nonlinearity (str, default: "relu") – The non-linear (activation) function used in the model.
- **kwargs – Additional arguments passed to randn().
Returns:
- Tensor – Tensor initialized with values from the normal distribution N(0, std^2), where std = sqrt(2 / fan_in).
Notes
This initialization is designed to work with the rectified linear unit (ReLU) activation function. It sets the weights to be zero-mean and have a standard deviation of sqrt(2 / fan_in).
References
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In ICCV.
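A usage sketch mirroring the kaiming_uniform examples below (these lines are illustrative and not from the original docstring):
>>> kaiming_normal(10, 5) # 10x5 tensor with Kaiming normal initialization
>>> kaiming_normal(20, 10, nonlinearity="tanh") # Initialize for tanh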
kaiming_uniform(fan_in, fan_out, nonlinearity='relu', **kwargs)
Initialize weights using Kaiming/He uniform initialization.
Parameters:
- fan_in (int) – Number of input features.
- fan_out (int) – Number of output features.
- nonlinearity (str, default: "relu") – The non-linear (activation) function used in the model.
- **kwargs – Additional arguments passed to rand().
Returns:
- Tensor – Tensor initialized with values from the uniform distribution U(-bound, bound), where bound = sqrt(6 / fan_in).
Notes
This initialization is designed to work with the rectified linear unit (ReLU) activation function. It sets the weights to be zero-mean and have a standard deviation of sqrt(2 / fan_in).
References
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In ICCV.
Examples:
>>> kaiming_uniform(10, 5) # 10x5 tensor with Kaiming initialization
>>> kaiming_uniform(20, 10, nonlinearity="tanh") # Initialize for tanh
>>> kaiming_uniform(5, 5, device=gpu()) # Initialize on GPU
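The bound follows from the ReLU gain of sqrt(2); a minimal pure-Python check of the formulas quoted above (the gain value is the standard convention and an assumption about this implementation):
>>> import math
>>> fan_in = 128
>>> gain = math.sqrt(2) # standard gain for ReLU
>>> bound = gain * math.sqrt(3 / fan_in) # U(-bound, bound) then has std sqrt(2 / fan_in)
>>> math.isclose(bound, math.sqrt(6 / fan_in))
True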
one_hot(k, n, device=None, dtype='float32', requires_grad=False)
Generate a one-hot encoded tensor.
Parameters:
- k (int) – Number of classes (width of the one-hot encoding).
- n (int or Iterable[int]) – Number of samples (rows) or shape of the output tensor.
- device (Device, default: None) – Device on which to place the tensor; None means CPU.
- dtype (str, default: "float32") – Data type of the tensor.
- requires_grad (bool, default: False) – If True, the tensor will track gradients.
Returns:
- Tensor – One-hot encoded tensor with shape (n, k) if n is an int, or (*n, k) if n is an Iterable.
Examples:
>>> one_hot(3, 2) # 2x3 tensor with one-hot rows
>>> one_hot(4, [2,3]) # 2x3x4 tensor with one-hot encodings
>>> one_hot(2, 5, device=gpu()) # 5x2 one-hot tensor on GPU
ones(*shape, device=None, dtype='float32', requires_grad=False)
Generate a tensor filled with ones.
Parameters:
- *shape (int, default: ()) – Shape of the output tensor.
- device (Device, default: None) – Device on which to place the tensor; None means CPU.
- dtype (str, default: "float32") – Data type of the tensor.
- requires_grad (bool, default: False) – If True, the tensor will track gradients.
Returns:
- Tensor – Tensor of the specified shape filled with ones.
Examples:
>>> ones(2, 3) # 2x3 tensor filled with ones
>>> ones(4, device=gpu()) # 4-element tensor of ones on GPU
>>> ones(2, 2, dtype="float64") # 2x2 tensor of ones with float64 dtype
ones_like(array, *, device=None, requires_grad=False)
Return a tensor of ones with the same shape and dtype as the input tensor.
Parameters:
- array (Tensor) – Input tensor whose shape and dtype will be used.
- device (Device, default: None) – Device on which to place the tensor; if None, uses the device of the input tensor.
- requires_grad (bool, default: False) – If True, the tensor will track gradients.
Returns:
- Tensor – Tensor of ones with the same shape and dtype as the input tensor.
Examples:
>>> x = Tensor([[1, 2], [3, 4]])
>>> ones_like(x)
Tensor([[1, 1], [1, 1]])
rand(*shape, low=0.0, high=1.0, device=None, dtype='float32', requires_grad=False)
Generate a tensor filled with random numbers from uniform distribution.
Parameters:
- *shape (int, default: ()) – Shape of the output tensor.
- low (float, default: 0.0) – Lower bound of the uniform distribution.
- high (float, default: 1.0) – Upper bound of the uniform distribution.
- device (Device, default: None) – Device on which to place the tensor; None means CPU.
- dtype (str, default: "float32") – Data type of the tensor.
- requires_grad (bool, default: False) – If True, the tensor will track gradients.
Returns:
- Tensor – Tensor of the specified shape filled with random values from U(low, high).
Examples:
>>> rand(2, 3) # 2x3 tensor with values in [0,1]
>>> rand(4, low=-1, high=1) # 4-element tensor with values in [-1,1]
>>> rand(2, 2, device=gpu(), dtype="float64") # 2x2 tensor on GPU
randb(*shape, p=0.5, device=None, dtype='bool', requires_grad=False)
Generate a binary random tensor.
Parameters:
- *shape (int, default: ()) – Shape of the output tensor.
- p (float, default: 0.5) – Probability of generating a 1.
- device (Device, default: None) – Device on which to place the tensor; None means CPU.
- dtype (str, default: "bool") – Data type of the tensor.
- requires_grad (bool, default: False) – If True, the tensor will track gradients.
Returns:
- Tensor – Binary tensor of the specified shape where each element is 1 with probability p and 0 with probability 1 - p.
Examples:
>>> randb(2, 3) # 2x3 binary tensor with p=0.5
>>> randb(4, p=0.8) # 4-element tensor, 80% chance of 1s
>>> randb(2, 2, device=gpu()) # 2x2 binary tensor on GPU
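A typical application is a dropout-style mask; the sketch below is illustrative and assumes elementwise Tensor multiplication and scalar division work as expected in tiny_pytorch:
>>> p_keep = 0.8
>>> mask = randb(32, 128, p=p_keep, dtype="float32") # 1.0 with probability p_keep, else 0.0
>>> x = randn(32, 128)
>>> x * mask / p_keep # inverted-dropout scaling keeps the expected activation unchanged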
randn(*shape, mean=0.0, std=1.0, device=None, dtype='float32', requires_grad=False)
Generate a tensor filled with random numbers from normal distribution.
Parameters:
- *shape (int, default: ()) – Shape of the output tensor.
- mean (float, default: 0.0) – Mean of the normal distribution.
- std (float, default: 1.0) – Standard deviation of the normal distribution.
- device (Device, default: None) – Device on which to place the tensor; None means CPU.
- dtype (str, default: "float32") – Data type of the tensor.
- requires_grad (bool, default: False) – If True, the tensor will track gradients.
Returns:
- Tensor – Tensor of the specified shape filled with random values from N(mean, std^2).
Examples:
>>> randn(2, 3) # 2x3 tensor from standard normal
>>> randn(4, mean=5, std=0.1) # 4-element tensor with mean 5, std 0.1
>>> randn(2, 2, device=gpu(), dtype="float64") # 2x2 tensor on GPU
xavier_normal(fan_in, fan_out, gain=1.0, **kwargs)
Initialize weights using Xavier/Glorot normal initialization.
Parameters:
- fan_in (int) – Number of input features.
- fan_out (int) – Number of output features.
- gain (float, default: 1.0) – Scaling factor for the standard deviation of the normal distribution.
- **kwargs – Additional arguments passed to randn().
Returns:
- Tensor – Tensor initialized with values from the normal distribution N(0, std^2), where std = gain * sqrt(2 / (fan_in + fan_out)).
Notes
This initialization helps maintain variance of activations and gradients across layers in deep networks. The gain parameter can be adjusted for different activation functions.
References
Glorot, X. & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In AISTATS.
Examples:
>>> xavier_normal(10, 5) # 10x5 tensor with Xavier initialization
>>> xavier_normal(20, 10, gain=2.0) # Scaled initialization
>>> xavier_normal(5, 5, device=gpu()) # Initialize on GPU
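The gain is not derived from an activation name here, so the caller chooses it; the values below follow the common convention (e.g. torch.nn.init.calculate_gain), and the snippet is a sketch rather than part of the documented API:
>>> import math
>>> gains = {"linear": 1.0, "tanh": 5.0 / 3.0, "relu": math.sqrt(2.0)}
>>> xavier_normal(256, 128, gain=gains["tanh"]) # 256x128 tensor scaled for tanh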
xavier_uniform(fan_in, fan_out, gain=1.0, **kwargs)
Initialize weights using Xavier/Glorot uniform initialization.
Parameters:
- fan_in (int) – Number of input features.
- fan_out (int) – Number of output features.
- gain (float, default: 1.0) – Scaling factor for the bounds of the uniform distribution.
- **kwargs – Additional arguments passed to rand().
Returns:
- Tensor – Tensor initialized with values from the uniform distribution U(-a, a), where a = gain * sqrt(6 / (fan_in + fan_out)).
Notes
This initialization helps maintain variance of activations and gradients across layers in deep networks. The gain parameter can be adjusted for different activation functions.
References
Glorot, X. & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In AISTATS.
Examples:
>>> xavier_uniform(10, 5) # 10x5 tensor with Xavier initialization
>>> xavier_uniform(20, 10, gain=2.0) # Scaled initialization
>>> xavier_uniform(5, 5, device=gpu()) # Initialize on GPU
zeros(*shape, device=None, dtype='float32', requires_grad=False)
Generate a tensor filled with zeros.
Parameters:
- *shape (int, default: ()) – Shape of the output tensor.
- device (Device, default: None) – Device on which to place the tensor; None means CPU.
- dtype (str, default: "float32") – Data type of the tensor.
- requires_grad (bool, default: False) – If True, the tensor will track gradients.
Returns:
- Tensor – Tensor of the specified shape filled with zeros.
Examples:
>>> zeros(2, 3) # 2x3 tensor filled with zeros
>>> zeros(4, device=gpu()) # 4-element tensor of zeros on GPU
>>> zeros(2, 2, dtype="float64") # 2x2 tensor of zeros with float64 dtype
zeros_like(array, *, device=None, requires_grad=False)
Return a tensor of zeros with the same shape and dtype as the input tensor.
Parameters:
- array (Tensor) – Input tensor whose shape and dtype will be used.
- device (Device, default: None) – Device on which to place the tensor; if None, uses the device of the input tensor.
- requires_grad (bool, default: False) – If True, the tensor will track gradients.
Returns:
- Tensor – Tensor of zeros with the same shape and dtype as the input tensor.
Examples:
>>> x = Tensor([[1, 2], [3, 4]])
>>> zeros_like(x)
Tensor([[0, 0], [0, 0]])
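A common pattern is allocating an accumulator shaped like an existing parameter, for example a momentum buffer; the lines below are a sketch, and whether tiny_pytorch optimizers do this internally is an assumption:
>>> W = kaiming_uniform(128, 64)
>>> velocity = zeros_like(W) # same shape and dtype as W, filled with zeros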