Initialization
Functions for creating and initializing tensors. Includes standard fills (zeros, ones, rand) and weight initialization schemes (Xavier, Kaiming) used when constructing neural network layers.
import tiny_pytorch.init as init
w = init.kaiming_uniform(128, 64) # (128, 64) tensor
b = init.zeros(64) # (64,) tensor
x = init.randn(32, 128) # (32, 128) tensor from N(0, 1)
This module provides functions for initializing tensors with different kinds of values. It includes functions for generating tensors filled with ones, zeros, random numbers, binary random numbers, and one-hot encodings, as well as weight-initialization helpers for the Xavier/Glorot and Kaiming/He schemes in both uniform and normal variants; a short usage sketch follows the function list below.
Functions:
- ones – Generate a tensor filled with ones.
- zeros – Generate a tensor filled with zeros.
- rand – Generate a tensor filled with random numbers from a uniform distribution.
- randb – Generate a binary random tensor.
- one_hot – Generate a one-hot encoded tensor.
- zeros_like – Generate a tensor of zeros with the same shape and dtype as the input tensor.
- ones_like – Generate a tensor of ones with the same shape and dtype as the input tensor.
- xavier_uniform – Initialize weights using Xavier/Glorot uniform initialization.
- xavier_normal – Initialize weights using Xavier/Glorot normal initialization.
- kaiming_uniform – Initialize weights using Kaiming/He uniform initialization.
- kaiming_normal – Initialize weights using Kaiming/He normal initialization.
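A minimal sketch of how these functions compose when building a small two-layer network; the layer sizes are illustrative, and how the resulting Tensors are wired into layers depends on the rest of tiny_pytorch:
import tiny_pytorch.init as init

W1 = init.kaiming_uniform(784, 256) # hidden-layer weights, shape (784, 256)
b1 = init.zeros(256)                # hidden-layer bias
W2 = init.xavier_uniform(256, 10)   # output-layer weights, shape (256, 10)
b2 = init.zeros(10)                 # output-layer bias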
constant(*shape, c=1.0, device=None, dtype='float32', requires_grad=False)
Generate a tensor filled with a constant value.
Parameters:
- *shape (int, default: ()) – Shape of the output tensor.
- c (float, default: 1.0) – Constant value to fill the tensor with.
- device (Device, default: None) – Device on which to place the tensor; None means CPU.
- dtype (str, default: "float32") – Data type of the tensor.
- requires_grad (bool, default: False) – If True, the tensor will track gradients.
Returns:
- Tensor – Tensor of the specified shape filled with the constant value c.
Examples:
>>> constant(2, 3, c=5) # 2x3 tensor filled with 5
>>> constant(4, c=3.14) # 4-element tensor filled with pi
>>> constant(2, 2, device=gpu(), c=2.0) # 2x2 tensor on GPU filled with 2
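A common use case is a bias filled with a nonzero value, for example the forget-gate bias trick for LSTMs; the line below is an illustrative sketch, not taken from the source docstring:
>>> constant(256, c=1.0) # (256,) bias filled with 1.0, e.g. an LSTM forget-gate bias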
kaiming_normal(fan_in, fan_out, nonlinearity='relu', **kwargs)
Initialize weights using Kaiming/He normal initialization.
Parameters:
- fan_in (int) – Number of input features.
- fan_out (int) – Number of output features.
- nonlinearity (str, default: "relu") – The non-linear (activation) function used in the model.
- **kwargs – Additional arguments passed to randn().
Returns:
- Tensor – Tensor initialized with values from the normal distribution N(0, std^2), where std = sqrt(2 / fan_in).
Notes
This initialization is designed to work with the rectified linear unit (ReLU) activation function. It sets the weights to be zero-mean and have a standard deviation of sqrt(2 / fan_in).
References
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In ICCV.
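A usage sketch mirroring the kaiming_uniform examples below (these lines are illustrative and not from the original docstring):
>>> kaiming_normal(10, 5) # 10x5 tensor with Kaiming normal initialization
>>> kaiming_normal(20, 10, nonlinearity="tanh") # Initialize for tanh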
kaiming_uniform(fan_in, fan_out, nonlinearity='relu', **kwargs)
Initialize weights using Kaiming/He uniform initialization.
Parameters:
- fan_in (int) – Number of input features.
- fan_out (int) – Number of output features.
- nonlinearity (str, default: "relu") – The non-linear (activation) function used in the model.
- **kwargs – Additional arguments passed to rand().
Returns:
- Tensor – Tensor initialized with values from the uniform distribution U(-bound, bound), where bound = sqrt(6 / fan_in).
Notes
This initialization is designed to work with the rectified linear unit (ReLU) activation function. It sets the weights to be zero-mean and have a standard deviation of sqrt(2 / fan_in).
References
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In ICCV.
Examples:
>>> kaiming_uniform(10, 5) # 10x5 tensor with Kaiming initialization
>>> kaiming_uniform(20, 10, nonlinearity="tanh") # Initialize for tanh
>>> kaiming_uniform(5, 5, device=gpu()) # Initialize on GPU
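The bound follows from the ReLU gain of sqrt(2); a minimal pure-Python check of the formulas quoted above (the gain value is the standard convention and an assumption about this implementation):
>>> import math
>>> fan_in = 128
>>> gain = math.sqrt(2) # standard gain for ReLU
>>> bound = gain * math.sqrt(3 / fan_in) # U(-bound, bound) then has std sqrt(2 / fan_in)
>>> math.isclose(bound, math.sqrt(6 / fan_in))
True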
one_hot(k, n, device=None, dtype='float32', requires_grad=False)
Generate a one-hot encoded tensor.
Parameters:
- k (int) – Number of classes (width of the one-hot encoding).
- n (int or Iterable[int]) – Number of samples (rows) or shape of the output tensor.
- device (Device, default: None) – Device on which to place the tensor; None means CPU.
- dtype (str, default: "float32") – Data type of the tensor.
- requires_grad (bool, default: False) – If True, the tensor will track gradients.
Returns:
- Tensor – One-hot encoded tensor with shape (n, k) if n is an int, or (*n, k) if n is an Iterable.
Examples:
>>> one_hot(3, 2) # 2x3 tensor with one-hot rows
>>> one_hot(4, [2,3]) # 2x3x4 tensor with one-hot encodings
>>> one_hot(2, 5, device=gpu()) # 5x2 one-hot tensor on GPU
ones(*shape, device=None, dtype='float32', requires_grad=False)
Generate a tensor filled with ones.
Parameters:
- *shape (int, default: ()) – Shape of the output tensor.
- device (Device, default: None) – Device on which to place the tensor; None means CPU.
- dtype (str, default: "float32") – Data type of the tensor.
- requires_grad (bool, default: False) – If True, the tensor will track gradients.
Returns:
- Tensor – Tensor of the specified shape filled with ones.
Examples:
>>> ones(2, 3) # 2x3 tensor filled with ones
>>> ones(4, device=gpu()) # 4-element tensor of ones on GPU
>>> ones(2, 2, dtype="float64") # 2x2 tensor of ones with float64 dtype
ones_like(array, *, device=None, requires_grad=False)
Return a tensor of ones with the same shape and dtype as the input tensor.
Parameters:
- array (Tensor) – Input tensor whose shape and dtype will be used.
- device (Device, default: None) – Device on which to place the tensor; if None, uses the device of the input tensor.
- requires_grad (bool, default: False) – If True, the tensor will track gradients.
Returns:
- Tensor – Tensor of ones with the same shape and dtype as the input tensor.
Examples:
>>> x = Tensor([[1, 2], [3, 4]])
>>> ones_like(x)
Tensor([[1, 1], [1, 1]])
rand(*shape, low=0.0, high=1.0, device=None, dtype='float32', requires_grad=False)
Generate a tensor filled with random numbers from uniform distribution.
Parameters:
- *shape (int, default: ()) – Shape of the output tensor.
- low (float, default: 0.0) – Lower bound of the uniform distribution.
- high (float, default: 1.0) – Upper bound of the uniform distribution.
- device (Device, default: None) – Device on which to place the tensor; None means CPU.
- dtype (str, default: "float32") – Data type of the tensor.
- requires_grad (bool, default: False) – If True, the tensor will track gradients.
Returns:
- Tensor – Tensor of the specified shape filled with random values from U(low, high).
Examples:
>>> rand(2, 3) # 2x3 tensor with values in [0,1]
>>> rand(4, low=-1, high=1) # 4-element tensor with values in [-1,1]
>>> rand(2, 2, device=gpu(), dtype="float64") # 2x2 tensor on GPU
randb(*shape, p=0.5, device=None, dtype='bool', requires_grad=False)
Generate a binary random tensor.
Parameters:
- *shape (int, default: ()) – Shape of the output tensor.
- p (float, default: 0.5) – Probability of generating a 1.
- device (Device, default: None) – Device on which to place the tensor; None means CPU.
- dtype (str, default: "bool") – Data type of the tensor.
- requires_grad (bool, default: False) – If True, the tensor will track gradients.
Returns:
- Tensor – Binary tensor of the specified shape where each element is 1 with probability p and 0 with probability 1 - p.
Examples:
>>> randb(2, 3) # 2x3 binary tensor with p=0.5
>>> randb(4, p=0.8) # 4-element tensor, 80% chance of 1s
>>> randb(2, 2, device=gpu()) # 2x2 binary tensor on GPU
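A typical application is a dropout-style mask; the sketch below is illustrative and assumes elementwise Tensor multiplication and scalar division work as expected in tiny_pytorch:
>>> p_keep = 0.8
>>> mask = randb(32, 128, p=p_keep, dtype="float32") # 1.0 with probability p_keep, else 0.0
>>> x = randn(32, 128)
>>> x * mask / p_keep # inverted-dropout scaling keeps the expected activation unchanged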
randn(*shape, mean=0.0, std=1.0, device=None, dtype='float32', requires_grad=False)
Generate a tensor filled with random numbers from normal distribution.
Parameters:
- *shape (int, default: ()) – Shape of the output tensor.
- mean (float, default: 0.0) – Mean of the normal distribution.
- std (float, default: 1.0) – Standard deviation of the normal distribution.
- device (Device, default: None) – Device on which to place the tensor; None means CPU.
- dtype (str, default: "float32") – Data type of the tensor.
- requires_grad (bool, default: False) – If True, the tensor will track gradients.
Returns:
- Tensor – Tensor of the specified shape filled with random values from N(mean, std^2).
Examples:
>>> randn(2, 3) # 2x3 tensor from standard normal
>>> randn(4, mean=5, std=0.1) # 4-element tensor with mean 5, std 0.1
>>> randn(2, 2, device=gpu(), dtype="float64") # 2x2 tensor on GPU
xavier_normal(fan_in, fan_out, gain=1.0, **kwargs)
Initialize weights using Xavier/Glorot normal initialization.
Parameters:
- fan_in (int) – Number of input features.
- fan_out (int) – Number of output features.
- gain (float, default: 1.0) – Scaling factor for the standard deviation of the normal distribution.
- **kwargs – Additional arguments passed to randn().
Returns:
- Tensor – Tensor initialized with values from the normal distribution N(0, std^2), where std = gain * sqrt(2 / (fan_in + fan_out)).
Notes
This initialization helps maintain variance of activations and gradients across layers in deep networks. The gain parameter can be adjusted for different activation functions.
References
Glorot, X. & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In AISTATS.
Examples:
>>> xavier_normal(10, 5) # 10x5 tensor with Xavier initialization
>>> xavier_normal(20, 10, gain=2.0) # Scaled initialization
>>> xavier_normal(5, 5, device=gpu()) # Initialize on GPU
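The gain is not derived from an activation name here, so the caller chooses it; the values below follow the common convention (e.g. torch.nn.init.calculate_gain), and the snippet is a sketch rather than part of the documented API:
>>> import math
>>> gains = {"linear": 1.0, "tanh": 5.0 / 3.0, "relu": math.sqrt(2.0)}
>>> xavier_normal(256, 128, gain=gains["tanh"]) # 256x128 tensor scaled for tanh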
xavier_uniform(fan_in, fan_out, gain=1.0, **kwargs)
Initialize weights using Xavier/Glorot uniform initialization.
Parameters:
- fan_in (int) – Number of input features.
- fan_out (int) – Number of output features.
- gain (float, default: 1.0) – Scaling factor for the bounds of the uniform distribution.
- **kwargs – Additional arguments passed to rand().
Returns:
- Tensor – Tensor initialized with values from the uniform distribution U(-a, a), where a = gain * sqrt(6 / (fan_in + fan_out)).
Notes
This initialization helps maintain variance of activations and gradients across layers in deep networks. The gain parameter can be adjusted for different activation functions.
References
Glorot, X. & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In AISTATS.
Examples:
>>> xavier_uniform(10, 5) # 10x5 tensor with Xavier initialization
>>> xavier_uniform(20, 10, gain=2.0) # Scaled initialization
>>> xavier_uniform(5, 5, device=gpu()) # Initialize on GPU
zeros(*shape, device=None, dtype='float32', requires_grad=False)
Generate a tensor filled with zeros.
Parameters:
- *shape (int, default: ()) – Shape of the output tensor.
- device (Device, default: None) – Device on which to place the tensor; None means CPU.
- dtype (str, default: "float32") – Data type of the tensor.
- requires_grad (bool, default: False) – If True, the tensor will track gradients.
Returns:
- Tensor – Tensor of the specified shape filled with zeros.
Examples:
>>> zeros(2, 3) # 2x3 tensor filled with zeros
>>> zeros(4, device=gpu()) # 4-element tensor of zeros on GPU
>>> zeros(2, 2, dtype="float64") # 2x2 tensor of zeros with float64 dtype
zeros_like(array, *, device=None, requires_grad=False)
Return a tensor of zeros with the same shape and dtype as the input tensor.
Parameters:
- array (Tensor) – Input tensor whose shape and dtype will be used.
- device (Device, default: None) – Device on which to place the tensor; if None, uses the device of the input tensor.
- requires_grad (bool, default: False) – If True, the tensor will track gradients.
Returns:
- Tensor – Tensor of zeros with the same shape and dtype as the input tensor.
Examples:
>>> x = Tensor([[1, 2], [3, 4]])
>>> zeros_like(x)
Tensor([[0, 0], [0, 0]])
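A common pattern is allocating an accumulator shaped like an existing parameter, for example a momentum buffer; the lines below are a sketch, and whether tiny_pytorch optimizers do this internally is an assumption:
>>> W = kaiming_uniform(128, 64)
>>> velocity = zeros_like(W) # same shape and dtype as W, filled with zeros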