Tensor

Core data structures for multi-dimensional tensors.

This module provides the fundamental Tensor class and related components that form the backbone of the tiny-pytorch framework. It implements automatic differentiation, computation graph management, and tensor operations with support for multiple backends and devices.

The module includes the core Tensor class, operation abstractions, and gradient computation utilities that enable building and training neural networks with automatic differentiation capabilities.

Key Features
  • Automatic differentiation with gradient tracking
  • Computation graph construction and management
  • Support for multiple backends (NumPy, CPU, CUDA)
  • Lazy evaluation mode for memory efficiency
  • Tensor operations with automatic broadcasting
  • Gradient computation and backpropagation
  • Device and dtype management

Classes:

  • Op

    Base class for all tensor operations. Defines the interface for operations that can be applied to tensors to create new tensors in the computation graph.

  • TensorOp : Op

    Base class for operations that produce single tensors.

  • TensorTupleOp : Op

    Base class for operations that produce tuples of tensors.

  • Tensor

    Multi-dimensional tensor with automatic differentiation support. The core data structure for representing inputs, outputs, and intermediate results in neural network computations.

  • TensorTuple : Tensor

    Specialized tensor class for representing tuples of tensors.

Functions:

  • compute_gradients

    Compute gradients for all tensors in the computation graph.

  • find_topo_sort

    Find topological sort of tensors in the computation graph.

  • _topo_sort_dfs

    Depth-first search for topological sorting.

Notes

The Tensor system implements automatic differentiation through a computation graph where each tensor operation creates a new tensor node that tracks its inputs and the operation that produced it. When backward() is called on a tensor, gradients are computed and propagated through the graph using the chain rule.
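
The node-tracking idea described above can be illustrated with a minimal sketch. The `Node` class below is a hypothetical stand-in written for this illustration; it is not tiny-pytorch's `Tensor`, and it works on plain Python floats only.

```python
class Node:
    """Hypothetical graph node: a value plus a record of how it was made."""

    def __init__(self, value, op=None, inputs=()):
        self.value = value
        self.op = op                # name of the producing operation, None for leaves
        self.inputs = list(inputs)  # nodes consumed by that operation

    def __add__(self, other):
        return Node(self.value + other.value, op="add", inputs=(self, other))

    def __mul__(self, other):
        return Node(self.value * other.value, op="mul", inputs=(self, other))


x = Node(3.0)
y = Node(4.0)
z = x * y + x  # builds two new nodes: a "mul" node, then an "add" node

print(z.value)                          # 15.0
print(z.op, [n.op for n in z.inputs])   # add ['mul', None]
```

Because every result remembers its inputs, a later backward pass can walk from `z` back to `x` and `y` without any extra bookkeeping.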

The system supports both eager and lazy evaluation modes. In eager mode (default), tensor values are computed immediately. In lazy mode, computation is deferred until the tensor value is actually needed.

All tensor operations participate in automatic differentiation: gradients are tracked whenever requires_grad=True.

Examples:

>>> import tiny_pytorch as tp
>>>
>>> # Create tensors
>>> x = tp.Tensor([1, 2, 3], requires_grad=True)
>>> y = tp.Tensor([4, 5, 6], requires_grad=True)
>>>
>>> # Perform operations
>>> z = x * y + 2  # Automatic gradient tracking
>>> loss = z.sum()
>>>
>>> # Compute gradients
>>> loss.backward()
>>> print(x.grad)  # Gradient with respect to x
>>> print(y.grad)  # Gradient with respect to y
>>>
>>> # Use different devices
>>> x_cpu = tp.Tensor([1, 2, 3], device=tp.cpu())
>>> x_cuda = tp.Tensor([1, 2, 3], device=tp.cuda())

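As a sanity check on the example above: since loss = sum(x * y + 2), the derivative of loss with respect to x[i] is y[i], and with respect to y[i] it is x[i]. The gradients should therefore hold the values [4, 5, 6] and [1, 2, 3]. The sketch below confirms this with central finite differences in plain Python. It does not use tiny_pytorch, and the helper names `loss_fn` and `numerical_grad` are made up for this illustration.

```python
def loss_fn(x, y):
    """loss = sum(x * y + 2), written with plain Python lists."""
    return sum(xi * yi + 2 for xi, yi in zip(x, y))


def numerical_grad(f, values, eps=1e-6):
    """Central finite differences of f with respect to each entry of values."""
    grads = []
    for i in range(len(values)):
        up = list(values)
        down = list(values)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]

grad_x = numerical_grad(lambda v: loss_fn(v, y), x)
grad_y = numerical_grad(lambda v: loss_fn(x, v), y)

print([round(g, 4) for g in grad_x])  # [4.0, 5.0, 6.0]
print([round(g, 4) for g in grad_y])  # [1.0, 2.0, 3.0]
```
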
Op

Base class for all tensor operations.

This class defines the interface that all tensor operations must implement. Operations are callable objects that can be applied to tensors to create new tensors in the computation graph.

Methods:

  • __call__

    Apply the operation to the given arguments.

  • compute

    Compute the actual operation on the underlying arrays.

  • gradient

    Compute the gradient of the operation.

__call__(*args)

Apply the operation to the given arguments.

Parameters:

  • *args (Tensor, default: () ) –

    Input tensors to the operation.

Returns:

  • Tensor

    Result of applying the operation to the inputs.

compute(*args)

Compute the actual operation on the underlying arrays.

Parameters:

  • *args (tuple[NDArray], default: () ) –

    Input arrays to the operation.

Returns:

  • NDArray

    Result of the operation.

Raises:

  • NotImplementedError

    This method must be implemented by subclasses.

gradient(out_grad, out_node)

Compute the gradient of the operation.

Parameters:

  • out_grad (Tensor) –

    Gradient of the final result with respect to the output of this operation.

  • out_node (Tensor) –

    The output tensor of this operation.

Returns:

  • Tensor or tuple[Tensor]

    Gradient(s) with respect to the input(s) of this operation.

Raises:

  • NotImplementedError

    This method must be implemented by subclasses.

gradient_as_tuple(out_grad, node)

Convenience method that always returns the result of gradient as a tuple.
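
A minimal sketch of how a subclass might fill in this interface is shown below. The `Op` class here is a simplified stand-in, `MulScalar` is a hypothetical operation, and plain floats take the place of arrays and tensors; the real classes may differ in detail.

```python
class Op:
    """Simplified stand-in for the Op interface described above."""

    def compute(self, *args):
        raise NotImplementedError()

    def gradient(self, out_grad, out_node):
        raise NotImplementedError()

    def gradient_as_tuple(self, out_grad, node):
        # Normalize the result of gradient() so callers can always iterate.
        output = self.gradient(out_grad, node)
        if isinstance(output, tuple):
            return output
        if isinstance(output, list):
            return tuple(output)
        return (output,)


class MulScalar(Op):
    """Hypothetical op: multiply the input by a fixed scalar."""

    def __init__(self, scalar):
        self.scalar = scalar

    def compute(self, a):
        return a * self.scalar

    def gradient(self, out_grad, out_node):
        # d(a * s)/da = s, so the upstream gradient is scaled by s.
        return out_grad * self.scalar


op = MulScalar(3.0)
print(op.compute(2.0))                  # 6.0
print(op.gradient_as_tuple(1.0, None))  # (3.0,)
```

Splitting `compute` (forward values) from `gradient` (backward rule) lets the graph machinery stay generic: it only needs to call these two methods on whatever operation produced a node.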

Tensor

Tensor is the fundamental data structure in tiny_pytorch. It is a multi-dimensional array of numerical values used to represent inputs, outputs, and intermediate results in a computation graph.

Attributes:

  • cached_data (list[object]) –

    The cached data of the tensor.

  • inputs (list[Tensor]) –

    The input tensors to the operation that produced this tensor.

  • op (Op) –

    The operation that produced this tensor.

  • requires_grad (bool) –

    If True, the tensor will track gradients.

data property writable

Returns a detached Tensor with the original data.

device property

Returns the device on which the tensor is stored.

Returns:

  • device ( Device ) –

    The device on which the tensor is stored.

dtype property

Returns the data type of the tensor.

Returns:

  • dtype ( dtype ) –

    The data type of the tensor.

ndim property

Returns the number of dimensions of the tensor.

Returns:

  • int

    Number of dimensions of the tensor.

shape property

Returns the shape of the tensor as a tuple.

Returns:

  • tuple

    Shape of the tensor.

__add__(other)

Add another tensor or scalar to this tensor.

Parameters:

  • other (Tensor or scalar) –

    The tensor or scalar to add.

Returns:

  • Tensor

    Result of the addition operation.

__init__(array, *, device=None, dtype=None, requires_grad=True)

Construct a Tensor by copying array.

Parameters:

  • array (object) –

    The array to be copied.

  • device (Device, default: None ) –

    The device on which to place the tensor. Default is None.

  • dtype (str, default: None ) –

    The data type of the tensor. Default is None.

  • requires_grad (bool, default: True ) –

    If True, the tensor will track gradients. Default is True.

__matmul__(other)

Matrix multiplication with another tensor.

Parameters:

  • other (Tensor) –

    The tensor to multiply with.

Returns:

  • Tensor

    Result of the matrix multiplication.

__mul__(other)

Multiply this tensor by another tensor or scalar.

Parameters:

  • other (Tensor or scalar) –

    The tensor or scalar to multiply by.

Returns:

  • Tensor

    Result of the multiplication operation.

__neg__()

Negate this tensor.

Returns:

  • Tensor

    The negated tensor.

__pow__(other)

Raise this tensor to the power of another tensor or scalar.

Parameters:

  • other (Tensor or scalar) –

    The exponent.

Returns:

  • Tensor

    Result of the power operation.

__repr__()

String representation of the tensor.

Returns:

  • str

    String representation showing the tensor data.

__str__()

String representation of the tensor.

Returns:

  • str

    String representation of the tensor data.

__sub__(other)

Subtract another tensor or scalar from this tensor.

Parameters:

  • other (Tensor or scalar) –

    The tensor or scalar to subtract.

Returns:

  • Tensor

    Result of the subtraction operation.

__truediv__(other)

Divide this tensor by another tensor or scalar.

Parameters:

  • other (Tensor or scalar) –

    The tensor or scalar to divide by.

Returns:

  • Tensor

    Result of the division operation.

backward(out_grad=None)

Runs backpropagation from this tensor, computing the gradient of this tensor with respect to each tensor it depends on.

Parameters:

  • out_grad (Tensor, default: None ) –

    The gradient used to seed backpropagation at this tensor. If None, a tensor of ones is used.

Returns:

  • None

    This method updates the grad attribute of the tensor and its dependencies with the computed gradients.

broadcast_to(shape)

Broadcasts the tensor to the specified shape.

Parameters:

  • shape (tuple of ints) –

    The new shape of the tensor.

Returns:

  • Tensor

    A new tensor with the specified shape.
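
Which target shapes are valid is not spelled out above. Assuming broadcast_to follows NumPy-style broadcasting rules (an assumption, not something this page confirms), compatibility can be checked as in the sketch below. The helper `can_broadcast` is hypothetical and operates on shape tuples only.

```python
def can_broadcast(shape, target):
    """Return True if `shape` can be broadcast to `target` (NumPy-style rules)."""
    if len(shape) > len(target):
        return False
    # Align trailing dimensions; missing leading dimensions act like size 1.
    padded = (1,) * (len(target) - len(shape)) + tuple(shape)
    return all(s == t or s == 1 for s, t in zip(padded, target))


print(can_broadcast((3,), (2, 3)))    # True
print(can_broadcast((2, 1), (2, 3)))  # True
print(can_broadcast((2,), (2, 3)))    # False
```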

detach()

Returns a new Tensor with no history (detached from the computation graph). The returned Tensor will share the same data with the original one.
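
The practical consequence of sharing data is that an in-place write through the detached tensor is visible in the original. The sketch below shows this aliasing with a hypothetical `Node` class that stores a Python list as its buffer; it is an illustration of the idea, not the library's implementation.

```python
class Node:
    """Hypothetical node holding a mutable buffer plus graph history."""

    def __init__(self, buffer, op=None, inputs=()):
        self.buffer = buffer
        self.op = op
        self.inputs = list(inputs)

    def detach(self):
        # Reuse the same buffer object; drop op and inputs.
        return Node(self.buffer)


a = Node([1.0, 2.0])
b = Node([a.buffer[0] * 2, a.buffer[1] * 2], op="mul_scalar", inputs=(a,))
c = b.detach()

print(c.op, c.inputs)  # None []
c.buffer[0] = 100.0    # in-place write through the detached node
print(b.buffer)        # [100.0, 4.0]
```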

from_constant(data, requires_grad=False) classmethod

Creates a leaf node Tensor from the given data.

from_operation(op, inputs) classmethod

Creates a Tensor node by applying the operation op to the inputs tensors.

is_leaf()

A Tensor is considered a leaf if requires_grad is False, or if it was created by the user rather than produced by an operation.

numpy()

Returns the Tensor as a NumPy ndarray. The underlying data is shared between the Tensor and the ndarray.

realize_cached_data()

Runs the deferred computation to produce the output when lazy mode is on; otherwise returns the cached data.
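
One way to picture deferred computation is the sketch below. The `LAZY_MODE` flag, the `Node` class, and the `apply` helper are all hypothetical names chosen for this illustration; the actual switch and internals in tiny-pytorch may look different.

```python
LAZY_MODE = True  # hypothetical module-level switch


class Node:
    def __init__(self, op=None, inputs=(), cached_data=None):
        self.op = op                # callable computing the value from input values
        self.inputs = list(inputs)
        self.cached_data = cached_data

    def realize_cached_data(self):
        if self.cached_data is None:
            # Realize inputs first, then run this node's op and cache the result.
            args = [n.realize_cached_data() for n in self.inputs]
            self.cached_data = self.op(*args)
        return self.cached_data


def apply(op, *inputs):
    node = Node(op=op, inputs=inputs)
    if not LAZY_MODE:
        node.realize_cached_data()  # eager: compute immediately
    return node


calls = []


def add(a, b):
    calls.append("add")
    return a + b


x = Node(cached_data=2.0)
y = Node(cached_data=3.0)
z = apply(add, x, y)

print(calls)                    # []  (nothing computed yet in lazy mode)
print(z.realize_cached_data())  # 5.0
print(z.realize_cached_data())  # 5.0 (served from the cache)
print(calls)                    # ['add']
```

Caching the realized value means each node's operation runs at most once, no matter how many downstream nodes ask for it.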

reshape(shape)

Reshapes the tensor to the specified shape.

Parameters:

  • shape (tuple of ints) –

    The new shape of the tensor.

Returns:

  • Tensor

    A new tensor with the specified shape.

sum(axes=None)

Returns the sum of elements over specified axes.

Parameters:

  • axes (None or int or tuple of ints, default: None ) –

    Axis or axes along which a sum is performed. The default is to sum all of the elements of the input tensor.

Returns:

  • Tensor

    A new tensor with the sum of elements over specified axes.

transpose(axes=None)

Transposes the tensor according to the specified axes.

Parameters:

  • axes (tuple of ints, default: None ) –

    By default, reverse the dimensions, otherwise permute the axes according to the values given.

Returns:

  • Tensor

    A new tensor with the specified axes transposed.
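
The effect of axes on the result's shape can be sketched as follows. The helper `transposed_shape` is hypothetical and follows the rule as described above (reverse all dimensions by default, otherwise permute); it works on shape tuples only and does not touch tensor data.

```python
def transposed_shape(shape, axes=None):
    """Shape after a transpose, following the rule described above."""
    if axes is None:
        axes = tuple(reversed(range(len(shape))))  # default: reverse all dimensions
    if sorted(axes) != list(range(len(shape))):
        raise ValueError("axes must be a permutation of range(ndim)")
    # Output dimension i takes its size from input dimension axes[i].
    return tuple(shape[a] for a in axes)


print(transposed_shape((2, 3, 4)))             # (4, 3, 2)
print(transposed_shape((2, 3, 4), (1, 0, 2)))  # (3, 2, 4)
```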

TensorOp

Bases: Op

Op class specialized to output tensors; alternate subclasses handle other structures.

TensorTuple

Bases: Tensor

Represents a tuple of tensors.

To keep things simple, we do not support nested tuples.

detach()

Create a new tensor that shares the data but detaches from the graph.

TensorTupleOp

Bases: Op

Op class specialized to output a TensorTuple.

compute_gradients(out_tensor, out_grad)

Compute gradients for all nodes in the computation graph.

This function implements reverse-mode automatic differentiation by traversing the computation graph in reverse topological order and computing gradients for each node.

Parameters:

  • out_tensor (Tensor) –

    The output tensor for which gradients are computed.

  • out_grad (Tensor) –

    The gradient of the final result with respect to out_tensor.

Notes

This function modifies tensors in place: it stores each computed gradient in the grad attribute of the corresponding tensor in the computation graph.
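
The reverse traversal described above can be sketched on scalars. Everything in this block (`Node`, `mul`, `add`, `topo_order`, and this version of `compute_gradients`) is a simplified, hypothetical re-creation for illustration and is not the library's code. The key step is that a node used in several places receives several partial gradients, which are summed before being propagated further.

```python
class Node:
    """Hypothetical scalar node; grad_fn maps an upstream gradient to input gradients."""

    def __init__(self, value, inputs=(), grad_fn=None):
        self.value = value
        self.inputs = list(inputs)
        self.grad_fn = grad_fn
        self.grad = None


def mul(a, b):
    return Node(a.value * b.value, (a, b), lambda g: (g * b.value, g * a.value))


def add(a, b):
    return Node(a.value + b.value, (a, b), lambda g: (g, g))


def topo_order(root):
    """Post-order DFS: every node appears after all of its inputs."""
    order, visited = [], set()

    def visit(node):
        if id(node) in visited:
            return
        visited.add(id(node))
        for inp in node.inputs:
            visit(inp)
        order.append(node)

    visit(root)
    return order


def compute_gradients(out_node, out_grad):
    # Collect partial gradients per node, keyed by object identity.
    partials = {id(out_node): [out_grad]}
    for node in reversed(topo_order(out_node)):
        node.grad = sum(partials[id(node)])
        if node.grad_fn is None:
            continue  # leaf: nothing to propagate further
        for inp, g in zip(node.inputs, node.grad_fn(node.grad)):
            partials.setdefault(id(inp), []).append(g)


x = Node(3.0)
y = Node(4.0)
z = add(mul(x, y), x)  # z = x * y + x
compute_gradients(z, 1.0)
print(x.grad, y.grad)  # 5.0 3.0
```

Here `x` contributes to `z` along two paths, so its gradient is y + 1 = 5.0, while `y` contributes along one path and gets x = 3.0.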

find_topo_sort(node_list)

Find topological sort of nodes in the computation graph.

Given a list of nodes, return a topological sort list of nodes ending in them. A simple algorithm is to do a post-order DFS traversal on the given nodes, going backwards based on input edges. Since a node is added to the ordering after all its predecessors are traversed due to post-order DFS, we get a topological sort.

Parameters:

  • node_list (list[Tensor]) –

    List of tensors to sort topologically.

Returns:

  • list[Tensor]

    Topologically sorted list of tensors.
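
The post-order DFS described above can be sketched as follows. The `Node` class is a hypothetical stand-in with only a name and a list of inputs, and the helper is called `topo_sort_dfs` here without claiming to match the signature of the library's `_topo_sort_dfs`.

```python
class Node:
    """Hypothetical graph node with a name and input edges."""

    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = list(inputs)


def find_topo_sort(node_list):
    """Return nodes in an order where each node follows all of its inputs."""
    visited = set()
    topo_order = []
    for node in node_list:
        topo_sort_dfs(node, visited, topo_order)
    return topo_order


def topo_sort_dfs(node, visited, topo_order):
    """Post-order DFS along input edges."""
    if id(node) in visited:
        return
    visited.add(id(node))
    for inp in node.inputs:
        topo_sort_dfs(inp, visited, topo_order)
    # All inputs have been placed already, so the node can be appended now.
    topo_order.append(node)


a = Node("a")
b = Node("b")
c = Node("c", inputs=(a, b))
d = Node("d", inputs=(c, a))

print([n.name for n in find_topo_sort([d])])  # ['a', 'b', 'c', 'd']
```

Reversing this list gives the order in which a backward pass can visit nodes, since every node then appears before the inputs it sends gradients to.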