Tensor
Core data structures for multi-dimensional tensors.
This module provides the Tensor class and the supporting components at the core of the tiny-pytorch framework: automatic differentiation, computation graph management, and tensor operations across multiple backends and devices.
Together, the Tensor class, the operation abstractions, and the gradient utilities make it possible to build and train neural networks.
Key Features
- Automatic differentiation with gradient tracking
- Computation graph construction and management
- Support for multiple backends (NumPy, CPU, CUDA)
- Lazy evaluation mode for memory efficiency
- Tensor operations with automatic broadcasting
- Gradient computation and backpropagation
- Device and dtype management
Classes:
- Op: Base class for all tensor operations. Defines the interface for operations that can be applied to tensors to create new tensors in the computation graph.
- TensorOp (Op): Base class for operations that produce single tensors.
- TensorTupleOp (Op): Base class for operations that produce tuples of tensors.
- Tensor: Multi-dimensional tensor with automatic differentiation support. The core data structure for representing inputs, outputs, and intermediate results in neural network computations.
- TensorTuple (Tensor): Specialized tensor class for representing tuples of tensors.
Functions:
- compute_gradients: Compute gradients for all tensors in the computation graph.
- find_topo_sort: Find topological sort of tensors in the computation graph.
- _topo_sort_dfs: Depth-first search for topological sorting.
Notes
The Tensor system implements automatic differentiation through a computation graph where each tensor operation creates a new tensor node that tracks its inputs and the operation that produced it. When backward() is called on a tensor, gradients are computed and propagated through the graph using the chain rule.
The system supports both eager and lazy evaluation modes. In eager mode (default), tensor values are computed immediately. In lazy mode, computation is deferred until the tensor value is actually needed.
Every tensor operation participates in automatic differentiation and tracks gradients when requires_grad=True.
Examples:
>>> import tiny_pytorch as tp
>>>
>>> # Create tensors
>>> x = tp.Tensor([1, 2, 3], requires_grad=True)
>>> y = tp.Tensor([4, 5, 6], requires_grad=True)
>>>
>>> # Perform operations
>>> z = x * y + 2 # Automatic gradient tracking
>>> loss = z.sum()
>>>
>>> # Compute gradients
>>> loss.backward()
>>> print(x.grad) # Gradient with respect to x
>>> print(y.grad) # Gradient with respect to y
>>>
>>> # Use different devices
>>> x_cpu = tp.Tensor([1, 2, 3], device=tp.cpu())
>>> x_cuda = tp.Tensor([1, 2, 3], device=tp.cuda())
Op
Base class for all tensor operations.
This class defines the interface that all tensor operations must implement. Operations are callable objects that can be applied to tensors to create new tensors in the computation graph.
Methods:
- __call__: Apply the operation to the given arguments.
- compute: Compute the actual operation on the underlying arrays.
- gradient: Compute the gradient of the operation.
__call__(*args)
compute(*args)
Compute the actual operation on the underlying arrays.
Parameters:
- *args (tuple[NDArray], default: ()): Input arrays to the operation.
Returns:
- NDArray: Result of the operation.
Raises:
- NotImplementedError: This method must be implemented by subclasses.
gradient(out_grad, out_node)
gradient_as_tuple(out_grad, node)
Convenience method that always returns the result of gradient as a tuple.
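As a rough illustration of the interface above, the sketch below defines a stand-in Op base class and a hypothetical MulScalar subclass that works on plain Python lists. The names MulScalar and scalar, the use of lists in place of NDArray, and the body of gradient_as_tuple are assumptions made for this example; they are not taken from the tiny-pytorch source.

```python
class Op:
    """Stand-in for the Op interface (illustrative, not the library class)."""

    def compute(self, *args):
        # The base class has no behaviour of its own.
        raise NotImplementedError()

    def gradient(self, out_grad, node):
        raise NotImplementedError()

    def gradient_as_tuple(self, out_grad, node):
        # Assumed behaviour: wrap a single gradient in a 1-tuple,
        # pass an existing tuple through unchanged.
        output = self.gradient(out_grad, node)
        if isinstance(output, tuple):
            return output
        return (output,)


class MulScalar(Op):
    """Hypothetical op: multiply every element by a fixed scalar."""

    def __init__(self, scalar):
        self.scalar = scalar

    def compute(self, a):
        return [v * self.scalar for v in a]

    def gradient(self, out_grad, node):
        # d(scalar * a)/da = scalar, so scale the incoming gradient.
        return [g * self.scalar for g in out_grad]


op = MulScalar(3)
result = op.compute([1.0, 2.0])
grads = op.gradient_as_tuple([1.0, 1.0], node=None)
```

Here gradient returns a single list, and gradient_as_tuple wraps it so that callers can always iterate over one gradient per input.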
Tensor
Tensor is the fundamental data structure in tiny_pytorch. It is a multi-dimensional array of numerical values used to represent inputs, outputs, and intermediate results in a computation graph.
Attributes:
- cached_data (list[object]): The cached data of the tensor.
- inputs (list[Tensor]): The input tensors to the operation that produced this tensor.
- op (Op): The operation that produced this tensor.
- requires_grad (bool): If True, the tensor will track gradients.
data
property
writable
Returns a detached Tensor with the original data.
device
property
Returns the device on which the tensor is stored.
Returns:
- device (Device): The device on which the tensor is stored.
dtype
property
Returns the data type of the tensor.
Returns:
- dtype (dtype): The data type of the tensor.
ndim
property
Returns the number of dimensions of the tensor.
Returns:
- int: Number of dimensions of the tensor.
shape
property
Returns the shape of the tensor as a tuple.
Returns:
- tuple: Shape of the tensor.
__add__(other)
__init__(array, *, device=None, dtype=None, requires_grad=True)
Construct a Tensor by copying array.
Parameters:
- array (object): The array to be copied.
- device (Device, default: None): The device on which to place the tensor.
- dtype (str, default: None): The data type of the tensor.
- requires_grad (bool, default: True): If True, the tensor will track gradients.
__matmul__(other)
__mul__(other)
__neg__()
__pow__(other)
__repr__()
String representation of the tensor.
Returns:
- str: String representation showing the tensor data.
__str__()
String representation of the tensor.
Returns:
- str: String representation of the tensor data.
__sub__(other)
__truediv__(other)
backward(out_grad=None)
Runs backpropagation from this tensor, computing gradients for the tensors in its computation graph.
Parameters:
- out_grad (Tensor, default: None): The upstream gradient used to seed backpropagation. If None, a tensor of ones is used.
Returns:
- None: This method updates the grad attribute of the tensor and its dependencies with the computed gradients.
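For the module-level example loss = (x * y + 2).sum(), the chain rule gives d(loss)/dx_i = y_i and d(loss)/dy_i = x_i, which is what x.grad and y.grad should hold after loss.backward(). The plain-Python sketch below checks that result with central finite differences. It does not call tiny-pytorch, and the helper names loss_fn and numeric_grad are made up for this illustration.

```python
def loss_fn(x, y):
    # Same expression as the module example: sum(x * y + 2).
    return sum(a * b + 2 for a, b in zip(x, y))


def numeric_grad(f, x, y, wrt, eps=1e-6):
    """Central finite-difference gradient of f with respect to x or y."""
    target = x if wrt == "x" else y
    grads = []
    for i in range(len(target)):
        plus, minus = list(target), list(target)
        plus[i] += eps
        minus[i] -= eps
        if wrt == "x":
            diff = f(plus, y) - f(minus, y)
        else:
            diff = f(x, plus) - f(x, minus)
        grads.append(diff / (2 * eps))
    return grads


x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
grad_x = numeric_grad(loss_fn, x, y, wrt="x")  # approximately equal to y
grad_y = numeric_grad(loss_fn, x, y, wrt="y")  # approximately equal to x
```

A check of this kind is a common way to validate a gradient implementation for a new operation before trusting it in training.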
broadcast_to(shape)
Broadcasts the tensor to the specified shape.
Parameters:
- shape (tuple of ints): The new shape of the tensor.
Returns:
- Tensor: A new tensor with the specified shape.
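This page does not spell out which target shapes are accepted. Assuming the method follows NumPy-style broadcasting (an assumption, not something this reference confirms), a target shape is valid when, aligning from the trailing dimension, every source dimension either equals the target dimension or is 1. The helper can_broadcast_to below is hypothetical and only checks shapes.

```python
def can_broadcast_to(src_shape, dst_shape):
    """Return True if src_shape can be broadcast to dst_shape (NumPy-style rules)."""
    # Broadcasting may add leading dimensions but never remove any.
    if len(src_shape) > len(dst_shape):
        return False
    # Compare dimensions from the trailing end.
    for s, d in zip(reversed(src_shape), reversed(dst_shape)):
        if s != d and s != 1:
            return False
    return True


ok_column = can_broadcast_to((3, 1), (3, 4))     # size-1 axis stretches
ok_leading = can_broadcast_to((3,), (2, 3))      # new leading axis added
bad_mismatch = can_broadcast_to((2,), (3,))      # 2 cannot stretch to 3
bad_rank = can_broadcast_to((2, 3), (3,))        # cannot drop a dimension
```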
detach()
Returns a new Tensor with no history (detached from the computation graph). The returned Tensor shares its data with the original.
from_constant(data, requires_grad=False)
classmethod
Creates a leaf node Tensor from the given data.
from_operation(op, inputs)
classmethod
Creates a node Tensor by applying the operation op to the input Tensors.
is_leaf()
Returns True for leaf Tensors: those with requires_grad set to False, or those created directly by the user rather than produced by an operation.
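The leaf rule can be sketched with a minimal stand-in node class. The sketch assumes that a user-created tensor is one whose op is None; that detail, the Node class, and the string placeholder used for op are illustrative and not taken from the library.

```python
class Node:
    """Minimal stand-in for a graph node (not the library's Tensor)."""

    def __init__(self, op=None, inputs=(), requires_grad=True):
        self.op = op
        self.inputs = list(inputs)
        self.requires_grad = requires_grad

    def is_leaf(self):
        # Leaf if gradients are not tracked, or if no op produced this node.
        return (not self.requires_grad) or self.op is None


user_made = Node()  # created directly, no producing op
frozen_result = Node(op="add", inputs=[user_made], requires_grad=False)
tracked_result = Node(op="add", inputs=[user_made], requires_grad=True)
```

Under this rule user_made and frozen_result are leaves, while tracked_result is an interior node of the graph.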
numpy()
Returns the Tensor as a NumPy ndarray. The underlying data is shared between the Tensor and the ndarray.
realize_cached_data()
If lazy mode is on, runs the deferred computation to produce the output; otherwise returns the cached data.
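The idea of deferring work until a value is needed can be sketched as follows. LazyValue, its fn attribute, and the evaluations counter are hypothetical names for this illustration; the sketch shows the general caching pattern under those assumptions, not the library's actual implementation.

```python
class LazyValue:
    """Stand-in node that defers its computation until first use."""

    def __init__(self, fn, inputs=()):
        self.fn = fn
        self.inputs = list(inputs)
        self.cached_data = None
        self.evaluations = 0  # counts how often fn actually ran

    def realize_cached_data(self):
        if self.cached_data is None:
            # Realize the inputs first, then compute and cache this node.
            args = [inp.realize_cached_data() for inp in self.inputs]
            self.cached_data = self.fn(*args)
            self.evaluations += 1
        return self.cached_data


a = LazyValue(lambda: 2.0)
b = LazyValue(lambda: 5.0)
c = LazyValue(lambda u, v: u * v, inputs=[a, b])

pending = c.cached_data           # nothing has been computed yet
first = c.realize_cached_data()   # triggers a, b, then c
second = c.realize_cached_data()  # served from the cache
```

Building c costs nothing until realize_cached_data is called, and the second call reuses the cached result instead of recomputing it.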
reshape(shape)
Reshapes the tensor to the specified shape.
Parameters:
- shape (tuple of ints): The new shape of the tensor.
Returns:
- Tensor: A new tensor with the specified shape.
sum(axes=None)
Returns the sum of elements over specified axes.
Parameters:
- axes (None or int or tuple of ints, default: None): Axis or axes along which a sum is performed. The default is to sum all of the elements of the input tensor.
Returns:
- Tensor: A new tensor with the sum of elements over specified axes.
transpose(axes=None)
Transposes the tensor according to the specified axes.
Parameters:
- axes (tuple of ints, default: None): By default, reverse the dimensions, otherwise permute the axes according to the values given.
Returns:
- Tensor: A new tensor with the specified axes transposed.
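Reading the axes description literally, the output shape can be predicted from the input shape alone. The helper transposed_shape below is hypothetical and assumes that axes is a full permutation of the dimensions, as in NumPy's transpose; the library itself may interpret axes differently.

```python
def transposed_shape(shape, axes=None):
    """Shape that results from permuting the dimensions of `shape` by `axes`."""
    ndim = len(shape)
    if axes is None:
        # Default: reverse the order of the dimensions.
        axes = tuple(reversed(range(ndim)))
    if sorted(axes) != list(range(ndim)):
        raise ValueError("axes must be a permutation of range(ndim)")
    return tuple(shape[ax] for ax in axes)


reversed_dims = transposed_shape((2, 3, 4))           # (4, 3, 2)
swapped_last = transposed_shape((2, 3, 4), (0, 2, 1))  # (2, 4, 3)
```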
TensorOp
TensorTuple
Bases: Tensor
Represent a tuple of tensors.
To keep things simple, we do not support nested tuples.
detach()
Create a new tensor that shares the data but detaches from the graph.
TensorTupleOp
compute_gradients(out_tensor, out_grad)
Compute gradients for all nodes in the computation graph.
This function implements reverse-mode automatic differentiation by traversing the computation graph in reverse topological order and computing gradients for each node.
Parameters:
- out_tensor (Tensor): The output tensor for which gradients are computed.
- out_grad (Tensor): The upstream gradient for out_tensor, used to seed backpropagation.
Notes
This function modifies tensors in the computation graph in place: it stores each computed gradient in that tensor's grad attribute.
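The reverse traversal described above can be sketched on scalar values. Everything in this block (Node, local_grads, add, mul, topo_order, compute_gradients_sketch) is a simplified stand-in written for illustration. The real function operates on Tensor objects and calls each op's gradient method, which this sketch replaces with precomputed local derivatives.

```python
class Node:
    """Scalar graph node; a stand-in for Tensor, not the library class."""

    def __init__(self, value, inputs=(), local_grads=()):
        self.value = value
        self.inputs = list(inputs)
        # local_grads[i] is d(self)/d(inputs[i]).
        self.local_grads = list(local_grads)
        self.grad = None


def add(a, b):
    return Node(a.value + b.value, [a, b], [1.0, 1.0])


def mul(a, b):
    return Node(a.value * b.value, [a, b], [b.value, a.value])


def topo_order(out):
    """Post-order DFS: every node appears after all of its inputs."""
    order, visited = [], set()

    def visit(node):
        if id(node) in visited:
            return
        visited.add(id(node))
        for inp in node.inputs:
            visit(inp)
        order.append(node)

    visit(out)
    return order


def compute_gradients_sketch(out, out_grad=1.0):
    # Partial adjoints contributed to each node, keyed by node identity.
    partials = {id(out): [out_grad]}
    for node in reversed(topo_order(out)):
        # All consumers of `node` were processed earlier, so the sum is final.
        node.grad = sum(partials.get(id(node), [0.0]))
        for inp, local in zip(node.inputs, node.local_grads):
            partials.setdefault(id(inp), []).append(node.grad * local)


x = Node(3.0)
y = Node(4.0)
out = add(mul(x, y), x)  # out = x * y + x
compute_gradients_sketch(out)
```

Because x feeds two operations, its gradient is the sum of two partial contributions (y from the product and 1 from the addition), which is why the partials are accumulated in a list before being summed.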
find_topo_sort(node_list)
Find topological sort of nodes in the computation graph.
Given a list of nodes, return a topologically sorted list of the nodes that lead to them, ending with the given nodes. A simple algorithm is a post-order DFS traversal from the given nodes, walking backwards along input edges. Because post-order DFS appends a node only after all of its predecessors have been traversed, the resulting order is a topological sort.
Parameters:
- node_list (list[Tensor]): List of tensors to sort topologically.
Returns:
- list[Tensor]: Topologically sorted list of tensors.
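A minimal sketch of the post-order DFS described above, using a stand-in Node class with only a name and a list of inputs. The names Node, topo_sort_dfs, and find_topo_sort_sketch are chosen for this illustration, and the actual signatures in the library may differ.

```python
class Node:
    """Stand-in graph node with a name and input edges."""

    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = list(inputs)


def topo_sort_dfs(node, visited, order):
    """Visit all inputs first, then append the node itself (post-order)."""
    if id(node) in visited:
        return
    visited.add(id(node))
    for inp in node.inputs:
        topo_sort_dfs(inp, visited, order)
    order.append(node)


def find_topo_sort_sketch(node_list):
    visited, order = set(), []
    for node in node_list:
        topo_sort_dfs(node, visited, order)
    return order


# Graph: c depends on a and b; d depends on c and a.
a = Node("a")
b = Node("b")
c = Node("c", [a, b])
d = Node("d", [c, a])
names = [n.name for n in find_topo_sort_sketch([d])]
```

The visited set ensures that a, which is reachable along two paths, appears only once in the result.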