NDArray
NDArray implementation with multiple backend support.
This module provides the core NDArray class that supports multiple backends including NumPy, CPU, and CUDA. The NDArray class is a Python wrapper for handling operations on n-dimensional arrays with strided memory layout, enabling efficient memory usage and fast operations.
The module implements a flexible array system that can work with different computational backends while maintaining a consistent interface. It supports advanced features like broadcasting, strided memory access, and device abstraction for cross-platform compatibility.
Key Features
- Strided memory layout for efficient array operations
- Multiple backend support (NumPy, CPU, CUDA)
- Broadcasting and reshaping without memory copying
- Element-wise and scalar operations
- Matrix operations and reductions
- Device abstraction for cross-platform compatibility
- Memory-efficient views and slicing operations
- Automatic memory management and optimization
Classes:
-
BackendDevice–Device abstraction that wraps backend implementation modules. Provides a unified interface for different computational backends.
-
NDArray–Multi-dimensional array with strided memory layout. Supports efficient operations on n-dimensional data with automatic memory optimization and device management.
Functions:
-
cpu_numpy–Create a NumPy-based CPU device.
-
cpu–Create a native CPU device.
-
cuda–Create a CUDA device for GPU computation.
-
default_device–Get the default computational device.
-
array–Create an NDArray from array-like input.
-
empty–Create an uninitialized NDArray with given shape.
-
full–Create an NDArray filled with a constant value.
-
broadcast_to–Broadcast an array to a new shape without copying memory.
-
reshape–Reshape an array without copying memory.
-
maximum–Element-wise maximum of two arrays.
-
log–Natural logarithm of array elements.
-
exp–Exponential of array elements.
-
tanh–Hyperbolic tangent of array elements.
-
summation–Sum of array elements over specified axes.
-
negative–Element-wise negation of array.
-
flip–Reverse the order of elements along specified axes.
Notes
The NDArray system is designed for efficiency and flexibility. It uses strided memory layout to enable operations like transposing and reshaping without copying data. The backend device system allows seamless switching between different computational platforms while maintaining consistent behavior.
All operations are optimized for the current backend and automatically handle memory management, device transfers, and computational optimization.
Examples:
>>> import tiny_pytorch as tp
>>> # Create 2-D arrays on the same device
>>> x = tp.array([[1, 2, 3]], device=tp.cpu())
>>> y = tp.array([[4, 5, 6]], device=tp.cpu())
>>>
>>> # Perform operations
>>> z = x + y # Element-wise addition, shapes (1, 3) and (1, 3)
>>> w = x @ y.reshape((3, 1)) # Matrix multiplication, (1, 3) @ (3, 1)
>>>
>>> # Shape manipulation
>>> reshaped = x.reshape((3, 1))
>>> broadcasted = x.broadcast_to((2, 3))
BackendDevice
Backend device that wraps the implementation module for each device.
This class provides a unified interface for different backend implementations (numpy, CPU, CUDA) by forwarding operations to the appropriate module.
Attributes:
-
name(str) –Name of the device (e.g., "cpu", "cuda", "cpu_numpy").
-
module(object) –The backend implementation module that handles actual operations.
__eq__(other)
Check if two devices are equal.
Two devices are equal if they have the same name.
Parameters:
-
other(object) –Device to compare with.
Returns:
-
bool–True if devices have the same name.
__getattr__(name)
Forward attribute access to the implementation module.
Any attribute that is not found on the device object itself is looked up on the module that implements the device's operations, i.e., device.op resolves to device.module.op.
Parameters:
-
name(str) –Name of the attribute to access.
Returns:
-
object–Attribute from the implementation module.
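The forwarding behavior described above can be illustrated with a minimal sketch. The class name `ForwardingDevice` is hypothetical, and the standard-library `math` module stands in for a backend implementation module; this is not the library's actual code.

```python
import math


class ForwardingDevice:
    """Hypothetical stand-in for a device that forwards lookups to a module."""

    def __init__(self, name, module=None):
        self.name = name
        self.module = module

    def __getattr__(self, attr):
        # Python calls __getattr__ only when normal attribute lookup fails,
        # so `name` and `module` themselves never reach this method.
        module = self.__dict__.get("module")
        if module is None:
            raise AttributeError(f"device {self.name!r} has no backend module")
        return getattr(module, attr)


# `math` plays the role of the backend module (an assumption for illustration).
device = ForwardingDevice("demo", math)
root = device.sqrt(16.0)  # resolved as math.sqrt(16.0)
```

Because only failed lookups are forwarded, attributes defined on the device object always take precedence over names in the module.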
__init__(name, module=None)
Initialize a new BackendDevice.
Parameters:
-
name(str) –Name of the device.
-
module(object, default:None) –Module that implements the device's operations.
__repr__()
String representation of the device.
Returns:
-
str–String representation showing the device name.
empty(shape, dtype='float32')
Create an empty array.
Parameters:
-
shape(tuple[int, ...]) –Shape of the array.
-
dtype(str, default:'float32') –Data type of the array. Default is "float32".
Returns:
-
NDArray–Empty array with the specified shape.
enabled()
Check if the device is enabled.
Returns:
-
bool–True if the device has an implementation module.
full(shape, fill_value, dtype='float32')
Create an array filled with a constant value.
Parameters:
-
shape(tuple[int, ...]) –Shape of the array.
-
fill_value(float) –Value to fill the array with.
-
dtype(str, default:'float32') –Data type of the array. Default is "float32".
Returns:
-
NDArray–Array filled with the specified value.
one_hot(n, i, dtype='float32')
rand(*shape, dtype='float32')
Generate random numbers from uniform distribution.
Parameters:
-
*shape(int, default:()) –Shape of the output array.
-
dtype(str, default:'float32') –Data type of the array. Default is "float32".
Returns:
-
NDArray–Array with random values from U[0, 1).
randn(*shape, dtype='float32')
Generate random numbers from standard normal distribution.
Parameters:
-
*shape(int, default:()) –Shape of the output array.
-
dtype(str, default:'float32') –Data type of the array. Default is "float32".
Returns:
-
NDArray–Array with random values from N(0, 1).
NDArray
A generic ND array class that may contain multiple different backends.
NDArray is a Python wrapper for handling operations on n-dimensional arrays. The underlying storage is a flat 1D array, and the backend device handles all operations on that 1D array. Strides, shape, and offset allow us to map the n-dimensional (logical) array onto the flat 1D array that is physically allocated in memory.
High-level operations such as broadcasting and transposing are done in Python without touching the underlying array. Raw operations such as addition and matrix multiplication are implemented in C/C++ and call highly optimized kernels, such as CUDA kernels.
To keep the backend code simple, the backends operate only on compact arrays, so compact() must be called before any such operation, and only the float32 data type is supported.
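As a rough illustration of how shape, strides, and offset map a logical index onto the flat buffer, here is a pure-Python sketch. The helper `flat_index` is a hypothetical name, and strides are assumed to be counted in elements rather than bytes.

```python
from typing import Sequence


def flat_index(idx: Sequence[int], strides: Sequence[int], offset: int = 0) -> int:
    """Map a logical n-dimensional index to a position in the flat 1D buffer."""
    return offset + sum(i * s for i, s in zip(idx, strides))


# A 2 x 3 logical array stored row-major in a flat buffer of 6 elements.
buffer = [10, 11, 12, 13, 14, 15]
shape, strides, offset = (2, 3), (3, 1), 0

# Element [1, 2] lives at flat position 1*3 + 2*1 = 5.
element = buffer[flat_index((1, 2), strides, offset)]

# A transposed (3 x 2) view reuses the same buffer: only the strides are swapped.
t_shape, t_strides = (3, 2), (1, 3)
t_element = buffer[flat_index((2, 1), t_strides, offset)]
```

Both lookups reach the same physical element, which is why transposing does not need to move any data.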
Attributes:
-
device(BackendDevice) –Device that handles the operations.
-
shape(tuple[int, ...]) –Shape of the array.
-
strides(tuple[int, ...]) –Strides for accessing elements in the underlying 1D array.
-
size(int) –Total number of elements in the array.
-
ndim(int) –Number of dimensions in the array.
-
dtype(str) –Data type of the array (currently only "float32" is supported).
flat
property
Return a 1-D view (flattened) of the array.
Returns:
-
NDArray–A 1-dimensional view of the array with the same data.
Examples:
>>> a = NDArray([[1, 2], [3, 4]])
>>> a.flat
NDArray([1, 2, 3, 4], device=cpu_numpy())
__getitem__(idxs)
Parameters:
-
idxs–Indices to the subset of the n-dimensional array.
Returns:
-
NDArray–NDArray corresponding to the selected subset of elements.
Raises:
-
AssertionError–If a slice has a negative step, or if the number of slices is not equal to the number of dimensions.
-
TypeError–If an index is not an int or slice.
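One way such indexing can be expressed without copying is to derive a new shape, strides, and offset from the slices. The sketch below uses a hypothetical helper `slice_view`; treating a non-negative integer index as a length-1 slice is an assumption made here for simplicity, not a statement about the library's behavior.

```python
def slice_view(shape, strides, offset, idxs):
    """Compute (shape, strides, offset) of the view selected by idxs."""
    assert len(idxs) == len(shape), "need one index per dimension"
    new_shape, new_strides = [], []
    for dim, stride, s in zip(shape, strides, idxs):
        if isinstance(s, int):
            # Assumption: a non-negative int i is treated like the slice i:i+1.
            s = slice(s, s + 1, 1)
        if not isinstance(s, slice):
            raise TypeError("index must be an int or a slice")
        start, stop, step = s.indices(dim)
        assert step > 0, "negative steps are not supported"
        offset += start * stride
        new_shape.append(max(0, (stop - start + step - 1) // step))
        new_strides.append(stride * step)
    return tuple(new_shape), tuple(new_strides), offset


# 4 x 4 compact array holding 0..15; select rows 1:3 and every other column.
buffer = list(range(16))
v_shape, v_strides, v_offset = slice_view((4, 4), (4, 1), 0, (slice(1, 3), slice(0, 4, 2)))
view = [
    [buffer[v_offset + i * v_strides[0] + j * v_strides[1]] for j in range(v_shape[1])]
    for i in range(v_shape[0])
]
```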
__init__(other, device=None)
Create an NDArray by copying another NDArray or NumPy array, or by using NumPy as a bridge for all other types of iterables.
Parameters:
-
other(NDArray or ndarray or array_like) –Source data to create the NDArray from.
-
device(BackendDevice, default:None) –Device to place the array on. If None, uses default device.
__matmul__(other)
Matrix multiplication of two arrays. This requires that both arrays be 2D (i.e., we don't handle batch matrix multiplication), and that the sizes match up properly for matrix multiplication.
__setitem__(idxs, other)
Set the values of a view into an array, using the same semantics as __getitem__().
as_strided(shape, strides)
Create a strided view of the underlying memory without copying anything.
broadcast_to(new_shape)
Broadcast an array to a new shape. Each element of new_shape must equal the corresponding element of the original shape, except in dimensions where the original size is 1 (which can then be broadcast to any size). This does not copy memory; broadcasting is achieved by manipulating the strides.
Parameters:
-
new_shape–Shape to broadcast to.
Returns:
-
NDArray–New NDArray object with the new broadcast shape; should point to the same memory as the original array.
Raises:
-
AssertionError–If new_shape[i] != shape[i] for any i where shape[i] != 1.
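A common way to broadcast by stride manipulation is to give every stretched dimension a stride of 0, so that all positions along it read the same element. The following pure-Python sketch assumes the old and new shapes have the same number of dimensions; `broadcast_strides` is a hypothetical helper name.

```python
def broadcast_strides(shape, strides, new_shape):
    """Return strides that present `shape` as `new_shape` without copying."""
    assert len(shape) == len(new_shape), "sketch assumes equal ndim"
    out = []
    for old, stride, new in zip(shape, strides, new_shape):
        if old == new:
            out.append(stride)
        else:
            assert old == 1, "only size-1 dimensions can be broadcast"
            out.append(0)  # stride 0: every index maps to the same element
    return tuple(out)


# A 1 x 3 array broadcast to 2 x 3 over the same three-element buffer.
buffer = [1, 2, 3]
b_strides = broadcast_strides((1, 3), (3, 1), (2, 3))
rows = [[buffer[i * b_strides[0] + j * b_strides[1]] for j in range(3)] for i in range(2)]
```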
compact()
Convert NDArray to be compact if it is not already compact.
compact_strides(shape)
staticmethod
Utility function to compute compact strides.
N-dimensional arrays are stored contiguously in row-major order, from the innermost dimension to the outermost dimension. Examples:
1. A 4 x 3 array is laid out in memory as the first row (3 elements), then the second row, and so on -> strides = (3, 1)
2. A 4 x 3 x 2 array is laid out with the innermost dimension first (2 elements), then the next outer dimension (3 rows of 2), and finally the outermost dimension, which holds 4 arrays of shape 3 x 2 -> strides = (6, 2, 1)
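The row-major rule above can be written as a short loop that walks the shape from the innermost dimension outward. This is a sketch of the computation, not necessarily how the library implements it; the function name `compact_strides_sketch` is hypothetical and strides are counted in elements.

```python
def compact_strides_sketch(shape):
    """Row-major strides (in elements) for a compact array of the given shape."""
    stride = 1
    strides = []
    for dim in reversed(shape):
        strides.append(stride)  # stride of this dimension
        stride *= dim           # the next outer dimension skips a whole block
    return tuple(reversed(strides))


strides_2d = compact_strides_sketch((4, 3))
strides_3d = compact_strides_sketch((4, 3, 2))
```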
ewise_or_scalar(other, ewise_func, scalar_func)
Run either an element-wise or scalar version of a function, depending on whether "other" is an NDArray or scalar.
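The dispatch pattern can be sketched with plain Python lists standing in for arrays. The names `dispatch_ewise_or_scalar`, `ewise_add`, and `scalar_add` are hypothetical, and the same-length check is an assumption about what an element-wise operation would require.

```python
def dispatch_ewise_or_scalar(a, other, ewise_func, scalar_func):
    """Call ewise_func for array operands and scalar_func for scalars."""
    if isinstance(other, list):  # lists stand in for NDArray in this sketch
        assert len(a) == len(other), "operands must have the same length"
        return ewise_func(a, other)
    return scalar_func(a, other)


def ewise_add(a, b):
    return [x + y for x, y in zip(a, b)]


def scalar_add(a, s):
    return [x + s for x in a]


both = dispatch_ewise_or_scalar([1, 2, 3], [10, 20, 30], ewise_add, scalar_add)
shifted = dispatch_ewise_or_scalar([1, 2, 3], 5, ewise_add, scalar_add)
```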
fill(value)
Fill in-place with a constant value.
flip(axes=None)
Reverse (flip) the order of elements in the array along the specified axes.
Parameters:
-
axes(tuple[int, ...] or None, default:None) –Axes along which to flip the array. Each axis index must be valid for the array's dimensions. If None, flip over all axes (reverse the array in every dimension).
Returns:
-
NDArray–A view of the array with the entries reversed along the specified axes.
Notes
The flip itself is expressed as a view with modified strides and offset, without copying data. The result is then compacted before being returned, so the returned array has a compact layout.
Raises:
-
AxisError–If the number of axes is greater than the number of dimensions, or if any axis is out of bounds.
Examples:
>>> a = NDArray([[1, 2], [3, 4]])
>>> a.flip((0,))
NDArray([[3, 4], [1, 2]], device=cpu_numpy())
>>> a.flip((1,))
NDArray([[2, 1], [4, 3]], device=cpu_numpy())
>>> a.flip((0, 1))
NDArray([[4, 3], [2, 1]], device=cpu_numpy())
>>> a.flip() # or a.flip(None)
NDArray([[4, 3], [2, 1]], device=cpu_numpy())
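A flip can be described purely in terms of strides and offset: negate the stride of each flipped axis and move the offset to the last element along that axis. The sketch below is pure Python with hypothetical helper names (`flip_view`, `read_2d`) and only reads 2-D views.

```python
def flip_view(shape, strides, offset, axes):
    """Return (strides, offset) of the view flipped along the given axes."""
    new_strides = list(strides)
    new_offset = offset
    for ax in axes:
        new_offset += (shape[ax] - 1) * strides[ax]  # start at the last element
        new_strides[ax] = -strides[ax]               # and walk backwards
    return tuple(new_strides), new_offset


def read_2d(buffer, shape, strides, offset):
    """Materialize a 2-D strided view as nested lists."""
    return [
        [buffer[offset + i * strides[0] + j * strides[1]] for j in range(shape[1])]
        for i in range(shape[0])
    ]


buffer = [1, 2, 3, 4]  # compact 2 x 2 array [[1, 2], [3, 4]]
shape, strides = (2, 2), (2, 1)

rows_flipped = read_2d(buffer, shape, *flip_view(shape, strides, 0, (0,)))
cols_flipped = read_2d(buffer, shape, *flip_view(shape, strides, 0, (1,)))
both_flipped = read_2d(buffer, shape, *flip_view(shape, strides, 0, (0, 1)))
```

The three results match the outputs shown in the examples above.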
is_compact()
Return True if the array is compact in memory and the internal size equals the product of the shape dimensions.
make(shape, strides=None, device=None, handle=None, offset=0)
staticmethod
Create a new NDArray with the given properties. Memory will only be
allocated if handle is None, otherwise it will use the same
underlying memory.
max(axis=None, keepdims=False)
Compute the maximum either across all axes (when axis=None) or along a single axis.
Note: Reducing over multiple axes at once is not supported.
numpy()
Convert the NDArray to a NumPy array.
Returns:
-
ndarray–NumPy array with the same data.
pad(axes)
Pad the array with zeros along each axis according to the specified padding widths.
Parameters:
-
axes(tuple[tuple[int, int], ...]) –A tuple specifying the number of values padded to the edges of each axis. Each element should be a tuple of two integers (pad_before, pad_after), where pad_before is the number of values padded before the first element and pad_after is the number of values padded after the last element for that axis. The length of axes must match the number of dimensions of the array.
Returns:
-
NDArray–A new NDArray with the specified padding applied, filled with zeros in the padded regions.
Raises:
-
AssertionError–If the length of axes does not match the number of dimensions of the array.
Examples:
>>> a = NDArray([[1, 2], [3, 4]])
>>> a.pad(((1, 1), (2, 2)))
NDArray(
[[0, 0, 0, 0, 0, 0],
[0, 0, 1, 2, 0, 0],
[0, 0, 3, 4, 0, 0],
[0, 0, 0, 0, 0, 0]], device=cpu_numpy())
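For the 2-D case, the padding semantics can be reproduced with nested lists. This is only an illustration of the (pad_before, pad_after) convention; `pad_2d` is a hypothetical helper, not the library's implementation.

```python
def pad_2d(rows, axes):
    """Zero-pad a 2-D nested list; axes = ((top, bottom), (left, right))."""
    assert len(axes) == 2, "one (pad_before, pad_after) pair per dimension"
    (top, bottom), (left, right) = axes
    width = len(rows[0]) + left + right
    padded = [[0] * width for _ in range(top)]
    for row in rows:
        padded.append([0] * left + list(row) + [0] * right)
    padded.extend([0] * width for _ in range(bottom))
    return padded


padded = pad_2d([[1, 2], [3, 4]], ((1, 1), (2, 2)))
```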
permute(new_axes)
Permute the order of the dimensions. new_axes describes a permutation of the existing axes. Examples:
- If we have an array with dimensions "BHWC", then .permute((0,3,1,2)) would convert this to "BCHW" order.
- For a 2D array, .permute((1,0)) would transpose the array.
Like reshape, this operation should not copy memory, but achieves the permuting by just adjusting the shape/strides of the array. That is, it returns a new array that has the dimensions permuted as desired, but which points to the same memory as the original array.
Parameters:
-
new_axes–Permutation order of the dimensions.
Returns:
-
NDArray–New NDArray object with permuted dimensions, pointing to the same memory as the original NDArray (i.e., just shape and strides changed).
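Since only shape and strides change, a permutation can be sketched as reordering both tuples by new_axes. The helper name `permute_view` is hypothetical, and strides are assumed to be in elements.

```python
def permute_view(shape, strides, new_axes):
    """Return (shape, strides) with dimensions reordered by new_axes."""
    assert sorted(new_axes) == list(range(len(shape))), "new_axes must be a permutation"
    new_shape = tuple(shape[a] for a in new_axes)
    new_strides = tuple(strides[a] for a in new_axes)
    return new_shape, new_strides


# "BHWC" with B=2, H=4, W=5, C=3 stored compactly, viewed as "BCHW".
bchw_shape, bchw_strides = permute_view((2, 4, 5, 3), (60, 15, 3, 1), (0, 3, 1, 2))

# Transposing a 4 x 3 array.
t_shape, t_strides = permute_view((4, 3), (3, 1), (1, 0))
```

The permuted view is generally no longer compact, which matters for operations that require a compact layout.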
reduce_view_out(axis, keepdims=False)
Return a view to the array set up for reduction functions and output array.
reshape(new_shape)
Reshape the array without copying memory. This will return a new array (view) that corresponds to a reshaped array but points to the same memory as the original array. Therefore, we only change the shape and the strides to get the new n-dimensional logical view of the array.
Parameters:
-
new_shape–New shape of the array.
Returns:
-
NDArray–Reshaped array; this will point to the same memory as the original NDArray.
Raises:
-
ValueError–If the product of the current shape is not equal to the product of the new shape, or if the array is not compact.
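The compactness requirement makes a copy-free reshape straightforward: when the elements are already in row-major order, the new view is simply the compact strides of the new shape over the same buffer. The sketch below uses hypothetical helper names and mirrors the two error conditions listed above.

```python
from math import prod


def row_major_strides(shape):
    """Compact row-major strides (in elements) for a shape."""
    strides, stride = [], 1
    for dim in reversed(shape):
        strides.append(stride)
        stride *= dim
    return tuple(reversed(strides))


def reshape_view(shape, strides, new_shape):
    """Return (shape, strides) of the reshaped view, or raise ValueError."""
    if prod(shape) != prod(new_shape):
        raise ValueError("total number of elements must not change")
    if tuple(strides) != row_major_strides(shape):
        raise ValueError("array must be compact before reshaping")
    return tuple(new_shape), row_major_strides(new_shape)


reshaped = reshape_view((2, 3), (3, 1), (3, 2))
```

A transposed view such as shape (3, 2) with strides (1, 3) is rejected by this check, because its elements are not laid out in row-major order.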
sum(axis=None, keepdims=False)
Compute the sum either across all axes (when axis=None) or along a single axis.
Note: Reducing over multiple axes at once is not supported.
to(device)
Move the array to a different device.
Parameters:
-
device(BackendDevice) –Target device.
Returns:
-
NDArray–Array on the target device.
all_devices()
array(a, dtype='float32', device=None)
Create an NDArray from array-like data.
Parameters:
-
a(array_like) –Input data to create the array from.
-
dtype(str, default:'float32') –Data type of the array. Default is "float32".
-
device(BackendDevice, default:None) –Device to place the array on. If None, uses default device.
Returns:
-
NDArray–New NDArray with the specified data.
broadcast_to(array, new_shape)
cpu()
Create a CPU device with native backend if available.
Attempts to use the native CPU backend, falls back to NumPy if the C++ extension is not available.
Returns:
-
BackendDevice–CPU device with best available backend.
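A try-import fallback of this kind is commonly written as below. The module name `hypothetical_native_cpu_backend` is made up for illustration, and the standard-library `math` module stands in for the fallback backend; neither is the library's real backend module.

```python
import importlib


def load_backend(preferred: str, fallback: str):
    """Return (module, name) for the first backend that can be imported."""
    try:
        return importlib.import_module(preferred), preferred
    except ImportError:
        # The preferred (native) extension is missing: use the fallback instead.
        return importlib.import_module(fallback), fallback


# The preferred name is assumed not to be installed, so the fallback is chosen.
module, chosen = load_backend("hypothetical_native_cpu_backend", "math")
```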
cpu_numpy()
cuda()
Create a CUDA device if available.
Returns:
-
BackendDevice–CUDA device, or disabled device if CUDA is not available.
default_device()
empty(shape, dtype='float32', device=None)
Create an empty NDArray.
Parameters:
-
shape(tuple[int, ...]) –Shape of the array.
-
dtype(str, default:'float32') –Data type of the array. Default is "float32".
-
device(BackendDevice, default:None) –Device to place the array on. If None, uses default device.
Returns:
-
NDArray–Empty NDArray with the specified shape.
exp(a)
flip(a, axes)
Reverse (flip) the order of elements in an array along the specified axes.
Parameters:
-
a(NDArray) –Input array to be flipped.
-
axes(tuple[int, ...] or None) –Axes along which to flip the array. Each axis index must be valid for the array's dimensions. If None, flip over all axes (reverse the array in every dimension).
Returns:
-
NDArray–A view of the array with the entries reversed along the specified axes.
Notes
The flip itself is expressed as a view with modified strides and offset, without copying data. The result is then compacted before being returned, so the returned array has a compact layout.
Raises:
-
AxisError–If the number of axes is greater than the number of dimensions, or if any axis is out of bounds.
Examples:
>>> a = NDArray([[1, 2], [3, 4]])
>>> flip(a, (0,))
NDArray([[3, 4], [1, 2]], device=cpu_numpy())
>>> flip(a, (1,))
NDArray([[2, 1], [4, 3]], device=cpu_numpy())
>>> flip(a, (0, 1))
NDArray([[4, 3], [2, 1]], device=cpu_numpy())
>>> flip(a, None)
NDArray([[4, 3], [2, 1]], device=cpu_numpy())
full(shape, fill_value, dtype='float32', device=None)
Create an NDArray filled with a constant value.
Parameters:
-
shape(tuple[int, ...]) –Shape of the array.
-
fill_value(float) –Value to fill the array with.
-
dtype(str, default:'float32') –Data type of the array. Default is "float32".
-
device(BackendDevice, default:None) –Device to place the array on. If None, uses default device.
Returns:
-
NDArray–NDArray filled with the specified value.
log(a)
maximum(a, b)
negative(a)
Numerical negative, element-wise.
Parameters:
-
a(NDArray) –Input array.
Returns:
-
NDArray–Returned array or scalar: y = -a.
Examples:
>>> a = NDArray([1, -1, 2.5])
>>> negative(a)
NDArray([-1.0, 1.0, -2.5], device=cpu_numpy())
>>> b = NDArray([[1, 2], [3, 4]])
>>> negative(b)
NDArray([[-1.0, -2.0], [-3.0, -4.0]], device=cpu_numpy())
reshape(array, new_shape)
Reshape an array to a new shape.
Parameters:
-
array(NDArray) –Array to reshape.
-
new_shape(tuple[int, ...]) –New shape of the array.
Returns:
-
NDArray–Reshaped array.
Raises:
-
ValueError–If the product of the new shape is not equal to the product of the original shape, or if the array is not compact.
summation(a, axis=None, keepdims=False)
Sum of array elements over a given axis.
Parameters:
-
a(NDArray) –Input array.
-
axis(int or None, default:None) –Axis along which a sum is performed. The default, axis=None, will sum all of the elements of the input array. If axis is negative it counts from the last to the first axis.
Note: Only supports reduction over a single axis or all axes. Multiple axes reduction is not supported.
-
keepdims(bool, default:False) –If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.
Returns:
-
NDArray–An array with the same shape as a, with the specified axis removed. If a is a 0-d array, or if axis is None, a scalar is returned.
Raises:
-
ValueError–If an empty axis tuple is provided.
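The axis and keepdims semantics described above can be illustrated for the 2-D case with nested lists. `sum_2d` is a hypothetical helper written only to show the expected result shapes; it does not reflect how the backends compute reductions.

```python
def sum_2d(rows, axis=None, keepdims=False):
    """Sum a 2-D nested list over all elements or over a single axis."""
    if axis is None:
        total = sum(sum(r) for r in rows)
        return [[total]] if keepdims else total
    if axis < 0:
        axis += 2  # negative axes count from the last dimension
    if axis == 0:
        out = [sum(col) for col in zip(*rows)]
        return [out] if keepdims else out
    if axis == 1:
        out = [sum(r) for r in rows]
        return [[v] for v in out] if keepdims else out
    raise ValueError("axis out of range for a 2-D array")


data = [[1, 2, 3], [4, 5, 6]]
total = sum_2d(data)
per_column = sum_2d(data, axis=0)
per_row_kept = sum_2d(data, axis=1, keepdims=True)
```

With keepdims=True the reduced axis stays as a dimension of size one, so the (2, 1) result can be broadcast back against the (2, 3) input.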