NDArray

NDArray implementation with multiple backend support.

This module provides the core NDArray class that supports multiple backends including NumPy, CPU, and CUDA. The NDArray class is a Python wrapper for handling operations on n-dimensional arrays with strided memory layout.

The module includes:

- BackendDevice class for device abstraction
- NDArray class with strided array operations
- Device factory functions (cpu, cuda, etc.)
- Array creation utilities

Key Features

Strided memory layout for efficient array operations
Multiple backend support (NumPy, CPU, CUDA)
Broadcasting and reshaping without memory copying
Element-wise and scalar operations
Matrix operations and reductions

Classes:

BackendDevice –

Device abstraction that wraps backend implementation modules.
NDArray –

Multi-dimensional array with strided memory layout.

Functions:

cpu_numpy, cpu, cuda –

Device factory functions.
array, empty, full –

Array creation utilities.
broadcast_to –

Broadcasting utility function.

`BackendDevice`

Backend device that wraps the implementation module for each device.

This class provides a unified interface for different backend implementations (numpy, CPU, CUDA) by forwarding operations to the appropriate module.

Attributes:

name (str) –

Name of the device (e.g., "cpu", "cuda", "cpu_numpy").
module (object) –

The backend implementation module that handles actual operations.

`__eq__(other)`

Check if two devices are equal.

Two devices are equal if they have the same name.

Parameters:

other (object) –

Device to compare with.

Returns:

bool –

True if devices have the same name.

`__getattr__(name)`

Forward attribute access to the implementation module.

All attribute lookups on a device are forwarded to the module that implements the device's operations; that is, device.op resolves to device.module.op.

Parameters:

name (str) –

Name of the attribute to access.

Returns:

object –

Attribute from the implementation module.
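
The forwarding rule can be sketched in a few lines. The class below is a hypothetical stand-in, not the module's actual `BackendDevice`; it uses Python's standard `math` module as a placeholder backend, purely to show how `device.op` ends up as `device.module.op`.

```python
import math


class ForwardingDevice:
    """Hypothetical stand-in for a device that wraps a backend module."""

    def __init__(self, name, module=None):
        self.name = name
        self.module = module

    def __getattr__(self, attr):
        # __getattr__ runs only when normal lookup fails, so `name` and
        # `module` are found directly and everything else is forwarded.
        return getattr(self.module, attr)

    def enabled(self):
        # A device without a backend module cannot run any operation.
        return self.module is not None


device = ForwardingDevice("cpu_math", math)  # `math` plays the backend role
root = device.sqrt(16.0)                     # resolved as math.sqrt(16.0)
disabled = ForwardingDevice("no_backend")    # module defaults to None
```

Because Python calls `__getattr__` only after ordinary attribute lookup fails, methods defined on the device itself (such as `enabled`) are never forwarded.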

`__init__(name, module=None)`

Initialize a new BackendDevice.

Parameters:

name (str) –

Name of the device.
module (object, default: None ) –

Module that implements the device's operations.

`__repr__()`

String representation of the device.

Returns:

str –

String representation showing the device name.

`empty(shape, dtype='float32')`

Create an empty array.

Parameters:

shape (tuple[int, ...]) –

Shape of the array.
dtype (str, default: 'float32' ) –

Data type of the array. Default is "float32".

Returns:

NDArray –

Empty array with the specified shape.

`enabled()`

Check if the device is enabled.

Returns:

bool –

True if the device has an implementation module.

`full(shape, fill_value, dtype='float32')`

Create an array filled with a constant value.

Parameters:

shape (tuple[int, ...]) –

Shape of the array.
fill_value (float) –

Value to fill the array with.
dtype (str, default: 'float32' ) –

Data type of the array. Default is "float32".

Returns:

NDArray –

Array filled with the specified value.

`one_hot(n, i, dtype='float32')`

Create a one-hot encoded array.

Parameters:

n (int) –

Number of classes.
i (int or array-like) –

Indices to encode.
dtype (str, default: 'float32' ) –

Data type of the array. Default is "float32".

Returns:

NDArray –

One-hot encoded array.
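
As a rough illustration of one-hot encoding, the sketch below builds the rows with plain Python lists. The helper name `one_hot_rows` and the choice to return a single row for an integer index are assumptions of this example; the shape the real method returns for a scalar `i` is not stated here.

```python
def one_hot_rows(n, i):
    """Return one-hot rows of length n for an index or a list of indices."""

    def row(index):
        # 1.0 at position `index`, 0.0 everywhere else.
        return [1.0 if k == index else 0.0 for k in range(n)]

    if isinstance(i, int):
        return row(i)
    return [row(index) for index in i]


single = one_hot_rows(4, 2)      # one row for a single class index
batch = one_hot_rows(3, [0, 2])  # one row per index
```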

`rand(*shape, dtype='float32')`

Generate random numbers from uniform distribution.

Parameters:

*shape (int, default: () ) –

Shape of the output array.
dtype (str, default: 'float32' ) –

Data type of the array. Default is "float32".

Returns:

NDArray –

Array with random values from U[0, 1).

`randn(*shape, dtype='float32')`

Generate random numbers from standard normal distribution.

Parameters:

*shape (int, default: () ) –

Shape of the output array.
dtype (str, default: 'float32' ) –

Data type of the array. Default is "float32".

Returns:

NDArray –

Array with random values from N(0, 1).

`NDArray`

A generic n-dimensional array class that can run on multiple backends.

NDArray is a Python wrapper for handling operations on n-dimensional arrays. The underlying storage is a flat 1D array, and the backend device handles all operations on it. The shape, strides, and offset map the n-dimensional (logical) array onto the flat 1D array that is physically allocated in memory.

High-level operations such as broadcasting and transposing are done entirely in Python without touching the underlying array. Raw operations such as addition and matrix multiplication are implemented in C/C++ and call highly optimized kernels, such as CUDA kernels.

To keep the backend code simple, the backends operate only on compact arrays, so compact() must be called before any such operation, and only the float32 data type is supported.

Attributes:

_shape (tuple[int, ...]) –

Shape of the array.
_strides (tuple[int, ...]) –

Strides for accessing elements in the underlying 1D array.
_offset (int) –

Offset into the underlying 1D array.
_device (BackendDevice) –

Device that handles the operations.
_handle (Array) –

Pointer to the underlying 1D array.
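
The mapping from a logical index to the flat buffer can be written out directly. This is a minimal sketch using a plain Python list as the buffer; `flat_index` is a hypothetical helper, not a method of the class.

```python
def flat_index(index, strides, offset=0):
    """Map an n-dimensional index to a position in the flat 1D buffer."""
    return offset + sum(i * s for i, s in zip(index, strides))


# A 2 x 3 logical array stored row by row in a flat buffer.
buffer = [10, 11, 12, 20, 21, 22]
strides, offset = (3, 1), 0

element = buffer[flat_index((1, 2), strides, offset)]  # row 1, column 2

# Swapping the strides reads the same buffer as its 3 x 2 transpose,
# so index (2, 1) of the transpose is the same element.
transposed = buffer[flat_index((2, 1), (1, 3), offset)]
```

Both lookups land on the same buffer position, which is why operations like transposing only need new strides rather than a copy.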

`__getitem__(idxs)`

Parameters:

idxs –

Indices to the subset of the n-dimensional array.

Returns:

NDArray –

NDArray corresponding to the selected subset of elements.

Raises:

AssertionError –

If a slice has negative step, or if number of slices is not equal to the number of dimensions.
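
One common way to implement slicing as a view is to derive a new shape, strides, and offset from the slices. The sketch below follows that approach under stated assumptions: `slice_view` is a hypothetical helper, every dimension gets an explicit slice with non-negative `start` and `stop`, and the real method may normalize indices differently.

```python
import math


def slice_view(shape, strides, offset, slices):
    """Compute (shape, strides, offset) of a view selected by full slices."""
    assert len(slices) == len(shape), "need one slice per dimension"
    new_shape, new_strides = [], []
    for s, stride in zip(slices, strides):
        assert s.step is None or s.step > 0, "negative steps unsupported"
        step = s.step or 1
        # Number of elements visited between start and stop.
        new_shape.append(max(0, math.ceil((s.stop - s.start) / step)))
        # Each logical step now skips `step` elements of the original.
        new_strides.append(stride * step)
        # The view begins `start` elements further along this dimension.
        offset += s.start * stride
    return tuple(new_shape), tuple(new_strides), offset


# Rows 1..2 and every second column of a compact 4 x 6 array.
view = slice_view((4, 6), (6, 1), 0, (slice(1, 3), slice(0, 6, 2)))
```

The result describes a 2 x 3 view with strides (6, 2) starting at offset 6, and no elements are copied.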

`__init__(other, device=None)`

Create an NDArray by copying another NDArray or a NumPy array; any other iterable is first converted through NumPy.

Parameters:

other (NDArray or ndarray or array_like) –

Source data to create the NDArray from.
device (BackendDevice, default: None ) –

Device to place the array on. If None, uses default device.

`__matmul__(other)`

Matrix multiplication of two arrays. This requires that both arrays be 2D (i.e., we don't handle batch matrix multiplication), and that the sizes match up properly for matrix multiplication.
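
For reference, the 2D-only contract can be illustrated with a naive triple loop over flat row-major buffers. This is only a sketch of the semantics with a hypothetical `matmul_2d` helper; the actual backends would dispatch to optimized kernels.

```python
def matmul_2d(a, a_shape, b, b_shape):
    """Multiply two compact row-major 2D arrays stored as flat lists."""
    assert len(a_shape) == 2 and len(b_shape) == 2, "both arrays must be 2D"
    m, n = a_shape
    n2, p = b_shape
    assert n == n2, "inner dimensions must match"
    out = [0.0] * (m * p)
    for i in range(m):
        for j in range(p):
            # Dot product of row i of `a` with column j of `b`.
            out[i * p + j] = sum(a[i * n + k] * b[k * p + j] for k in range(n))
    return out, (m, p)


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]     # 2 x 3
b = [7.0, 8.0, 9.0, 10.0, 11.0, 12.0]  # 3 x 2
product, product_shape = matmul_2d(a, (2, 3), b, (3, 2))
```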

`__setitem__(idxs, other)`

Set the values of a view into an array, using the same semantics as __getitem__().

`as_strided(shape, strides)`

Create a strided view of the underlying memory without copying anything.

`broadcast_to(new_shape)`

Broadcast an array to a new shape. Each element of new_shape must equal the corresponding element of the original shape, except in dimensions where the original size is 1, which can be broadcast to any size. No memory is copied; broadcasting is achieved by manipulating the strides alone.

Parameters:

new_shape –

Shape to broadcast to.

Returns:

NDArray –

New NDArray object with the new broadcast shape; should point to the same memory as the original array.

Raises:

AssertionError –

If new_shape[i] != shape[i] for any i where shape[i] != 1.
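
Stride-based broadcasting usually works by giving each broadcast dimension a stride of 0, so that every index along it reads the same element. The sketch below shows that idea with a hypothetical `broadcast_strides` helper; it assumes the old and new shapes have the same number of dimensions.

```python
def broadcast_strides(shape, strides, new_shape):
    """Return strides that view `shape` as `new_shape` without copying."""
    assert len(shape) == len(new_shape), "same number of dimensions expected"
    out = []
    for old, new, stride in zip(shape, new_shape, strides):
        assert old == new or old == 1, "only size-1 dimensions can broadcast"
        # A stride of 0 repeats the single element along this dimension.
        out.append(stride if old == new else 0)
    return tuple(out)


# A 3 x 1 column broadcast to 3 x 4.
buffer = [1.0, 2.0, 3.0]
new_strides = broadcast_strides((3, 1), (1, 1), (3, 4))

# Element (2, 3) of the broadcast view still reads buffer position 2.
value = buffer[2 * new_strides[0] + 3 * new_strides[1]]
```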

`compact()`

Convert NDArray to be compact if it is not already compact.
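
Conceptually, compacting copies the elements of a strided view into a fresh buffer in row-major order. The sketch below does this with plain lists; `compact_copy` is a hypothetical name, and the real method delegates the copy to the backend device.

```python
from itertools import product


def compact_copy(buffer, shape, strides, offset=0):
    """Copy a strided view into a new row-major (compact) flat list."""
    return [
        buffer[offset + sum(i * s for i, s in zip(index, strides))]
        # product() varies the last index fastest, i.e. row-major order.
        for index in product(*(range(n) for n in shape))
    ]


# Transposed view of a 2 x 3 array: shape (3, 2) with strides (1, 3).
buffer = [1, 2, 3, 4, 5, 6]
transposed = compact_copy(buffer, (3, 2), (1, 3))
```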

`compact_strides(shape)` `staticmethod`

Utility function to compute compact strides.

N-dimensional arrays are stored contiguously in row-major order, from the innermost dimension to the outermost dimension. Examples:

1. A 4 x 3 array is stored physically in memory row by row (3 elements per row), so strides = (3, 1).
2. A 4 x 3 x 2 array is stored with the innermost dimension first (2 elements), then the next outer dimension (3 rows of 2), and finally the outermost dimension (4 blocks of 3 x 2), so strides = (6, 2, 1).
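
A minimal sketch of the computation, written as a standalone function rather than the class's static method: the stride of each dimension is the product of the sizes of all dimensions to its right.

```python
def compact_strides(shape):
    """Return row-major strides (in elements) for a compact array."""
    strides = []
    stride = 1
    for dim in reversed(shape):
        strides.append(stride)
        stride *= dim  # moving one step outward skips `dim` inner blocks
    return tuple(reversed(strides))


strides_2d = compact_strides((4, 3))
strides_3d = compact_strides((4, 3, 2))
```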

`ewise_or_scalar(other, ewise_func, scalar_func)`

Run either an element-wise or scalar version of a function, depending on whether "other" is an NDArray or scalar.
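
The dispatch can be sketched as follows. Plain Python lists stand in for NDArray here, and `dispatch`, `ewise_add`, and `scalar_add` are hypothetical names used only for illustration.

```python
def dispatch(values, other, ewise_func, scalar_func):
    """Call ewise_func for array operands and scalar_func for scalars."""
    if isinstance(other, list):  # a list plays the role of an NDArray
        assert len(other) == len(values), "operands must have equal size"
        return ewise_func(values, other)
    return scalar_func(values, other)


def ewise_add(a, b):
    return [x + y for x, y in zip(a, b)]


def scalar_add(a, s):
    return [x + s for x in a]


array_result = dispatch([1, 2, 3], [10, 20, 30], ewise_add, scalar_add)
scalar_result = dispatch([1, 2, 3], 5, ewise_add, scalar_add)
```

Keeping the two kernels separate lets the backend avoid materializing a full array just to apply a scalar.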

`fill(value)`

Fill in-place with a constant value.

`is_compact()`

Return True if the array is compact in memory and its internal size equals the product of the shape dimensions.
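
Both conditions can be checked from the metadata alone. In this sketch, `is_compact_layout` is a hypothetical helper and `buffer_size` stands for the number of elements in the underlying 1D array.

```python
from math import prod


def is_compact_layout(shape, strides, buffer_size):
    """Check row-major strides and that the buffer holds exactly the array."""
    expected = []
    stride = 1
    for dim in reversed(shape):
        expected.append(stride)
        stride *= dim
    row_major = tuple(strides) == tuple(reversed(expected))
    return row_major and prod(shape) == buffer_size


compact_case = is_compact_layout((2, 3), (3, 1), 6)
transposed_case = is_compact_layout((3, 2), (1, 3), 6)  # permuted strides
sliced_case = is_compact_layout((2, 3), (3, 1), 12)     # view of a larger buffer
```

The size check matters because a slice can have row-major strides while still pointing into a larger allocation.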

`make(shape, strides=None, device=None, handle=None, offset=0)` `staticmethod`

Create a new NDArray with the given properties. Memory will only be allocated if handle is None, otherwise it will use the same underlying memory.

`max(axis=None, keepdims=False)`

Compute the maximum either across all axes (when axis=None) or along a single axis.

Note: axis must be a single axis; reducing over multiple axes at once is not supported.

`numpy()`

Convert the NDArray to a NumPy array.

Returns:

ndarray –

NumPy array with the same data.

`permute(new_axes)`

Permute the order of the dimensions. new_axes describes a permutation of the existing axes. Examples:

If we have an array with dimension "BHWC" then .permute((0,3,1,2)) would convert this to "BCHW" order.
For a 2D array, .permute((1,0)) would transpose the array.

Like reshape, this operation should not copy memory, but achieves the permuting by just adjusting the shape/strides of the array. That is, it returns a new array that has the dimensions permuted as desired, but which points to the same memory as the original array.

Parameters:

new_axes –

Permutation order of the dimensions.

Returns:

NDArray –

New NDArray object with permuted dimensions, pointing to the same memory as the original NDArray (i.e., just shape and strides changed).
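
Since only the metadata changes, a permutation amounts to reordering the shape and strides tuples. The sketch below shows this with a hypothetical `permute_view` helper applied to the "BHWC" example.

```python
def permute_view(shape, strides, new_axes):
    """Return (shape, strides) with dimensions reordered by new_axes."""
    assert sorted(new_axes) == list(range(len(shape))), "not a permutation"
    new_shape = tuple(shape[axis] for axis in new_axes)
    new_strides = tuple(strides[axis] for axis in new_axes)
    return new_shape, new_strides


# A compact "BHWC" array with B=2, H=4, W=5, C=3.
bhwc_shape, bhwc_strides = (2, 4, 5, 3), (60, 15, 3, 1)
bchw = permute_view(bhwc_shape, bhwc_strides, (0, 3, 1, 2))

# Transposing a compact 2 x 3 matrix swaps both tuples.
transposed = permute_view((2, 3), (3, 1), (1, 0))
```

The permuted view is generally no longer compact, so a copy via compact() is needed before passing it to a backend kernel.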

`reduce_view_out(axis, keepdims=False)`

Return a view to the array set up for reduction functions and output array.

`reshape(new_shape)`

Reshape the array without copying memory. This will return a new array (view) that corresponds to a reshaped array but points to the same memory as the original array. Therefore, we only change the shape and the strides to get the new n-dimensional logical view of the array.

Parameters:

new_shape –

New shape of the array.

Returns:

NDArray –

Reshaped array; this will point to the same memory as the original NDArray.

Raises:

ValueError –

If the product of the current shape is not equal to the product of the new shape, or if the array is not compact.
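
The checks and the resulting metadata can be sketched as below. `reshape_view` is a hypothetical helper operating on shape and strides tuples only; it raises `ValueError` in the two cases listed above.

```python
from math import prod


def _row_major_strides(shape):
    strides, stride = [], 1
    for dim in reversed(shape):
        strides.append(stride)
        stride *= dim
    return tuple(reversed(strides))


def reshape_view(shape, strides, new_shape):
    """Return (shape, strides) of a reshaped view of a compact array."""
    if prod(shape) != prod(new_shape):
        raise ValueError("total number of elements must stay the same")
    if tuple(strides) != _row_major_strides(shape):
        raise ValueError("array must be compact before reshaping")
    # The buffer is untouched; only the logical view changes.
    return tuple(new_shape), _row_major_strides(new_shape)


reshaped = reshape_view((2, 6), (6, 1), (3, 4))
flattened = reshape_view((2, 6), (6, 1), (12,))
```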

`sum(axis=None, keepdims=False)`

Sum either across all axes (when axis=None) or along a single axis.

Note: axis must be a single axis; reducing over multiple axes at once is not supported.
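
To make the single-axis semantics concrete, the sketch below sums a compact row-major buffer along one axis using index arithmetic. `sum_single_axis` is a hypothetical helper and does not reflect how the backends organize the reduction internally.

```python
from math import prod


def sum_single_axis(flat, shape, axis):
    """Sum a compact row-major array along one axis; returns a flat list."""
    outer = prod(shape[:axis])       # number of blocks before the axis
    n = shape[axis]                  # length of the reduced axis
    inner = prod(shape[axis + 1:])   # elements per step along the axis
    out = [0] * (outer * inner)
    for o in range(outer):
        for i in range(inner):
            out[o * inner + i] = sum(
                flat[(o * n + k) * inner + i] for k in range(n)
            )
    return out


flat = [1, 2, 3, 4, 5, 6]                           # a 2 x 3 array
column_sums = sum_single_axis(flat, (2, 3), axis=0)
row_sums = sum_single_axis(flat, (2, 3), axis=1)
total = sum(flat)                                   # the axis=None case
```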

`to(device)`

Move the array to a different device.

Parameters:

device (BackendDevice) –

Target device.

Returns:

NDArray –

Array on the target device.

`all_devices()`

Get a list of all available devices.

Returns:

list[BackendDevice] –

List of all available devices.

`array(a, dtype='float32', device=None)`

Create an NDArray from array-like data.

Parameters:

a (array_like) –

Input data to create the array from.
dtype (str, default: 'float32' ) –

Data type of the array. Default is "float32".
device (BackendDevice, default: None ) –

Device to place the array on. If None, uses default device.

Returns:

NDArray –

New NDArray with the specified data.

`broadcast_to(array, new_shape)`

Broadcast an array to a new shape.

Parameters:

array (NDArray) –

Array to broadcast.
new_shape (tuple[int, ...]) –

Target shape for broadcasting.

Returns:

NDArray –

Broadcasted array.

`cpu()`

Create a CPU device with native backend if available.

Attempts to use the native CPU backend, falls back to NumPy if the C++ extension is not available.

Returns:

BackendDevice –

CPU device with best available backend.
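
A fallback of this kind is typically a guarded import. The sketch below illustrates the pattern only: `load_backend` is a hypothetical helper, `hypothetical_native_cpu_backend` is a made-up module name that is expected not to exist, and the standard `math` module stands in for the fallback backend.

```python
import importlib


def load_backend(preferred, fallback):
    """Return (name, module), preferring `preferred` when it imports."""
    try:
        return preferred, importlib.import_module(preferred)
    except ImportError:
        # The native extension is missing, so use the fallback module.
        return fallback, importlib.import_module(fallback)


# The preferred name is made up, so this call takes the fallback path.
name, module = load_backend("hypothetical_native_cpu_backend", "math")
```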

`cpu_numpy()`

Create a CPU device using NumPy backend.

Returns:

BackendDevice –

CPU device with NumPy backend.

`cuda()`

Create a CUDA device if available.

Returns:

BackendDevice –

CUDA device, or disabled device if CUDA is not available.

`default_device()`

Return the default device (CPU with NumPy backend).

Returns:

BackendDevice –

Default CPU device.

`empty(shape, dtype='float32', device=None)`

Create an empty NDArray.

Parameters:

shape (tuple[int, ...]) –

Shape of the array.
dtype (str, default: 'float32' ) –

Data type of the array. Default is "float32".
device (BackendDevice, default: None ) –

Device to place the array on. If None, uses default device.

Returns:

NDArray –

Empty NDArray with the specified shape.

`full(shape, fill_value, dtype='float32', device=None)`

Create an NDArray filled with a constant value.

Parameters:

shape (tuple[int, ...]) –

Shape of the array.
fill_value (float) –

Value to fill the array with.
dtype (str, default: 'float32' ) –

Data type of the array. Default is "float32".
device (BackendDevice, default: None ) –

Device to place the array on. If None, uses default device.

Returns:

NDArray –

NDArray filled with the specified value.
