Data

Data loading and processing utilities.

This module provides utilities for loading and processing data in the tiny-pytorch framework. It includes dataset abstractions, data loading functionality, and transform operations similar to PyTorch's data utilities.

The module provides base classes for datasets and transforms, as well as concrete implementations for specific data types and transformations.

Classes:

  • Dataset

    Base class that provides common dataset functionality.

  • NDArrayDataset

    Dataset implementation for numpy arrays.

  • DataLoader

    Iterates over a dataset in batches with optional multiprocessing.

  • Transform

    Base class for all data transformations.

  • RandomCrop

    Randomly crops data to specified size.

  • RandomFlipHorizontal

    Randomly flips data horizontally with given probability.

BatchSampler

Wraps a sampler to yield batches of indices.

A BatchSampler takes a sampler that yields individual indices and wraps it to yield batches of indices instead. This is useful for mini-batch training where we want to process multiple samples at once.

Notes

The batch size determines how many indices are yielded in each batch. If drop_last is True, the last batch will be dropped if it's smaller than the batch size.

The sampler can be any iterable that yields indices, but is typically an instance of Sampler.
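The grouping behaviour described in these notes can be sketched in plain Python. This is a minimal illustration, not the library's implementation; the helper name `batch_indices` is hypothetical and only mirrors the documented `batch_size` and `drop_last` parameters.

```python
from typing import Iterable, Iterator, List


def batch_indices(
    sampler: Iterable[int], batch_size: int, drop_last: bool = False
) -> Iterator[List[int]]:
    """Group indices from `sampler` into lists of length `batch_size`.

    Hypothetical helper for illustration; not part of the documented module.
    """
    batch: List[int] = []
    for idx in sampler:
        batch.append(idx)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # A trailing partial batch is kept unless drop_last is set.
    if batch and not drop_last:
        yield batch


full = list(batch_indices(range(10), batch_size=4))
trimmed = list(batch_indices(range(10), batch_size=4, drop_last=True))
```

With ten indices and a batch size of four, `full` ends with the partial batch `[8, 9]`, while `trimmed` omits it.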

See Also

Sampler : Base class for sampling individual indices

__init__(sampler, batch_size, drop_last=False)

Parameters:

  • sampler (Sampler or Iterable[int]) –

    Sampler instance or iterable that yields indices.

  • batch_size (int) –

    Number of indices to include in each batch.

  • drop_last (bool, default: False ) –

    If True, drop the last batch if it's smaller than batch_size. Default is False.

DataLoader

Iterator over a dataset that supports batching and parallel data loading.

DataLoader combines a dataset and a sampler, and provides an iterable over the given dataset. It supports automatic batching, parallel data loading, and customizable data loading order.

Notes

The DataLoader provides an efficient way to load data in batches for training and evaluation. It handles the complexities of:

  • Batching individual data points into batches
  • Shuffling the data if requested
  • Parallel data loading using multiple worker processes
  • Custom collation of data samples into batches
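The responsibilities listed above can be sketched as a single generator over any indexable dataset. This is a simplified single-process illustration under stated assumptions: the function name `iterate_batches` and its `seed` argument are hypothetical, worker processes are left out entirely, and samples are assumed to be arrays of equal shape.

```python
import numpy as np


def iterate_batches(dataset, batch_size=1, shuffle=False, drop_last=False, seed=None):
    """Yield stacked batches from any object with __len__ and __getitem__.

    Hypothetical sketch; the real DataLoader may be organised differently.
    """
    indices = np.arange(len(dataset))
    if shuffle:
        # Shuffle the visiting order, not the underlying data.
        np.random.default_rng(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        chunk = indices[start:start + batch_size]
        if drop_last and len(chunk) < batch_size:
            break
        samples = [dataset[int(i)] for i in chunk]
        # Collation step: stack individual samples along a new batch axis.
        yield np.stack(samples)


data = np.arange(10, dtype=np.float32).reshape(5, 2)  # 5 samples, 2 features
shapes = [batch.shape for batch in iterate_batches(data, batch_size=2)]
```

Five samples with a batch size of two give two full batches and one batch of a single sample, unless `drop_last=True` is passed.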

__init__(dataset, batch_size=1, n_workers=1, shuffle=False, drop_last=False, collate_fn=collate)

Parameters:

  • dataset (Dataset) –

    Dataset from which to load the data.

  • batch_size (int, default: 1 ) –

    How many samples per batch to load. Default: 1.

  • n_workers (int, default: 1 ) –

    How many subprocesses to use for data loading. Default: 1.

  • shuffle (bool, default: False ) –

    Whether to shuffle the data at every epoch. Default: False.

  • drop_last (bool, default: False ) –

    Whether to drop the last incomplete batch if dataset size is not divisible by batch_size. Default: False.

  • collate_fn (callable, default: collate ) –

    Merges a list of samples to form a mini-batch. Default: collate.
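A custom `collate_fn` receives the list of samples for one batch and returns the merged batch. The sketch below assumes each sample is a tuple such as `(features, label)`; the name `stack_collate` is hypothetical, and this is not necessarily how the default `collate` behaves.

```python
import numpy as np


def stack_collate(samples):
    """Turn a list of (x, y, ...) tuples into a tuple of stacked arrays.

    Hypothetical example of a collate function; not the library default.
    """
    # zip(*samples) regroups by field: all x values together, then all y values.
    return tuple(np.stack(field) for field in zip(*samples))


samples = [(np.zeros(3), 0), (np.ones(3), 1)]
xs, ys = stack_collate(samples)
```

Here `xs` has shape `(2, 3)` and `ys` has shape `(2,)`, so each field gains a leading batch dimension.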

Dataset

Base class for all datasets.

This class defines the basic interface and functionality that all dataset implementations should follow. It provides common methods like __getitem__ for accessing data samples and apply_transforms for data augmentation.

Notes

All datasets should inherit from this base class and implement the __getitem__ method according to their specific data loading requirements.

__init__(transforms=None)

Parameters:

  • transforms (list or None, default: None ) –

    List of transform functions to be applied to data samples. Each transform should be a callable that takes a sample and returns the transformed sample. Default is None.

Notes

The transforms will be applied sequentially in the order they appear in the list when apply_transforms() is called on a sample.
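Because the transforms are applied in list order, reordering them can change the result. The following sketch shows the idea with a hypothetical helper, `apply_in_order`, which stands in for the documented apply_transforms() and is not its actual implementation.

```python
def apply_in_order(sample, transforms=None):
    """Apply each callable in `transforms` to `sample`, left to right.

    Hypothetical stand-in for Dataset.apply_transforms().
    """
    if transforms is None:
        return sample
    for transform in transforms:
        sample = transform(sample)
    return sample


def add_one(x):
    return x + 1


def double(x):
    return x * 2


a = apply_in_order(3, [add_one, double])  # (3 + 1) * 2
b = apply_in_order(3, [double, add_one])  # (3 * 2) + 1
```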

NDArrayDataset

Bases: Dataset

Dataset for working with NDArrays.

A dataset class that wraps NDArrays for use in machine learning tasks. Supports multiple arrays that will be returned as tuples when indexed. Commonly used for features and labels in supervised learning.

Notes

All arrays must have the same first dimension (length). Arrays will be returned in the same order they were passed to __init__.

__init__(*arrays)

Parameters:

  • *arrays (array_like, default: () ) –

    One or more arrays to include in the dataset. All arrays must have the same first dimension (length).

Raises:

  • ValueError

    If no arrays are provided or if arrays have different lengths.

Notes

Arrays will be returned in the same order they were passed when indexing the dataset.
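The length check and tuple-valued indexing described above can be sketched with a small stand-in class. `ArrayTupleDataset` is a hypothetical name used only for this illustration; it does not inherit from Dataset and is not the library's implementation.

```python
import numpy as np


class ArrayTupleDataset:
    """Hypothetical stand-in that mimics the documented NDArrayDataset contract."""

    def __init__(self, *arrays):
        if not arrays:
            raise ValueError("at least one array is required")
        if any(len(a) != len(arrays[0]) for a in arrays):
            raise ValueError("all arrays must have the same length")
        self.arrays = arrays

    def __len__(self):
        return len(self.arrays[0])

    def __getitem__(self, i):
        # One element per wrapped array, in the order passed to __init__.
        return tuple(a[i] for a in self.arrays)


ds = ArrayTupleDataset(np.zeros((4, 2)), np.arange(4))
features, label = ds[1]
```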

Examples:

>>> import numpy as np
>>> X = np.random.randn(100, 10)  # 100 samples, 10 features
>>> y = np.random.randint(0, 2, 100)  # Binary labels
>>> dataset = NDArrayDataset(X, y)
>>> x, y = dataset[0]  # Get first sample and label

RandomCrop

Bases: Transform

Transform that randomly crops images after zero padding.

This transform first applies zero padding around the image borders, then randomly crops the padded image back to its original size. This creates slight translations of the image content, which helps models become more robust to object position variations.

Notes

The padding size determines the maximum possible shift in any direction. For example, with padding=3, the image content can be shifted by up to 3 pixels in any direction.

The cropped region maintains the original image dimensions, effectively creating a translated version of the original image with zero padding filling in any gaps.
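The pad-then-crop procedure can be sketched with NumPy as follows. The function name `pad_and_random_crop` and its `rng` argument are assumptions made for this example; the library's RandomCrop may draw its offsets differently.

```python
import numpy as np


def pad_and_random_crop(img, padding=3, rng=None):
    """Zero-pad an H x W x C image, then crop a random H x W window.

    Hypothetical sketch of the technique described above.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, _ = img.shape
    # Pad only the two spatial axes with zeros; channels are left alone.
    padded = np.pad(img, ((padding, padding), (padding, padding), (0, 0)))
    # Offsets lie in [0, 2 * padding]; an offset equal to `padding`
    # reproduces the original image (no shift).
    top = int(rng.integers(0, 2 * padding + 1))
    left = int(rng.integers(0, 2 * padding + 1))
    return padded[top:top + h, left:left + w, :]


img = np.ones((8, 8, 3))
out = pad_and_random_crop(img, padding=3, rng=np.random.default_rng(0))
```

The output keeps the input shape, and any region shifted in from the border is filled with zeros.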

See Also

RandomFlipHorizontal : Transform that randomly flips images horizontally

__call__(img)

Parameters:

  • img (ndarray) –

    H x W x C array representing an image

Returns:

  • ndarray

    H x W x C array of randomly cropped image after padding

__init__(padding=3)

Parameters:

  • padding (int, default: 3 ) –

    Number of pixels to pad around image borders. Default is 3.

RandomFlipHorizontal

Bases: Transform

Transform that randomly flips images (specified as H x W x C NDArray) horizontally.

This transform applies horizontal flipping to images with a specified probability. Horizontal flipping is a common data augmentation technique that helps models become invariant to the horizontal orientation of objects in images.

Notes

The flip is applied with probability p (default 0.5). When applied, the image is mirrored left to right: the order of the columns along the width (W) axis is reversed, so the left side becomes the right side and vice versa.
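For an H x W x C array, a horizontal flip amounts to reversing the second (width) axis. The sketch below illustrates this with a hypothetical helper, `maybe_flip_horizontal`; the `rng` argument is an assumption added for the example.

```python
import numpy as np


def maybe_flip_horizontal(img, p=0.5, rng=None):
    """Return img mirrored left to right with probability p, else unchanged.

    Hypothetical sketch; not the library's RandomFlipHorizontal.
    """
    rng = np.random.default_rng() if rng is None else rng
    # rng.random() is drawn from [0, 1), so p=1.0 always flips and p=0.0 never does.
    if rng.random() < p:
        return img[:, ::-1, :]
    return img


img = np.arange(6).reshape(1, 3, 2)  # H=1, W=3, C=2
always = maybe_flip_horizontal(img, p=1.0)
never = maybe_flip_horizontal(img, p=0.0)
```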

See Also

RandomCrop : Transform that randomly crops images

__call__(img)

Parameters:

  • img (ndarray) –

    H x W x C array representing an image

Returns:

  • ndarray

    H x W x C array of flipped or original image

__init__(p=0.5)

Parameters:

  • p (float, default: 0.5 ) –

    Probability of flipping the image horizontally. Default is 0.5.

Sampler

Base class for sampling elements from a dataset.

A Sampler provides an iterable over indices of a dataset, defining the order in which elements are visited. This base class supports sequential or shuffled sampling.

Notes

Samplers are used by DataLoader to determine the order and grouping of samples during iteration.

The shuffle parameter determines whether indices are returned in sequential or randomized order.
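The difference between sequential and shuffled order can be sketched with a small stand-in class. `IndexSampler` and its `seed` argument are hypothetical; only the `ds` and `shuffle` parameters come from the documented signature.

```python
import numpy as np


class IndexSampler:
    """Hypothetical stand-in for the documented Sampler."""

    def __init__(self, ds, shuffle=False, seed=None):
        self.n = len(ds)
        self.shuffle = shuffle
        self.rng = np.random.default_rng(seed)

    def __iter__(self):
        indices = np.arange(self.n)
        if self.shuffle:
            # A fresh permutation is drawn each time iteration starts.
            self.rng.shuffle(indices)
        return iter(indices.tolist())

    def __len__(self):
        return self.n


ds = ["a", "b", "c", "d", "e"]  # anything with __len__ works here
sequential = list(IndexSampler(ds))
shuffled = list(IndexSampler(ds, shuffle=True, seed=0))
```

Both orders visit every index exactly once; only the order differs.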

See Also

BatchSampler : Wraps a sampler to yield batches of indices

DataLoader : Uses samplers to iterate over dataset elements

__init__(ds, shuffle=False)

Parameters:

  • ds (Dataset) –

    Dataset to sample from.

  • shuffle (bool, default: False ) –

    If True, samples are returned in random order. Default is False.

Transform

Base class for all transforms.

This class defines the interface for transformations that can be applied to data. Each transform should implement the __call__ method to specify how the data should be transformed.

Notes

Transforms are commonly used in computer vision tasks to augment training data and improve model generalization. They can include operations like flipping, rotating, cropping, or normalizing images.
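A custom transform is a subclass whose __call__ method returns the transformed data. The sketch below uses a local stand-in for the base class so that it runs on its own; `Normalize` is an illustrative transform and is not listed among the classes this module provides.

```python
import numpy as np


class Transform:
    """Hypothetical stand-in for the documented base class."""

    def __call__(self, x):
        raise NotImplementedError


class Normalize(Transform):
    """Illustrative transform: shift by `mean`, then scale by `std`."""

    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def __call__(self, x):
        return (x - self.mean) / self.std


normalize = Normalize(mean=0.5, std=0.5)
out = normalize(np.array([0.0, 0.5, 1.0]))
```

An instance like `normalize` is a plain callable, so it could be placed in the `transforms` list passed to a dataset.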

See Also

RandomFlipHorizontal : Transform that randomly flips images horizontally

RandomCrop : Transform that randomly crops images