Models

Natural language processing models for the tiny-pytorch framework.

This module provides pre-built neural network architectures for natural language processing tasks, specifically designed for sequence modeling and language understanding. The models are built using the core neural network components from the tiny-pytorch framework.

The module includes implementations of popular language model architectures adapted for the tiny-pytorch ecosystem, focusing on efficiency and educational value while maintaining compatibility with the framework's tensor operations and automatic differentiation system.

Key Features
  • Pre-built language models for sequence prediction
  • Support for both RNN and LSTM sequence models
  • Configurable embedding and hidden layer dimensions
  • Multi-layer sequence model architectures
  • Efficient implementations optimized for the tiny-pytorch framework
  • Educational models that demonstrate modern NLP design patterns

Classes:

  • LanguageModel

    A complete language model architecture for sequence prediction tasks. Features an embedding layer, configurable sequence model (RNN/LSTM), and output projection layer. Designed for next-word prediction, text generation, and other sequence modeling applications.

Notes

All models in this module are designed to work with the tiny-pytorch tensor system and support automatic differentiation. Input sequences should be provided as token indices, and models output logits for next-token prediction.

The sequence models support both single and multi-layer architectures, with configurable hidden dimensions. The embedding layer maps from vocabulary size to a learned embedding space, while the output layer projects back to vocabulary size for next-token prediction.
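As an illustration of the input format described above, the following sketch prepares next-token inputs and targets in the (seq_len, batch_size) layout. It uses NumPy rather than tiny-pytorch tensors, and the toy token stream, the sizes, and the column-wise batching scheme are all assumptions made for the example, not part of this module.

```python
import numpy as np

# Toy corpus already mapped to token indices (illustrative values only).
stream = np.arange(20)

batch_size, seq_len = 2, 4

# Split the stream into batch_size parallel columns -> (steps, batch_size).
steps = len(stream) // batch_size
data = stream[: steps * batch_size].reshape(batch_size, steps).T

# Inputs cover positions [0, seq_len); targets are the same window
# shifted forward by one step, so y[t] is the token that follows x[t].
x = data[0:seq_len]        # (seq_len, batch_size)
y = data[1:seq_len + 1]    # (seq_len, batch_size)

print(x.shape, y.shape)
```

Each column of `x` is a contiguous slice of the stream, which is the layout the model expects when the sequence dimension comes first.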

Examples:

>>> from tiny_pytorch.nlp.models import LanguageModel
>>>
>>> # Create a language model for text generation
>>> model = LanguageModel(
...     num_embeddings=1000,  # vocabulary size
...     embedding_dim=128,
...     hidden_size=256,
...     num_layers=2,
...     seq_model='lstm'
... )
>>>
>>> # Prepare input data (seq_len=10, batch_size=32)
>>> x = Tensor.randint(0, 1000, (10, 32))
>>>
>>> # Forward pass to get next-token predictions
>>> logits, hidden = model(x)
>>> print(logits.shape)  # (320, 1000) - (seq_len*batch_size, vocab_size)
>>>
>>> # The model is ready for training with appropriate loss functions
>>> # and optimizers from the tiny-pytorch framework

LanguageModel

Bases: Module

A language model for sequence prediction tasks.

This module implements a complete language model architecture consisting of an embedding layer, a sequence model (RNN or LSTM), and a linear output layer. It is designed for tasks like next-word prediction, text generation, and other sequence modeling applications.

The model architecture follows this pattern:

  1. Embedding layer: converts input token indices to dense vectors.
  2. Sequence model: processes the embedded sequence (RNN or LSTM).
  3. Linear layer: projects the hidden state at each position to vocabulary logits.
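The three stages can be traced shape by shape with a minimal NumPy sketch. This is not the tiny-pytorch implementation: the weight names, the sizes, the single-layer tanh recurrence, and the omission of bias terms are assumptions chosen to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not library defaults).
vocab_size, embedding_dim, hidden_size = 50, 8, 16
seq_len, batch_size = 5, 3

# Hypothetical parameters standing in for the three layers (biases omitted).
embed_weight = rng.normal(size=(vocab_size, embedding_dim))
w_ih = rng.normal(size=(embedding_dim, hidden_size)) * 0.1
w_hh = rng.normal(size=(hidden_size, hidden_size)) * 0.1
w_out = rng.normal(size=(hidden_size, vocab_size)) * 0.1

# Token indices in (seq_len, batch_size) layout.
tokens = rng.integers(0, vocab_size, size=(seq_len, batch_size))

# 1. Embedding: index lookup -> (seq_len, batch_size, embedding_dim).
embedded = embed_weight[tokens]

# 2. Single-layer tanh RNN, unrolled over the time dimension.
h = np.zeros((batch_size, hidden_size))
outputs = []
for t in range(seq_len):
    h = np.tanh(embedded[t] @ w_ih + h @ w_hh)
    outputs.append(h)
outputs = np.stack(outputs)  # (seq_len, batch_size, hidden_size)

# 3. Linear projection over flattened positions
#    -> (seq_len * batch_size, vocab_size).
logits = outputs.reshape(seq_len * batch_size, hidden_size) @ w_out

print(embedded.shape, outputs.shape, logits.shape)
```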

Parameters:

  • num_embeddings (int) –

    The size of the vocabulary (number of unique tokens).

  • embedding_dim (int) –

    The dimensionality of the embedding vectors.

  • hidden_size (int) –

    The number of features in the hidden state of the sequence model.

  • num_layers (int, default: 1 ) –

    Number of layers in the RNN or LSTM. Default is 1.

  • seq_model (str, default: 'rnn' ) –

    Type of sequence model to use. Must be either 'rnn' or 'lstm'. Default is 'rnn'.

  • device (Device, default: None ) –

    Device on which to place the model parameters. Default is None (uses default device).

  • dtype (str, default: 'float32' ) –

    Data type of the model parameters. Default is "float32".

Attributes:

  • output_size (int) –

    The size of the vocabulary.

  • hidden_size (int) –

    The number of features in the hidden state.

  • embed (Embedding) –

    The embedding layer that converts token indices to vectors.

  • seq_model (RNN or LSTM) –

    The sequence model (RNN or LSTM) for processing the embedded sequence.

  • linear (Linear) –

    The output linear layer that projects to vocabulary logits.

Notes
  • Input sequences should be provided as token indices in shape (seq_len, batch_size).
  • The model outputs logits for next-token prediction at each position.
  • Supports both RNN and LSTM sequence models with configurable layers.
  • The embedding layer maps from vocabulary size to embedding dimension.
  • The linear layer projects from hidden size back to vocabulary size.

Examples:

>>> model = LanguageModel(
...     num_embeddings=1000,  # vocabulary size
...     embedding_dim=128,
...     hidden_size=256,
...     num_layers=2,
...     seq_model='lstm'
... )
>>>
>>> # Input: (seq_len=10, batch_size=32)
>>> x = Tensor.randint(0, 1000, (10, 32))
>>>
>>> # Forward pass
>>> logits, hidden = model(x)
>>> print(logits.shape)  # (320, 1000) - (seq_len*batch_size, vocab_size)
>>> print(hidden[0].shape if isinstance(hidden, tuple) else hidden.shape)
>>> # (2, 32, 256) - (num_layers, batch_size, hidden_size)

__init__(num_embeddings, embedding_dim, hidden_size, num_layers=1, seq_model='rnn', device=None, dtype='float32')

Initialize the LanguageModel.

Parameters:

  • num_embeddings (int) –

    The size of the vocabulary (number of unique tokens).

  • embedding_dim (int) –

    The dimensionality of the embedding vectors.

  • hidden_size (int) –

    The number of features in the hidden state of the sequence model.

  • num_layers (int, default: 1 ) –

    Number of layers in the RNN or LSTM. Default is 1.

  • seq_model (str, default: 'rnn' ) –

    Type of sequence model to use. Must be either 'rnn' or 'lstm'. Default is 'rnn'.

  • device (Device, default: None ) –

    Device on which to place the model parameters. Default is None (uses default device).

  • dtype (str, default: 'float32' ) –

    Data type of the model parameters. Default is "float32".

Raises:

  • AssertionError

    If seq_model is not 'rnn' or 'lstm'.

forward(x, h=None)

Forward pass of the language model.

Given an input sequence of token indices, returns logits for next-token prediction along with the final hidden state from the sequence model.

Parameters:

  • x (Tensor) –

    Input tensor of shape (seq_len, batch_size) containing token indices. Each element should be an integer index in the range [0, output_size).

  • h (Tensor or tuple of (Tensor, Tensor) or None, default: None ) –

    Initial hidden state for the sequence model.

      • For RNN: Tensor of shape (num_layers, batch_size, hidden_size).
      • For LSTM: Tuple of (h0, c0), each of shape (num_layers, batch_size, hidden_size).
      • If None, defaults to zeros for RNN or (zeros, zeros) for LSTM.

Returns:

  • logits ( Tensor ) –

    Output tensor of shape (seq_len * batch_size, output_size) containing logits for next-token prediction at each position in the sequence.

  • hidden ( Tensor or tuple of (Tensor, Tensor) ) –

    Final hidden state from the sequence model.

      • For RNN: Tensor of shape (num_layers, batch_size, hidden_size).
      • For LSTM: Tuple of (h_n, c_n), each of shape (num_layers, batch_size, hidden_size).
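The hidden-state layouts for the two sequence models can be made concrete with a small sketch. `init_hidden` is a hypothetical helper written for this example, and it uses NumPy arrays in place of tiny-pytorch tensors; only the shapes and the RNN/LSTM distinction are taken from the description above.

```python
import numpy as np

def init_hidden(seq_model, num_layers, batch_size, hidden_size):
    """Hypothetical helper: build a zero initial state in the documented layout."""
    assert seq_model in ("rnn", "lstm"), "seq_model must be 'rnn' or 'lstm'"
    shape = (num_layers, batch_size, hidden_size)
    if seq_model == "lstm":
        # LSTM state is a pair: (hidden state h0, cell state c0).
        return np.zeros(shape, dtype="float32"), np.zeros(shape, dtype="float32")
    # RNN state is a single array.
    return np.zeros(shape, dtype="float32")

h_rnn = init_hidden("rnn", num_layers=2, batch_size=32, hidden_size=256)
h0, c0 = init_hidden("lstm", num_layers=2, batch_size=32, hidden_size=256)
print(h_rnn.shape, h0.shape, c0.shape)
```

Code that handles the returned state generically can branch on whether it is a tuple, as the class example does with `isinstance(hidden, tuple)`.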

Notes

The output logits are flattened across the sequence dimension, so each position in the sequence contributes batch_size predictions. This is useful for training with cross-entropy loss where each position is treated as a separate prediction task.
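To show how flattened logits pair with targets, here is a NumPy sketch of a mean cross-entropy over all positions. It assumes the logits are flattened in row-major order from (seq_len, batch_size, vocab_size), so the targets are flattened the same way; that ordering, the `cross_entropy` helper, and the random data are assumptions for illustration, not the framework's loss function.

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean cross-entropy for logits (N, vocab) and integer targets (N,)."""
    # Subtract the row max for numerical stability before the log-sum-exp.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(targets.shape[0]), targets].mean()

seq_len, batch_size, vocab_size = 4, 2, 6
rng = np.random.default_rng(0)

# Stand-in for model output: (seq_len * batch_size, vocab_size).
logits = rng.normal(size=(seq_len * batch_size, vocab_size))

# Targets arrive as (seq_len, batch_size) and are flattened to match:
# position t of batch element b lands at index t * batch_size + b.
targets = rng.integers(0, vocab_size, size=(seq_len, batch_size))
flat_targets = targets.reshape(-1)

loss = cross_entropy(logits, flat_targets)
print(loss)
```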