Models

Natural language processing models for tiny-pytorch.

This module provides pre-built neural network architectures for natural language processing tasks, specifically designed for sequence modeling and language understanding. The models are built using the core neural network components from the tiny-pytorch framework.

The module includes implementations of popular language model architectures adapted for the tiny-pytorch ecosystem, focusing on efficiency and educational value while maintaining compatibility with the framework's tensor operations and automatic differentiation system.

Key Features

Pre-built language models for sequence prediction
Support for both RNN and LSTM sequence models
Configurable embedding and hidden layer dimensions
Multi-layer sequence model architectures
Efficient implementations optimized for the tiny-pytorch framework
Educational models that demonstrate modern NLP design patterns

Classes:

LanguageModel –

A complete language model architecture for sequence prediction tasks. Features an embedding layer, configurable sequence model (RNN/LSTM), and output projection layer. Designed for next-word prediction, text generation, and other sequence modeling applications.

Notes

All models in this module are designed to work with the tiny-pytorch tensor system and support automatic differentiation. Input sequences should be provided as token indices, and models output logits for next-token prediction.

The sequence models support both single and multi-layer architectures, with configurable hidden dimensions. The embedding layer maps from vocabulary size to a learned embedding space, while the output layer projects back to vocabulary size for next-token prediction.

Examples:

>>> from tiny_pytorch.nlp.models import LanguageModel
>>>
>>> # Create a language model for text generation
>>> model = LanguageModel(
...     num_embeddings=1000,  # vocabulary size
...     embedding_dim=128,
...     hidden_size=256,
...     num_layers=2,
...     seq_model='lstm'
... )
>>>
>>> # Prepare input data (seq_len=10, batch_size=32)
>>> x = Tensor.randint(0, 1000, (10, 32))
>>>
>>> # Forward pass to get next-token predictions
>>> logits, hidden = model(x)
>>> print(logits.shape)  # (320, 1000) - (seq_len*batch_size, vocab_size)
>>>
>>> # The model is ready for training with appropriate loss functions
>>> # and optimizers from the tiny-pytorch framework

`LanguageModel`

Bases: Module

A language model for sequence prediction tasks.

This module implements a complete language model architecture consisting of an embedding layer, a sequence model (RNN or LSTM), and a linear output layer. It is designed for tasks like next-word prediction, text generation, and other sequence modeling applications.

The model architecture follows this pattern:

1. Embedding layer: Converts input token indices to dense vectors
2. Sequence model: Processes the embedded sequence (RNN or LSTM)
3. Linear layer: Projects the sequence model's hidden state at each position to vocabulary logits
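The three stages above can be traced as a shape walk-through. The following is a minimal sketch in plain Python, not the library's code: `trace_shapes` is a hypothetical helper, and it assumes the sequence model emits one hidden vector per position, which the flattened (seq_len * batch_size, output_size) logits shape documented for `forward` suggests.

```python
def trace_shapes(seq_len, batch_size, num_embeddings, embedding_dim,
                 hidden_size, num_layers=1):
    """Return the shape expected after each stage (hypothetical helper)."""
    return {
        # Integer token indices fed to the model.
        "tokens": (seq_len, batch_size),
        # Embedding layer: one dense vector per token.
        "embedded": (seq_len, batch_size, embedding_dim),
        # Sequence model: assumed to emit one hidden vector per position.
        "seq_out": (seq_len, batch_size, hidden_size),
        # Final hidden state returned alongside the logits.
        "hidden": (num_layers, batch_size, hidden_size),
        # Linear layer: positions flattened, projected to vocabulary size.
        "logits": (seq_len * batch_size, num_embeddings),
    }


shapes = trace_shapes(seq_len=10, batch_size=32, num_embeddings=1000,
                      embedding_dim=128, hidden_size=256, num_layers=2)
for name, shape in shapes.items():
    print(f"{name:>8}: {shape}")
```

The `logits` and `hidden` entries match the shapes quoted in the examples on this page for the same settings.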

Parameters:

num_embeddings (int) –

The size of the vocabulary (number of unique tokens).
embedding_dim (int) –

The dimensionality of the embedding vectors.
hidden_size (int) –

The number of features in the hidden state of the sequence model.
num_layers (int, default: 1 ) –

Number of layers in the RNN or LSTM. Default is 1.
seq_model (str, default: 'rnn' ) –

Type of sequence model to use. Must be either 'rnn' or 'lstm'. Default is 'rnn'.
device (Device, default: None ) –

Device on which to place the model parameters. Default is None (uses default device).
dtype (str, default: 'float32' ) –

Data type of the model parameters. Default is "float32".

Attributes:

output_size (int) –

The size of the vocabulary.
hidden_size (int) –

The number of features in the hidden state.
embed (Embedding) –

The embedding layer that converts token indices to vectors.
seq_model (RNN or LSTM) –

The sequence model (RNN or LSTM) for processing the embedded sequence.
linear (Linear) –

The output linear layer that projects to vocabulary logits.

Notes

Input sequences should be provided as token indices in shape (seq_len, batch_size).
The model outputs logits for next-token prediction at each position.
Supports both RNN and LSTM sequence models with configurable layers.
The embedding layer maps from vocabulary size to embedding dimension.
The linear layer projects from hidden size back to vocabulary size.

Examples:

>>> model = LanguageModel(
...     num_embeddings=1000,  # vocabulary size
...     embedding_dim=128,
...     hidden_size=256,
...     num_layers=2,
...     seq_model='lstm'
... )
>>>
>>> # Input: (seq_len=10, batch_size=32)
>>> x = Tensor.randint(0, 1000, (10, 32))
>>>
>>> # Forward pass
>>> logits, hidden = model(x)
>>> print(logits.shape)  # (320, 1000) - (seq_len*batch_size, vocab_size)
>>> print(hidden[0].shape if isinstance(hidden, tuple) else hidden.shape)
>>> # (2, 32, 256) - (num_layers, batch_size, hidden_size)

`__init__(num_embeddings, embedding_dim, hidden_size, num_layers=1, seq_model='rnn', device=None, dtype='float32')`

Initialize the LanguageModel.

Parameters:

num_embeddings (int) –

The size of the vocabulary (number of unique tokens).
embedding_dim (int) –

The dimensionality of the embedding vectors.
hidden_size (int) –

The number of features in the hidden state of the sequence model.
num_layers (int, default: 1 ) –

Number of layers in the RNN or LSTM. Default is 1.
seq_model (str, default: 'rnn' ) –

Type of sequence model to use. Must be either 'rnn' or 'lstm'. Default is 'rnn'.
device (Device, default: None ) –

Device on which to place the model parameters. Default is None (uses default device).
dtype (str, default: 'float32' ) –

Data type of the model parameters. Default is "float32".

Raises:

AssertionError –

If seq_model is not 'rnn' or 'lstm'.
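As an illustration of the check described above, a constructor could validate the argument along these lines. This is a sketch that assumes a plain `assert` is used (consistent with the documented `AssertionError`); `check_seq_model` and `SUPPORTED_SEQ_MODELS` are hypothetical names, not part of tiny-pytorch.

```python
SUPPORTED_SEQ_MODELS = ("rnn", "lstm")


def check_seq_model(seq_model):
    """Return seq_model unchanged if supported (hypothetical helper)."""
    assert seq_model in SUPPORTED_SEQ_MODELS, (
        f"seq_model must be one of {SUPPORTED_SEQ_MODELS}, got {seq_model!r}"
    )
    return seq_model


check_seq_model("lstm")  # accepted

try:
    check_seq_model("gru")  # not a supported value
except AssertionError as err:
    message = str(err)
    print(message)
```

Note that Python skips `assert` statements when run with `-O`, so a check written this way is only active in normal (non-optimized) runs.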

`forward(x, h=None)`

Forward pass of the language model.

Given an input sequence of token indices, returns logits for next-token prediction along with the final hidden state from the sequence model.

Parameters:

x (Tensor) –

Input tensor of shape (seq_len, batch_size) containing token indices. Each element should be an integer index in the range [0, output_size).
h (Tensor or tuple of (Tensor, Tensor) or None, default: None ) –

Initial hidden state for the sequence model.
- For RNN: Tensor of shape (num_layers, batch_size, hidden_size)
- For LSTM: Tuple of (h0, c0), each of shape (num_layers, batch_size, hidden_size)
- If None, defaults to zeros for RNN or (zeros, zeros) for LSTM.
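The default described for `h=None` can be pictured with a small stand-in that uses nested Python lists instead of tensors. This is only a sketch of the documented shapes: `zeros` and `default_hidden` are hypothetical helpers, and the real model would build framework tensors on the configured device.

```python
def zeros(shape):
    """Build a nested list of 0.0 values with the given shape."""
    if len(shape) == 1:
        return [0.0] * shape[0]
    return [zeros(shape[1:]) for _ in range(shape[0])]


def default_hidden(seq_model, num_layers, batch_size, hidden_size):
    """Mimic the documented default initial state (hypothetical helper)."""
    shape = (num_layers, batch_size, hidden_size)
    if seq_model == "rnn":
        return zeros(shape)                # single hidden state
    if seq_model == "lstm":
        return zeros(shape), zeros(shape)  # (h0, c0): hidden and cell state
    raise ValueError(f"unsupported seq_model: {seq_model!r}")


h0, c0 = default_hidden("lstm", num_layers=2, batch_size=4, hidden_size=8)
rnn_h0 = default_hidden("rnn", num_layers=1, batch_size=4, hidden_size=8)
print(len(h0), len(h0[0]), len(h0[0][0]))
```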

Returns:

logits ( Tensor ) –

Output tensor of shape (seq_len * batch_size, output_size) containing logits for next-token prediction at each position in the sequence.
hidden ( Tensor or tuple of (Tensor, Tensor) ) –

Final hidden state from the sequence model.
- For RNN: Tensor of shape (num_layers, batch_size, hidden_size)
- For LSTM: Tuple of (h_n, c_n), each of shape (num_layers, batch_size, hidden_size)

Notes

The output logits are flattened across the sequence dimension, so each position in the sequence contributes batch_size predictions. This is useful for training with cross-entropy loss where each position is treated as a separate prediction task.
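To make the flattening concrete, the sketch below pairs flattened logits with flattened targets and computes a mean cross-entropy in plain Python. It assumes row-major flattening, so that row `t * batch_size + b` of the logits corresponds to position `t` of sequence `b`; that ordering is an assumption, not something this page states. `flatten_targets` and `cross_entropy` are hypothetical helpers standing in for the framework's own loss function.

```python
import math


def flatten_targets(targets):
    """Flatten (seq_len, batch_size) targets so index = t * batch_size + b."""
    return [token for step in targets for token in step]


def cross_entropy(logits, targets):
    """Mean cross-entropy; logits is a list of rows of length vocab_size."""
    total = 0.0
    for row, target in zip(logits, targets):
        # Numerically stable log-sum-exp of the row.
        peak = max(row)
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_sum_exp - row[target]
    return total / len(targets)


seq_len, batch_size, vocab_size = 3, 2, 4
targets = [[1, 0], [2, 3], [0, 1]]   # shape (seq_len, batch_size)
flat_targets = flatten_targets(targets)

# Stand-in for model output: uniform logits, shape (seq_len * batch_size, vocab_size).
logits = [[0.0] * vocab_size for _ in range(seq_len * batch_size)]

loss = cross_entropy(logits, flat_targets)
print(loss)  # uniform logits give log(vocab_size)
```

Whatever loss function is used, the targets need to be flattened in the same order as the logits so that each row is scored against the right token.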
