Models
Natural language processing models for the tiny-pytorch framework.
This module provides pre-built neural network architectures for natural language processing tasks, specifically designed for sequence modeling and language understanding. The models are built using the core neural network components from the tiny-pytorch framework.
The module includes implementations of popular language model architectures adapted for the tiny-pytorch ecosystem, focusing on efficiency and educational value while maintaining compatibility with the framework's tensor operations and automatic differentiation system.
Key Features
- Pre-built language models for sequence prediction
- Support for both RNN and LSTM sequence models
- Configurable embedding and hidden layer dimensions
- Multi-layer sequence model architectures
- Efficient implementations optimized for the tiny-pytorch framework
- Educational models that demonstrate modern NLP design patterns
Classes:
-
LanguageModel–A complete language model architecture for sequence prediction tasks. Features an embedding layer, configurable sequence model (RNN/LSTM), and output projection layer. Designed for next-word prediction, text generation, and other sequence modeling applications.
Notes
All models in this module are designed to work with the tiny-pytorch tensor system and support automatic differentiation. Input sequences should be provided as token indices, and models output logits for next-token prediction.
The sequence models support both single and multi-layer architectures, with configurable hidden dimensions. The embedding layer maps from vocabulary size to a learned embedding space, while the output layer projects back to vocabulary size for next-token prediction.
Examples:
>>> from tiny_pytorch.nlp.models import LanguageModel
>>>
>>> # Create a language model for text generation
>>> model = LanguageModel(
... num_embeddings=1000, # vocabulary size
... embedding_dim=128,
... hidden_size=256,
... num_layers=2,
... seq_model='lstm'
... )
>>>
>>> # Prepare input data (seq_len=10, batch_size=32)
>>> x = Tensor.randint(0, 1000, (10, 32))
>>>
>>> # Forward pass to get next-token predictions
>>> logits, hidden = model(x)
>>> print(logits.shape) # (320, 1000) - (seq_len*batch_size, vocab_size)
>>>
>>> # The model is ready for training with appropriate loss functions
>>> # and optimizers from the tiny-pytorch framework
LanguageModel
Bases: Module
A language model for sequence prediction tasks.
This module implements a complete language model architecture consisting of an embedding layer, a sequence model (RNN or LSTM), and a linear output layer. It is designed for tasks like next-word prediction, text generation, and other sequence modeling applications.
The model architecture follows this pattern:
1. Embedding layer: Converts input token indices to dense vectors
2. Sequence model: Processes the embedded sequence (RNN or LSTM)
3. Linear layer: Projects the sequence model's output at each position to vocabulary logits
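The three stages above can be traced shape by shape with a minimal NumPy sketch. It is not the tiny-pytorch implementation: the weight names (`embed_weight`, `w_ih`, `w_hh`, `w_out`), the sizes, and the bias-free single-layer tanh RNN are assumptions chosen only to illustrate the data flow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen for illustration only.
vocab_size, embedding_dim, hidden_size = 1000, 128, 256
seq_len, batch_size = 10, 32

# Stand-in parameters (random, untrained, no biases).
embed_weight = rng.normal(size=(vocab_size, embedding_dim)) * 0.01
w_ih = rng.normal(size=(embedding_dim, hidden_size)) * 0.01
w_hh = rng.normal(size=(hidden_size, hidden_size)) * 0.01
w_out = rng.normal(size=(hidden_size, vocab_size)) * 0.01

# Token indices of shape (seq_len, batch_size).
x = rng.integers(0, vocab_size, size=(seq_len, batch_size))

# 1. Embedding: index lookup -> (seq_len, batch_size, embedding_dim).
embedded = embed_weight[x]

# 2. Sequence model: a single-layer tanh RNN unrolled over time.
h = np.zeros((batch_size, hidden_size))
outputs = []
for t in range(seq_len):
    h = np.tanh(embedded[t] @ w_ih + h @ w_hh)
    outputs.append(h)
outputs = np.stack(outputs)  # (seq_len, batch_size, hidden_size)

# 3. Linear projection of every position, flattened over (seq_len, batch_size).
logits = outputs.reshape(seq_len * batch_size, hidden_size) @ w_out

print(embedded.shape, outputs.shape, logits.shape)
```

The final shape matches the documented `(seq_len * batch_size, output_size)` layout of the logits.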
Parameters:
-
num_embeddings(int) –The size of the vocabulary (number of unique tokens).
-
embedding_dim(int) –The dimensionality of the embedding vectors.
-
hidden_size(int) –The number of features in the hidden state of the sequence model.
-
num_layers(int, default:1) –Number of layers in the RNN or LSTM. Default is 1.
-
seq_model(str, default:'rnn') –Type of sequence model to use. Must be either 'rnn' or 'lstm'. Default is 'rnn'.
-
device(Device, default:None) –Device on which to place the model parameters. Default is None (uses default device).
-
dtype(str, default:'float32') –Data type of the model parameters. Default is "float32".
Attributes:
-
output_size(int) –The size of the vocabulary.
-
hidden_size(int) –The number of features in the hidden state.
-
embed(Embedding) –The embedding layer that converts token indices to vectors.
-
seq_model(RNN or LSTM) –The sequence model (RNN or LSTM) for processing the embedded sequence.
-
linear(Linear) –The output linear layer that projects to vocabulary logits.
Notes
- Input sequences should be provided as token indices in shape (seq_len, batch_size).
- The model outputs logits for next-token prediction at each position.
- Supports both RNN and LSTM sequence models with configurable layers.
- The embedding layer maps from vocabulary size to embedding dimension.
- The linear layer projects from hidden size back to vocabulary size.
Examples:
>>> model = LanguageModel(
... num_embeddings=1000, # vocabulary size
... embedding_dim=128,
... hidden_size=256,
... num_layers=2,
... seq_model='lstm'
... )
>>>
>>> # Input: (seq_len=10, batch_size=32)
>>> x = Tensor.randint(0, 1000, (10, 32))
>>>
>>> # Forward pass
>>> logits, hidden = model(x)
>>> print(logits.shape) # (320, 1000) - (seq_len*batch_size, vocab_size)
>>> print(hidden[0].shape if isinstance(hidden, tuple) else hidden.shape)
>>> # (2, 32, 256) - (num_layers, batch_size, hidden_size)
__init__(num_embeddings, embedding_dim, hidden_size, num_layers=1, seq_model='rnn', device=None, dtype='float32')
Initialize the LanguageModel.
Parameters:
-
num_embeddings(int) –The size of the vocabulary (number of unique tokens).
-
embedding_dim(int) –The dimensionality of the embedding vectors.
-
hidden_size(int) –The number of features in the hidden state of the sequence model.
-
num_layers(int, default:1) –Number of layers in the RNN or LSTM. Default is 1.
-
seq_model(str, default:'rnn') –Type of sequence model to use. Must be either 'rnn' or 'lstm'. Default is 'rnn'.
-
device(Device, default:None) –Device on which to place the model parameters. Default is None (uses default device).
-
dtype(str, default:'float32') –Data type of the model parameters. Default is "float32".
Raises:
-
AssertionError–If seq_model is not 'rnn' or 'lstm'.
forward(x, h=None)
Forward pass of the language model.
Given an input sequence of token indices, returns logits for next-token prediction along with the final hidden state from the sequence model.
Parameters:
-
x(Tensor) –Input tensor of shape (seq_len, batch_size) containing token indices. Each element should be an integer index in the range [0, output_size).
-
h(Tensor or tuple of (Tensor, Tensor) or None, default:None) –Initial hidden state for the sequence model.
  - For RNN: Tensor of shape (num_layers, batch_size, hidden_size)
  - For LSTM: Tuple of (h0, c0), each of shape (num_layers, batch_size, hidden_size)
  - If None, defaults to zeros for RNN or (zeros, zeros) for LSTM.
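As a rough illustration of the initial-state shapes described for `h`, the sketch below builds zero states with NumPy. The helper `init_hidden` is hypothetical and not part of tiny-pytorch; in practice the arrays would need to be tiny-pytorch tensors on the model's device.

```python
import numpy as np

def init_hidden(seq_model, num_layers, batch_size, hidden_size):
    """Hypothetical helper: zero initial state with the documented shapes."""
    shape = (num_layers, batch_size, hidden_size)
    if seq_model == "rnn":
        # RNN: a single hidden-state array.
        return np.zeros(shape, dtype="float32")
    if seq_model == "lstm":
        # LSTM: a (hidden state, cell state) pair of identical shape.
        return np.zeros(shape, dtype="float32"), np.zeros(shape, dtype="float32")
    raise ValueError("seq_model must be 'rnn' or 'lstm'")

h0 = init_hidden("rnn", num_layers=2, batch_size=32, hidden_size=256)
h0_lstm, c0_lstm = init_hidden("lstm", num_layers=2, batch_size=32, hidden_size=256)
print(h0.shape, h0_lstm.shape, c0_lstm.shape)
```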
Returns:
-
logits(Tensor) –Output tensor of shape (seq_len * batch_size, output_size) containing logits for next-token prediction at each position in the sequence.
-
hidden(Tensor or tuple of (Tensor, Tensor)) –Final hidden state from the sequence model.
  - For RNN: Tensor of shape (num_layers, batch_size, hidden_size)
  - For LSTM: Tuple of (h_n, c_n), each of shape (num_layers, batch_size, hidden_size)
Notes
The output logits are flattened across the sequence dimension, so each position in the sequence contributes batch_size predictions. This is useful for training with cross-entropy loss where each position is treated as a separate prediction task.
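To show how flattened logits pair with targets, here is a small NumPy sketch of a mean cross-entropy computation. It uses random stand-ins for the model output and targets, and it assumes row-major flattening, so that row `t * batch_size + b` of the logits corresponds to position `t` of sequence `b`. That ordering is an assumption and should be checked against the actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, batch_size, vocab_size = 10, 32, 1000

# Random stand-ins for the model's flattened logits and the next-token targets.
logits = rng.normal(size=(seq_len * batch_size, vocab_size))
targets = rng.integers(0, vocab_size, size=(seq_len, batch_size))

# Flatten targets the same way (row-major), giving one label per logits row.
flat_targets = targets.reshape(-1)

# Numerically stable log-softmax over the vocabulary dimension.
shifted = logits - logits.max(axis=1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

# Mean negative log-likelihood of the target token at every position.
loss = -log_probs[np.arange(flat_targets.size), flat_targets].mean()
print(flat_targets.shape, float(loss))
```

Treating every row as an independent classification example is what lets a standard cross-entropy loss be applied without reshaping the logits back to three dimensions.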