minGRULM-base

suayptalha

Introduction

minGRULM-base is an experimental language model integrated with the Hugging Face ecosystem, based on the paper "Were RNNs All We Needed?". It uses the GPT-2 tokenizer and is trained on the roneneldan/TinyStories dataset. The model requires additional training before use.
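
The GPT-2 tokenizer mentioned above can be loaded with the standard transformers library. This is a minimal sketch for orientation only; the checkpoint name "gpt2" is the stock Hugging Face identifier, not something defined by this model card.

from transformers import AutoTokenizer

# Stock GPT-2 tokenizer, as referenced by this model card
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 defines no pad token by default
print(tokenizer("Once upon a time").input_ids)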

Architecture

The model uses the minGRU architecture, a minimal GRU variant proposed in "Were RNNs All We Needed?", and is implemented in PyTorch. It is designed for text generation tasks and integrates with Hugging Face's Transformers library.
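
For intuition, the snippet below sketches the minGRU recurrence from the paper in its simple sequential form: the update gate and candidate state depend only on the current input, not on the previous hidden state. This is an illustrative sketch in plain PyTorch, not the minGRU-pytorch library's actual implementation, which uses a parallel scan.

import torch
import torch.nn as nn

class MinGRUCellSketch(nn.Module):
    """Illustrative sequential minGRU cell (sketch, not the library code)."""
    def __init__(self, dim_in, dim_hidden):
        super().__init__()
        self.to_z = nn.Linear(dim_in, dim_hidden)        # update gate: z_t = sigmoid(W_z x_t)
        self.to_h_tilde = nn.Linear(dim_in, dim_hidden)  # candidate state: h~_t = W_h x_t

    def forward(self, x, h_prev):
        # x: (batch, dim_in), h_prev: (batch, dim_hidden)
        z = torch.sigmoid(self.to_z(x))
        h_tilde = self.to_h_tilde(x)
        # h_t = (1 - z_t) * h_prev + z_t * h~_t
        return (1 - z) * h_prev + z * h_tilde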

Training

For training, the model uses an AdamW optimizer with a linear learning-rate scheduler. The code snippet below outlines the training loop, which handles device placement, data loading, and optimization. The model is trained for a specified number of epochs, and both the model and tokenizer are saved after training.

def train_model(model, tokenizer, train_data, output_dir, epochs=3, batch_size=16, learning_rate=5e-5, block_size=128):
    # Training setup and execution
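
The full body of train_model is not reproduced on the card. As a hedged sketch only, one possible implementation consistent with the description above (AdamW, linear schedule, epoch loop, saving model and tokenizer) is shown below. It assumes train_data is a collection of fixed-length blocks of token IDs and that the model follows the Transformers causal-LM convention of returning a loss when labels are passed; neither assumption is stated on the card.

import os
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader
from transformers import get_linear_schedule_with_warmup

def train_model(model, tokenizer, train_data, output_dir, epochs=3,
                batch_size=16, learning_rate=5e-5, block_size=128):
    # Assumption: train_data is a tensor/list of blocks, each block_size token IDs long
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
    optimizer = AdamW(model.parameters(), lr=learning_rate)
    total_steps = epochs * len(loader)
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=0, num_training_steps=total_steps)

    model.train()
    for epoch in range(epochs):
        for batch in loader:
            batch = batch.to(device)
            # Assumes a Transformers-style forward that returns .loss when labels are given
            outputs = model(input_ids=batch, labels=batch)
            loss = outputs.loss
            loss.backward()
            optimizer.step()
            scheduler.step()
            optimizer.zero_grad()
        print(f"epoch {epoch + 1}/{epochs} - loss {loss.item():.4f}")

    # Save both the model weights and the tokenizer for later use
    os.makedirs(output_dir, exist_ok=True)
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)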

Guide: Running Locally

  1. Environment Setup: Ensure you have Python and PyTorch installed. Install the minGRU-pytorch library with pip install minGRU-pytorch.
  2. Download and Prepare Data: Use the roneneldan/TinyStories dataset (a preparation sketch follows this list).
  3. Model Training: Use the provided training script to fine-tune the model locally.
  4. Hardware Requirements: For efficient training, consider using cloud GPUs such as those available from AWS, Google Cloud, or Azure.
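
To make step 2 concrete, here is a minimal preparation sketch: it loads TinyStories with the datasets library, tokenizes the stories with the GPT-2 tokenizer, and slices the result into block_size-token chunks that a training loop like the one above can consume. The helper name make_blocks, the num_stories cutoff, and the use of the dataset's text column are illustrative assumptions, not part of the released script.

import torch
from datasets import load_dataset
from transformers import AutoTokenizer

def make_blocks(block_size=128, num_stories=10_000):
    # Illustrative helper: turn TinyStories text into fixed-length blocks of GPT-2 token IDs
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    dataset = load_dataset("roneneldan/TinyStories", split="train")

    ids = []
    for story in dataset.select(range(num_stories)):
        ids.extend(tokenizer(story["text"]).input_ids)
        ids.append(tokenizer.eos_token_id)  # separate stories with <|endoftext|>

    # Drop the ragged tail so every block has exactly block_size tokens
    n_blocks = len(ids) // block_size
    blocks = torch.tensor(ids[: n_blocks * block_size]).view(n_blocks, block_size)
    return blocks  # shape: (n_blocks, block_size), usable as train_data

train_data = make_blocks()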

License

The minGRULM-base model is licensed under the Apache-2.0 License.
