minGRULM-base
Introduction
The minGRULM-base is an experimental model integrated with Hugging Face, based on the paper "Were RNNs All We Needed?". It uses the GPT-2 tokenizer and is trained on the roneneldan/TinyStories dataset. The model requires additional training before use.
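Because the model is published on the Hugging Face Hub, it can presumably be loaded through the Transformers Auto classes. The snippet below is a minimal sketch only: the repository id suayptalha/minGRULM-base and the use of trust_remote_code=True for the custom architecture are assumptions, not details confirmed by this card.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; the custom minGRU architecture is assumed to need
# trust_remote_code=True so its modeling code can be fetched from the repo.
repo_id = "suayptalha/minGRULM-base"
tokenizer = AutoTokenizer.from_pretrained(repo_id)  # GPT-2 tokenizer
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)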
Architecture
The model uses the minGRU architecture, a minimal gated recurrent unit (a simplified RNN variant), and is implemented in PyTorch. It is designed for text generation tasks and integrates with Hugging Face's Transformers library.
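To make the architecture concrete, the following is a small PyTorch sketch of the minGRU recurrence as described in "Were RNNs All We Needed?": both the gate and the candidate state are computed from the current input only, which removes the hidden-state dependency that makes a standard GRU strictly sequential. This is an illustration of the recurrence, not the implementation used by the minGRU-pytorch library.

import torch
import torch.nn as nn

class MinGRUCell(nn.Module):
    """Step-by-step form of the minGRU recurrence (illustrative sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.to_gate = nn.Linear(dim, dim)       # z_t = sigmoid(W_z x_t)
        self.to_candidate = nn.Linear(dim, dim)  # h~_t = W_h x_t

    def forward(self, x, h):
        z = torch.sigmoid(self.to_gate(x))
        h_tilde = self.to_candidate(x)
        # h_t = (1 - z_t) * h_{t-1} + z_t * h~_t
        return (1 - z) * h + z * h_tilde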
Training
For training, the model uses the AdamW optimizer with a linear learning-rate scheduler. The code snippet below outlines the training loop, which handles device placement, data loading, and optimization. The model is trained for a specified number of epochs, and both the model and tokenizer are saved after training.
def train_model(model, tokenizer, train_data, output_dir, epochs=3, batch_size=16, learning_rate=5e-5, block_size=128):
# Training setup and execution
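The snippet above shows only the signature. A minimal sketch of what the described loop might look like follows; the TextDataset helper, the fixed-length block preparation, and the Hugging Face-style interface (passing labels and reading .loss, plus save_pretrained) are assumptions for illustration, not details taken from the original script.

import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader, Dataset
from transformers import get_linear_schedule_with_warmup

class TextDataset(Dataset):
    """Concatenates tokenized texts and splits them into fixed-length blocks."""
    def __init__(self, texts, tokenizer, block_size):
        ids = []
        for text in texts:
            ids.extend(tokenizer(text)["input_ids"] + [tokenizer.eos_token_id])
        # Drop the remainder so every example has exactly block_size tokens.
        n_blocks = len(ids) // block_size
        self.blocks = torch.tensor(ids[: n_blocks * block_size]).view(n_blocks, block_size)

    def __len__(self):
        return len(self.blocks)

    def __getitem__(self, idx):
        return self.blocks[idx]

def train_model(model, tokenizer, train_data, output_dir, epochs=3, batch_size=16,
                learning_rate=5e-5, block_size=128):
    # Device allocation: use a GPU if one is available.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    # Data loading: fixed-length blocks of token ids, shuffled each epoch.
    loader = DataLoader(TextDataset(train_data, tokenizer, block_size),
                        batch_size=batch_size, shuffle=True)

    # AdamW optimizer with a linear learning-rate schedule, as described above.
    optimizer = AdamW(model.parameters(), lr=learning_rate)
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=0, num_training_steps=epochs * len(loader))

    model.train()
    for epoch in range(epochs):
        for batch in loader:
            batch = batch.to(device)
            # Assumed Hugging Face-style interface: passing labels returns the
            # next-token prediction loss on the outputs object.
            loss = model(input_ids=batch, labels=batch).loss
            loss.backward()
            optimizer.step()
            scheduler.step()
            optimizer.zero_grad()
        print(f"epoch {epoch + 1}/{epochs} - loss {loss.item():.4f}")

    # Save both the model and the tokenizer after training.
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)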
Guide: Running Locally
- Environment Setup: Ensure you have Python and PyTorch installed. Install the minGRU-pytorch library with pip install minGRU-pytorch.
- Download and Prepare Data: Use the roneneldan/TinyStories dataset (a loading sketch follows this list).
- Model Training: Use the provided training script to fine-tune the model locally.
- Hardware Requirements: For efficient training, consider using cloud GPUs such as those available from AWS, Google Cloud, or Azure.
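As referenced in the data-preparation step above, one minimal way to fetch the dataset is through the datasets library. The "train" split and "text" column names are assumptions about the TinyStories layout, and the output_dir value is illustrative.

from datasets import load_dataset

# Load TinyStories from the Hugging Face Hub (a "train" split with a
# "text" column is assumed here).
stories = load_dataset("roneneldan/TinyStories", split="train")
texts = stories["text"]

# Hand the raw texts to the training function sketched in the Training section.
# train_model(model, tokenizer, texts, output_dir="minGRULM-base-finetuned")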
License
The minGRULM-base model is licensed under the Apache-2.0 License.