SimpleStories-125M
lennart-finke

Introduction
SimpleStories-125M is a text generation model designed to produce coherent and simple English narratives. It leverages model distillation for efficient performance and is accessible via the Hugging Face Model Hub.
Architecture
The model utilizes the Llama architecture, which is a transformer-based neural network. It incorporates PyTorchModelHubMixin for easy integration with Hugging Face's ecosystem, allowing seamless model loading and usage.
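To illustrate what the mixin contributes, here is a toy, stdlib-only sketch of the save/load round trip that PyTorchModelHubMixin automates. The class names and file layout below are illustrative stand-ins, not the real huggingface_hub API:

```python
import json
import os
import tempfile

class ToyHubMixin:
    # Illustrative stand-in for PyTorchModelHubMixin (NOT the real API):
    # save_pretrained persists the init config to disk, and from_pretrained
    # rebuilds the model from that file. The real mixin additionally handles
    # model weights and Hub uploads/downloads.
    def save_pretrained(self, save_dir):
        os.makedirs(save_dir, exist_ok=True)
        with open(os.path.join(save_dir, "config.json"), "w") as f:
            json.dump(self.config, f)

    @classmethod
    def from_pretrained(cls, save_dir):
        with open(os.path.join(save_dir, "config.json")) as f:
            return cls(**json.load(f))

class ToyModel(ToyHubMixin):
    def __init__(self, **config):
        self.config = config

with tempfile.TemporaryDirectory() as tmp:
    ToyModel(n_layer=12, n_embd=768).save_pretrained(tmp)
    restored = ToyModel.from_pretrained(tmp)
```

The point of the pattern is that any `nn.Module` subclass that also inherits the mixin gains `save_pretrained`/`from_pretrained` without extra wiring.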
Training
SimpleStories-125M was trained using distillation techniques to balance model size and performance. The training process is managed through the simple_stories_train repository, which houses the scripts and configurations needed to replicate the training environment.
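Distillation, broadly, trains a small student model to match a larger teacher's output distribution rather than only hard labels. A minimal, stdlib-only sketch of the soft-target loss; this is illustrative, and the actual objective used in simple_stories_train may differ:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature > 1 softens the distribution, exposing the teacher's
    # relative preferences over non-top tokens.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) over temperature-softened distributions:
    # zero when the student matches the teacher exactly, positive otherwise.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical logits give zero loss.
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # → 0.0
```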
Guide: Running Locally
To run SimpleStories-125M locally, follow these steps:
- Install Dependencies: Ensure you have Python and PyTorch installed. Use pip to install the Hugging Face Hub library:

```shell
pip install huggingface-hub torch
```
- Load the Model: Use the following Python script to load SimpleStories-125M:

```python
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

from simple_stories_train.models.llama import Llama, LlamaConfig
from simple_stories_train.models.model_configs import MODEL_CONFIGS_DICT

class LlamaTransformer(nn.Module, PyTorchModelHubMixin):
    def __init__(self, **config):
        super().__init__()
        self.llama = Llama(LlamaConfig(**config))

    def forward(self, x):
        return self.llama(x)

# "d12" selects the configuration matching this checkpoint.
config = MODEL_CONFIGS_DICT["d12"]
model = LlamaTransformer(**config)
# from_pretrained is a classmethod provided by the mixin; it returns a new
# instance with weights downloaded from the Hugging Face Hub.
model = LlamaTransformer.from_pretrained("lennart-finke/SimpleStories-125M")
```
- Cloud GPU Recommendation: Consider using cloud services such as AWS, Google Cloud, or Azure for access to powerful GPUs, which can enhance inference performance.
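Once the model is loaded, generation is a loop of repeated forward passes. Below is a stdlib-only sketch of greedy decoding, assuming a function that maps a token-id prefix to next-token logits; the tokenizer and the model's exact output format are not specified by this card, so `toy_logits` is a hypothetical stand-in:

```python
def greedy_decode(next_logits, prompt_ids, max_new_tokens, eos_id=None):
    # next_logits: callable taking a list of token ids and returning a list
    # of floats (one logit per vocabulary entry) for the next token.
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_logits(ids)
        # Greedy decoding: pick the argmax token at every step.
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == eos_id:
            break
        ids.append(next_id)
    return ids

# Toy stand-in over a 4-token vocabulary: always prefers (last_token + 1) % 4.
def toy_logits(ids):
    preferred = (ids[-1] + 1) % 4
    return [1.0 if i == preferred else 0.0 for i in range(4)]

print(greedy_decode(toy_logits, [0], max_new_tokens=3))  # → [0, 1, 2, 3]
```

With the real model, `next_logits` would wrap a forward pass over the tokenized prompt; sampling with a temperature would replace the argmax step.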
License
The model and its associated code are available under licenses specified in the simple_stories_train repository. Ensure compliance with these licenses when using the model.