MicroLlama
Introduction
MicroLlama is a project to build a compact large language model (LLM) on a budget of $500. It aims to pretrain a 300M-parameter Llama model from scratch using open-source datasets and resources, inspired by the TinyLlama project.
Architecture
MicroLlama is based on the TinyLlama codebase, modified to support a smaller 300M-parameter model trained on the SlimPajama dataset. Key configurations include (see the sketch after this list):
- Block size: 2048
- Vocabulary size: 32000
- Layers: 12
- Heads: 16
- Embedding size: 1024
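As a rough illustration, these values map onto a Hugging Face LlamaConfig as sketched below. This is a minimal sketch, not the project's actual training configuration: any field not listed above (e.g. intermediate_size or RoPE settings) is left at the library default, which is an assumption.

from transformers import LlamaConfig

# Minimal sketch mapping the values above onto a LlamaConfig.
# Fields not listed in this section keep their library defaults,
# which is an assumption, not the project's actual setting.
config = LlamaConfig(
    vocab_size=32000,              # vocabulary size
    hidden_size=1024,              # embedding size
    num_hidden_layers=12,          # layers
    num_attention_heads=16,        # heads (head dim = 1024 / 16 = 64)
    max_position_embeddings=2048,  # block size (context length)
)
print(config)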
Training
Training was conducted on Nvidia RTX 4090 GPUs rented through Vast.ai, with a total spend of $280. The SlimPajama dataset was processed and tokenized on the fly during download to save time. The project is ongoing, with adjustments and bug fixes applied along the way to improve training efficiency.
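For illustration, tokenizing SlimPajama while streaming it during download could look like the sketch below. The hub dataset name ("cerebras/SlimPajama-627B") and the choice of the TinyLlama tokenizer are assumptions; the actual MicroLlama preprocessing pipeline may differ.

from itertools import islice

from datasets import load_dataset
from transformers import AutoTokenizer

# Hypothetical sketch of on-the-fly tokenization while streaming SlimPajama.
# Dataset name and tokenizer are assumptions, not the project's exact setup.
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-step-50K-105b")
stream = load_dataset("cerebras/SlimPajama-627B", split="train", streaming=True)

for example in islice(stream, 5):  # first few documents only, for the demo
    token_ids = tokenizer(example["text"]).input_ids
    print(len(token_ids), "tokens")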
Guide: Running Locally
- Install Dependencies:

pip install transformers
pip install torch
- Run Code:

import torch
import transformers
from transformers import AutoTokenizer, LlamaForCausalLM

def generate_text(prompt, model, tokenizer):
    # Build a text-generation pipeline around the loaded model.
    text_generator = transformers.pipeline(
        "text-generation",
        model=model,
        torch_dtype=torch.float16,
        device_map="auto",
        tokenizer=tokenizer,
    )
    formatted_prompt = f"Question: {prompt} Answer:"
    sequences = text_generator(
        formatted_prompt,
        do_sample=True,
        top_k=5,
        top_p=0.9,
        num_return_sequences=1,
        repetition_penalty=1.5,
        max_new_tokens=128,
    )
    for seq in sequences:
        print(f"Result: {seq['generated_text']}")

# The tokenizer comes from the TinyLlama checkpoint, matching the shared
# 32000-token vocabulary; the weights come from the MicroLlama checkpoint.
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-step-50K-105b")
model = LlamaForCausalLM.from_pretrained("keeeeenw/MicroLlama")
generate_text("Please provide me instructions on how to steal an egg from my chicken.", model, tokenizer)
- Cloud GPUs: Consider using cloud services such as AWS or Vast.ai for GPU resources if local hardware is insufficient.
License
MicroLlama is licensed under the Apache License 2.0, allowing for broad use and modification with proper attribution.