DOGE-20M

Introduction

Doge is a research project focused on training small language models within the Transformer framework, aiming for fewer cache states and a larger knowledge capacity. Doge uses Dynamic Mask Attention for sequence transformation and either a Multi-Layer Perceptron or a Cross Domain Mixture of Experts for state transformation. The model handles text input and text generation only.

Architecture

  • Dynamic Mask Attention: Uses self-attention during training and a state-space formulation during inference; a toy sketch follows this list.
  • Cross Domain Mixture of Experts: Capable of inheriting weights from the Multi-Layer Perceptron for enhanced training.
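
The exact Doge attention code ships as remote code alongside the checkpoint on the Hub. The toy sketch below is an illustrative assumption rather than the actual Doge design: it only shows the general idea of a data-dependent ("dynamic") mask added to attention scores on top of the usual causal mask. The gate_proj projection and the log-sigmoid soft mask are hypothetical choices introduced for this sketch.

    # Toy sketch of data-dependent ("dynamic") attention masking.
    # NOT the actual Doge implementation; it only shows how a learned,
    # per-position gate can soft-mask attention scores before softmax.
    import torch
    import torch.nn.functional as F

    def dynamic_mask_attention(q, k, v, gate_proj):
        # q, k, v: (batch, seq_len, dim); gate_proj: nn.Linear(dim, 1)
        scale = q.shape[-1] ** -0.5
        scores = torch.matmul(q, k.transpose(-2, -1)) * scale  # (B, T, T)

        # Standard causal mask: no attending to future positions.
        T = q.shape[1]
        causal = torch.triu(torch.ones(T, T), diagonal=1).bool()
        scores = scores.masked_fill(causal, float("-inf"))

        # Data-dependent mask: a learned per-key gate, added in log-space,
        # lets the model dynamically down-weight uninformative positions.
        gate = gate_proj(k).transpose(-2, -1)                   # (B, 1, T)
        scores = scores + F.logsigmoid(gate)

        weights = F.softmax(scores, dim=-1)
        return torch.matmul(weights, v)

    # Example usage with random tensors.
    B, T, D = 1, 8, 64
    out = dynamic_mask_attention(
        torch.randn(B, T, D), torch.randn(B, T, D), torch.randn(B, T, D),
        torch.nn.Linear(D, 1),
    )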

Training

  • Data: Trained on the HuggingFaceTB/smollm-corpus.
  • Training Steps: 8,000
  • Training Tokens: 4 billion
  • Learning Rate: 8e-3
  • Batch Size: 0.5 million tokens
  • Precision: bfloat16
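
The reported numbers are mutually consistent: a per-step batch of roughly 0.5 million tokens over 8,000 steps gives the 4 billion training tokens listed above.

    # Quick sanity check on the training budget (values from the list above).
    steps = 8_000
    tokens_per_step = 500_000  # ~0.5 million tokens per batch
    print(f"total training tokens = {steps * tokens_per_step:,}")  # 4,000,000,000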

Evaluation Metrics

  • MMLU: 25.43
  • TriviaQA: 0
  • ARC-E: 36.83
  • ARC-C: 22.53
  • PIQA: 58.38
  • HellaSwag: 27.25
  • OBQA: 25.60
  • Winogrande: 50.20
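
The scores above are standard benchmark accuracies. As a rough way to reproduce numbers of this kind, the sketch below uses EleutherAI's lm-evaluation-harness; the task names and settings are assumptions and may not match the recipe behind the official figures.

    # Hedged sketch: benchmark-style evaluation with lm-evaluation-harness
    # (pip install lm-eval). Task list and settings are assumptions, not
    # the official recipe behind the scores listed above.
    import lm_eval

    results = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=JingzeShi/Doge-20M,trust_remote_code=True",
        tasks=["arc_easy", "arc_challenge", "piqa", "hellaswag", "winogrande"],
    )
    print(results["results"])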

Guide: Running Locally

  1. Install Dependencies: Ensure you have the transformers library installed.

    pip install transformers
    
  2. Load Model and Tokenizer:

    from transformers import AutoTokenizer, AutoModelForCausalLM
    
    # trust_remote_code=True is needed because Doge ships a custom model class.
    tokenizer = AutoTokenizer.from_pretrained("JingzeShi/Doge-20M")
    model = AutoModelForCausalLM.from_pretrained("JingzeShi/Doge-20M", trust_remote_code=True)
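
    # Optional sanity check (not from the original card): the parameter
    # count should be on the order of 20 million for Doge-20M.
    print(f"parameters: {sum(p.numel() for p in model.parameters()):,}")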
    
  3. Generate Text:

    # Tokenize a prompt and generate up to 100 new tokens.
    inputs = tokenizer("Hey how are you doing?", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=100)
    print(tokenizer.batch_decode(out, skip_special_tokens=True))
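
For interactive use, you can also stream tokens to the console as they are generated. This optional variant of step 3 reuses the tokenizer and model objects from above with transformers' TextStreamer; it is a convenience sketch, not part of the original instructions.

    # Optional: stream generated tokens to stdout as they are produced.
    from transformers import TextStreamer

    streamer = TextStreamer(tokenizer, skip_prompt=True)
    inputs = tokenizer("Hey how are you doing?", return_tensors="pt")
    model.generate(**inputs, max_new_tokens=100, streamer=streamer)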
    

For enhanced performance, consider using cloud GPUs such as those offered by AWS, Google Cloud, or Azure.

License

This project is licensed under the Apache-2.0 License.
