Introduction

Hydra is a sequence model for bidirectional sequence processing built on generalized matrix mixers. It is implemented in Python on top of PyTorch. The project is released under the Apache-2.0 license, and its reference training recipes use datasets such as allenai/c4.

Architecture

Hydra's core is a quasiseparable matrix mixer for bidirectional sequence processing, implemented in the hydra.py module. The block exposes several hyperparameters: model dimension (d_model), state expansion factor (d_state), local convolution width (d_conv), and block expansion factor (expand). The repository also provides a general matrix mixer framework that supports alternative mixer structures such as 'dense', 'toeplitz', and 'vandermonde'. A toy illustration of the quasiseparable structure is sketched below.
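
To make the quasiseparable structure concrete, the following sketch builds a toy sequence mixer whose strictly lower and strictly upper triangles are each low-rank, plus a free diagonal (a simple special case of a quasiseparable matrix). It is illustrative only, not the repository's optimized implementation; all names are local to the example.

    import torch

    length, rank = 8, 2
    B, C = torch.randn(length, rank), torch.randn(length, rank)  # factors for the lower triangle
    E, F = torch.randn(length, rank), torch.randn(length, rank)  # factors for the upper triangle
    d = torch.randn(length)                                      # free diagonal entries

    M = torch.tril(B @ C.T, diagonal=-1)     # causal mixing: position i sees j < i
    M = M + torch.triu(E @ F.T, diagonal=1)  # anti-causal mixing: position i sees j > i
    M = M + torch.diag(d)                    # per-position diagonal term

    x = torch.randn(length)  # a toy 1-D sequence
    y = M @ x                # each output position mixes information from both directions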

Training

Hydra supports BERT-style training built on the MosaicBERT and M2 codebases. Pretraining and finetuning configurations are provided as YAML files, and the example commands in the guide below show how to pretrain on C4 and finetune on GLUE.

Guide: Running Locally

  1. Installation: Install the core Mamba package with pip install mamba-ssm. For BERT training, install the additional requirements with pip install -r requirements.txt.
  2. Basic Usage:
    import torch
    from hydra import Hydra  # assumes the repository's hydra.py module is on the import path

    batch, length, dim = 2, 64, 16
    x = torch.randn(batch, length, dim).to("cuda")  # (batch, length, d_model) input
    model = Hydra(
        d_model=dim,  # model dimension
        d_state=64,   # state expansion factor
        d_conv=7,     # local convolution width
        expand=2,     # block expansion factor
    ).to("cuda")  # the fused kernels require a CUDA device
    y = model(x)
    assert y.shape == x.shape  # the mixer is shape-preserving
    
  3. Training Commands:
    • Pretrain on C4 using a single GPU:
      python main.py yamls/pretrain/hydra.yaml
      
    • Pretrain on C4 using 8 GPUs:
      composer -n 8 main.py yamls/pretrain/hydra.yaml
      
    • Finetune on GLUE:
      python glue.py yamls/finetune/hydra.yaml
      

The example above assumes a CUDA device, and cloud GPUs are recommended for efficient training and computation, especially for large datasets or models.
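
As a quick local sanity check before launching the YAML-driven runs above, the Hydra block can also be exercised with an ordinary PyTorch training loop. The sketch below is illustrative only, reusing the constructor arguments from step 2 with a toy regression target; it is not the repository's training pipeline.

    import torch
    from hydra import Hydra  # same import assumption as in step 2

    model = Hydra(d_model=16, d_state=64, d_conv=7, expand=2).to("cuda")
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

    x = torch.randn(2, 64, 16, device="cuda")  # toy (batch, length, dim) input
    target = torch.randn_like(x)               # toy regression target

    for step in range(10):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), target)
        loss.backward()
        opt.step()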

License

Hydra is distributed under the Apache-2.0 license, which allows for both personal and commercial use, modification, and distribution with proper attribution.
