Introduction

Hydra is a sequence model for bidirectional sequence processing built on generalized matrix mixers. It is implemented in Python on top of PyTorch. The project is released under the Apache-2.0 license, and its reference training recipes use datasets such as allenai/c4.

Architecture

Hydra's core is a quasiseparable matrix mixer for bidirectional sequence processing, implemented in the hydra.py module. The block exposes several hyperparameters: model dimension (d_model), state expansion factor (d_state), local convolution width (d_conv), and block expansion factor (expand). The repository also provides a general matrix mixer framework that supports alternative mixer structures such as 'dense', 'toeplitz', and 'vandermonde'. A toy illustration of the quasiseparable structure is sketched below.
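
To make the quasiseparable structure concrete, the following sketch builds a toy sequence mixer whose strictly lower and strictly upper triangles are each low-rank, plus a free diagonal (a simple special case of a quasiseparable matrix). It is illustrative only, not the repository's optimized implementation; all names are local to the example.

    import torch

    length, rank = 8, 2
    B, C = torch.randn(length, rank), torch.randn(length, rank)  # factors for the lower triangle
    E, F = torch.randn(length, rank), torch.randn(length, rank)  # factors for the upper triangle
    d = torch.randn(length)                                      # free diagonal entries

    M = torch.tril(B @ C.T, diagonal=-1)     # causal mixing: position i sees j < i
    M = M + torch.triu(E @ F.T, diagonal=1)  # anti-causal mixing: position i sees j > i
    M = M + torch.diag(d)                    # per-position diagonal term

    x = torch.randn(length)  # a toy 1-D sequence
    y = M @ x                # each output position mixes information from both directions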

Training

Hydra supports BERT-style training built on the MosaicBERT and M2 codebases. Pretraining and finetuning configurations are provided as YAML files, and the example commands in the guide below show how to pretrain on C4 and finetune on GLUE.

Guide: Running Locally

  1. Installation: Install the core Mamba package with pip install mamba-ssm. For BERT training, install the additional requirements with pip install -r requirements.txt.
  2. Basic Usage:
    import torch
    from hydra import Hydra  # assumes the repository's hydra.py module is on the import path

    batch, length, dim = 2, 64, 16
    x = torch.randn(batch, length, dim).to("cuda")  # (batch, length, d_model) input
    model = Hydra(
        d_model=dim,  # model dimension
        d_state=64,   # state expansion factor
        d_conv=7,     # local convolution width
        expand=2,     # block expansion factor
    ).to("cuda")  # the fused kernels require a CUDA device
    y = model(x)
    assert y.shape == x.shape  # the mixer is shape-preserving
    
  3. Training Commands:
    • Pretrain on C4 using a single GPU:
      python main.py yamls/pretrain/hydra.yaml
      
    • Pretrain on C4 using 8 GPUs:
      composer -n 8 main.py yamls/pretrain/hydra.yaml
      
    • Finetune on GLUE:
      python glue.py yamls/finetune/hydra.yaml
      

The example above assumes a CUDA device, and cloud GPUs are recommended for efficient training and computation, especially for large datasets or models.
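
As a quick local sanity check before launching the YAML-driven runs above, the Hydra block can also be exercised with an ordinary PyTorch training loop. The sketch below is illustrative only, reusing the constructor arguments from step 2 with a toy regression target; it is not the repository's training pipeline.

    import torch
    from hydra import Hydra  # same import assumption as in step 2

    model = Hydra(d_model=16, d_state=64, d_conv=7, expand=2).to("cuda")
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

    x = torch.randn(2, 64, 16, device="cuda")  # toy (batch, length, dim) input
    target = torch.randn_like(x)               # toy regression target

    for step in range(10):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), target)
        loss.backward()
        opt.step()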

License

Hydra is distributed under the Apache-2.0 license, which allows for both personal and commercial use, modification, and distribution with proper attribution.
