Hydra
Introduction
Hydra is a model designed for bidirectional sequence processing using generalized matrix mixers. It is implemented in Python and leverages PyTorch for computation. The project is available under the Apache-2.0 license and utilizes datasets such as allenai/c4.
Architecture
Hydra employs a quasiseparable matrix mixer for bidirectional sequence processing, implemented in the hydra.py module. The model exposes several parameters, including model dimension (d_model), state expansion (d_state), convolution width (d_conv), and block expansion (expand). Additionally, a matrix mixer framework allows swapping in other mixer structures, with configurations such as 'dense', 'toeplitz', and 'vandermonde', among others.
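To illustrate the framework, the sketch below instantiates a mixer with a chosen structure. The module path (matrix_mixer), class name (MatrixMixer), and argument names are assumptions for illustration and may not match the repository's actual API; check the matrix mixer implementation in the repository before relying on them.

import torch
from matrix_mixer import MatrixMixer  # assumed module and class name

batch, length, dim = 2, 64, 16
x = torch.randn(batch, length, dim).to("cuda")
# 'quasiseparable' is Hydra's own structure; 'dense', 'toeplitz', and
# 'vandermonde' are alternative mixer matrices named above.
mixer = MatrixMixer(matrix_mixer_type="toeplitz", d_model=dim).to("cuda")  # argument names assumed
y = mixer(x)
assert y.shape == x.shape  # mixers map sequences to sequences of the same shape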
Training
Hydra supports BERT-style training, building on the MosaicBERT and M2 codebases. Pretraining and finetuning configurations are provided as YAML files, and example commands below show how to pretrain on C4 and finetune on GLUE.
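For orientation, a pretraining config of this kind typically pins the dataset, tokenizer, sequence length, and run duration. The keys below are illustrative assumptions modeled on MosaicBERT-style configs, not the contents of yamls/pretrain/hydra.yaml; the files shipped in the repository are authoritative.

# Illustrative sketch only; key names are assumptions, not copied
# from yamls/pretrain/hydra.yaml.
run_name: hydra-pretrain-c4
tokenizer_name: bert-base-uncased   # assumed tokenizer
max_seq_len: 128
model:
  name: hydra_bert                  # hypothetical model identifier
train_loader:
  dataset:
    remote: allenai/c4              # pretraining corpus
max_duration: 70000ba               # Composer-style duration in batches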
Guide: Running Locally
- Installation: Install the Mamba package with pip install mamba-ssm. For BERT training, additional packages are required (pip install -r requirements.txt).
- Basic Usage:
import torch
from hydra import Hydra  # hydra.py from the repository; adjust the import path to your checkout

batch, length, dim = 2, 64, 16
x = torch.randn(batch, length, dim).to("cuda")
model = Hydra(
    d_model=dim,  # model dimension
    d_state=64,   # state expansion factor
    d_conv=7,     # convolution width
    expand=2,     # block expansion factor
).to("cuda")
y = model(x)
assert y.shape == x.shape
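The final assertion highlights that the Hydra block is shape-preserving: it maps a (batch, length, d_model) tensor to a tensor of the same shape, so it can be stacked as a drop-in sequence-mixing layer. Both the input and the module are moved to "cuda" because the fused kernels from mamba-ssm generally require a CUDA device.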
- Training Commands:
- Pretrain on C4 using a single GPU:
python main.py yamls/pretrain/hydra.yaml
- Pretrain on C4 using 8 GPUs:
composer -n 8 main.py yamls/pretrain/hydra.yaml
- Finetune on GLUE:
python glue.py yamls/finetune/hydra.yaml
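In the multi-GPU command, composer is the launcher from MosaicML's Composer library; -n 8 spawns eight training processes, one per GPU.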
Cloud GPUs are recommended for training, especially when pretraining on large corpora such as C4 or working with larger models.
License
Hydra is distributed under the Apache-2.0 license, which allows for both personal and commercial use, modification, and distribution with proper attribution.