Sarvam-1
Introduction
Sarvam-1 is a 2-billion-parameter language model optimized for Indian languages. It excels in 10 Indic languages and delivers performance competitive with larger models. It is designed for text completion and for finetuning on specific downstream tasks.
Architecture
- Hidden Size: 2048
- Intermediate Size: 11,008
- Attention Heads: 16
- Hidden Layers: 28
- Key-Value Heads: 8
- Max Position Embeddings: 8,192
- Activation Function: SwiGLU
- Positional Embeddings: Rotary (RoPE) with theta=10,000
- Attention: Grouped-query attention (GQA); trained with bfloat16 mixed precision
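These dimensions can be checked programmatically once the checkpoint is downloaded. The snippet below is a minimal sketch, assuming the model exposes a standard Llama-style configuration on the Hugging Face Hub; the attribute names are that config class's conventions, not taken from this document.

  from transformers import AutoConfig

  # Load the published configuration for the checkpoint
  config = AutoConfig.from_pretrained("sarvamai/sarvam-1")

  # Attribute names assume a Llama-style config class
  print(config.hidden_size)               # expected: 2048
  print(config.intermediate_size)         # expected: 11008
  print(config.num_attention_heads)       # expected: 16
  print(config.num_hidden_layers)         # expected: 28
  print(config.num_key_value_heads)       # expected: 8
  print(config.max_position_embeddings)   # expected: 8192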
Training
- Infrastructure: Yotta's Shakti cluster
- Hardware: 1,024 GPUs
- Duration: 5 days
- Framework: NVIDIA NeMo
Guide: Running Locally
- Install Transformers Library:

  pip install transformers

- Load Model and Tokenizer:

  from transformers import AutoModelForCausalLM, AutoTokenizer

  model = AutoModelForCausalLM.from_pretrained("sarvamai/sarvam-1")
  tokenizer = AutoTokenizer.from_pretrained("sarvamai/sarvam-1")

- Generate Text:

  text = "कर्नाटक की राजधानी है:"
  inputs = tokenizer(text, return_tensors="pt")
  outputs = model.generate(**inputs, max_new_tokens=5)
  result = tokenizer.decode(outputs[0])
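As an alternative to the step-by-step calls above, the same completion can be run through the Transformers pipeline API. This is a minimal sketch, not an officially documented usage pattern for this model:

  from transformers import pipeline

  # Build a text-generation pipeline around the same checkpoint
  generator = pipeline("text-generation", model="sarvamai/sarvam-1")

  # Generate a short completion for a Hindi prompt
  output = generator("कर्नाटक की राजधानी है:", max_new_tokens=5)
  print(output[0]["generated_text"])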
Cloud GPU Suggestion
For optimal performance, consider using cloud services like AWS, Azure, or GCP with GPU instances.
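On a GPU instance, the model can be loaded in bfloat16 (the precision it was trained in) and placed on the device. This is a sketch assuming a CUDA-capable machine and the accelerate package for device_map support:

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  # Load weights in bfloat16 and place them on the available GPU(s);
  # device_map="auto" requires the accelerate package
  model = AutoModelForCausalLM.from_pretrained(
      "sarvamai/sarvam-1",
      torch_dtype=torch.bfloat16,
      device_map="auto",
  )
  tokenizer = AutoTokenizer.from_pretrained("sarvamai/sarvam-1")

  # Move inputs to the same device as the model before generating
  text = "कर्नाटक की राजधानी है:"
  inputs = tokenizer(text, return_tensors="pt").to(model.device)
  outputs = model.generate(**inputs, max_new_tokens=5)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))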
License
Sarvam-1 is released under a non-commercial license. For more details, refer to the LICENSE file.