Nemotron-4-Mini-Hindi-4B-Instruct
Introduction
Nemotron-4-Mini-Hindi-4B-Instruct is a small language model developed by NVIDIA, designed to generate responses to queries grounded in the Indian context, supporting Hindi, English, and Hinglish. It is an aligned version of the Nemotron-4-Mini-Hindi-4B-Base model. The model is available for commercial use and supports a context length of 4,096 tokens. For more detailed information, refer to the arXiv paper.
Architecture
The model uses a transformer decoder architecture with an embedding size of 3072, 32 attention heads, and an MLP intermediate dimension of 9216. It implements Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE).
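To illustrate the Grouped-Query Attention mentioned above, here is a minimal NumPy sketch (toy dimensions, not the model's real 32-head / 3072-dim configuration): several query heads share each key/value head, which is simply repeated across its group before the usual attention computation.

```python
import numpy as np

# Toy GQA: 8 query heads share 2 KV heads, so each KV head serves a group of 4.
n_q_heads, n_kv_heads, head_dim, seq_len = 8, 2, 4, 3
rng = np.random.default_rng(0)
q = rng.standard_normal((n_q_heads, seq_len, head_dim))
k = rng.standard_normal((n_kv_heads, seq_len, head_dim))
v = rng.standard_normal((n_kv_heads, seq_len, head_dim))

group = n_q_heads // n_kv_heads
# Repeat each KV head so every query head in its group attends to it.
k_rep = np.repeat(k, group, axis=0)
v_rep = np.repeat(v, group, axis=0)

scores = q @ k_rep.transpose(0, 2, 1) / np.sqrt(head_dim)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
out = weights @ v_rep
print(out.shape)  # one output per query head: (8, 3, 4)
```

The benefit is a smaller KV cache: only `n_kv_heads` key/value tensors are stored per layer instead of one per query head.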
Training
Nemotron-4-Mini-Hindi-4B-Instruct is fine-tuned from the Nemotron-4-Mini-Hindi-4B-Base model using a mix of real and synthetic alignment corpora. It underwent extensive safety evaluation, but it was trained on data that may contain biases, which can affect its outputs.
Guide: Running Locally
- Environment Setup: Ensure you have Python and the Transformers library installed.
- Load the Model:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("nvidia/Nemotron-4-Mini-Hindi-4B-Instruct")
model = AutoModelForCausalLM.from_pretrained("nvidia/Nemotron-4-Mini-Hindi-4B-Instruct")
```
- Generate Text:
```python
messages = [{"role": "user", "content": "भारत की संस्कृति के बारे में बताएं।"}]
tokenized_chat = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(tokenized_chat, max_new_tokens=128)
print(tokenizer.decode(outputs[0]))
```
- Using a Pipeline:
```python
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="nvidia/Nemotron-4-Mini-Hindi-4B-Instruct",
    max_new_tokens=128,
)
pipe.tokenizer = tokenizer
pipe(messages)
```
- Suggested Cloud GPUs: Consider an NVIDIA A100 for optimal performance.
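When sizing a GPU for the steps above, a back-of-the-envelope weight-memory estimate helps. The sketch below assumes roughly 4.0e9 parameters (the "4B" in the name; the exact count may differ slightly) and ignores activation and KV-cache memory, which add to the total at inference time.

```python
# Rough weight-memory estimate; 4.0e9 parameters is an assumption from the
# model name, and activations/KV cache are not included.
params = 4.0e9
bytes_per_param = {"fp32": 4, "fp16/bf16": 2, "int8": 1}

for dtype, nbytes in bytes_per_param.items():
    gib = params * nbytes / 2**30
    print(f"{dtype}: ~{gib:.1f} GiB for weights alone")
```

At fp16/bf16 this comes to roughly 7.5 GiB of weights, so the model fits comfortably in an A100's memory with room for the 4,096-token context.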
License
Nemotron-4-Mini-Hindi-4B-Instruct is released under the NVIDIA Open Model License Agreement.