Nemotron-4-Mini-Hindi-4B-Instruct
Introduction
Nemotron-4-Mini-Hindi-4B-Instruct is a small language model developed by NVIDIA, designed to generate responses to queries grounded in the Indian context, supporting Hindi, English, and Hinglish. It is an aligned version of the Nemotron-4-Mini-Hindi-4B-Base model. The model is available for commercial use and supports a context length of 4,096 tokens. For more detailed information, refer to the arXiv paper.
Architecture
The model uses a transformer decoder architecture with an embedding size of 3072, 32 attention heads, and an MLP intermediate dimension of 9216. It implements Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE).
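To illustrate the Grouped-Query Attention mentioned above, here is a minimal NumPy sketch (toy dimensions, not the model's real 32-head / 3072-dim configuration): several query heads share each key/value head, which is simply repeated across its group before the usual attention computation.

```python
import numpy as np

# Toy GQA: 8 query heads share 2 KV heads, so each KV head serves a group of 4.
n_q_heads, n_kv_heads, head_dim, seq_len = 8, 2, 4, 3
rng = np.random.default_rng(0)
q = rng.standard_normal((n_q_heads, seq_len, head_dim))
k = rng.standard_normal((n_kv_heads, seq_len, head_dim))
v = rng.standard_normal((n_kv_heads, seq_len, head_dim))

group = n_q_heads // n_kv_heads
# Repeat each KV head so every query head in its group attends to it.
k_rep = np.repeat(k, group, axis=0)
v_rep = np.repeat(v, group, axis=0)

scores = q @ k_rep.transpose(0, 2, 1) / np.sqrt(head_dim)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
out = weights @ v_rep
print(out.shape)  # one output per query head: (8, 3, 4)
```

The benefit is a smaller KV cache: only `n_kv_heads` key/value tensors are stored per layer instead of one per query head.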
Training
Nemotron-4-Mini-Hindi-4B-Instruct is fine-tuned from the Nemotron-4-Mini-Hindi-4B-Base model using a mix of real and synthetic alignment corpora. It underwent extensive safety evaluation, but it was trained on data that may contain biases, which can affect its outputs.
Guide: Running Locally
- Environment Setup: Ensure you have Python and the Transformers library installed.
- Load the Model:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("nvidia/Nemotron-4-Mini-Hindi-4B-Instruct")
model = AutoModelForCausalLM.from_pretrained("nvidia/Nemotron-4-Mini-Hindi-4B-Instruct")
```
- Generate Text:
```python
messages = [{"role": "user", "content": "भारत की संस्कृति के बारे में बताएं।"}]
tokenized_chat = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(tokenized_chat, max_new_tokens=128)
print(tokenizer.decode(outputs[0]))
```
- Using a Pipeline:
```python
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="nvidia/Nemotron-4-Mini-Hindi-4B-Instruct",
    max_new_tokens=128,
)
pipe.tokenizer = tokenizer
pipe(messages)
```
- Suggested Cloud GPUs: Consider an NVIDIA A100 for optimal performance.
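When sizing a GPU for the steps above, a back-of-the-envelope weight-memory estimate helps. The sketch below assumes roughly 4.0e9 parameters (the "4B" in the name; the exact count may differ slightly) and ignores activation and KV-cache memory, which add to the total at inference time.

```python
# Rough weight-memory estimate; 4.0e9 parameters is an assumption from the
# model name, and activations/KV cache are not included.
params = 4.0e9
bytes_per_param = {"fp32": 4, "fp16/bf16": 2, "int8": 1}

for dtype, nbytes in bytes_per_param.items():
    gib = params * nbytes / 2**30
    print(f"{dtype}: ~{gib:.1f} GiB for weights alone")
```

At fp16/bf16 this comes to roughly 7.5 GiB of weights, so the model fits comfortably in an A100's memory with room for the 4,096-token context.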
License
Nemotron-4-Mini-Hindi-4B-Instruct is released under the NVIDIA Open Model License Agreement.