llama 3.2 M E D I T 3 B o1
mkurmanIntroduction
The mkurman/llama-3.2-MEDIT-3B-o1
is a fine-tuned variant of the Llama 3.2 3B Instruct model, designed for reasoning tasks. It utilizes a unique tagging system for chain-of-thought text generation, focusing on deterministic outputs.
Architecture
- Model Name:
mkurman/llama-3.2-MEDIT-3B-o1
- Type: Small Language Model (SLM)
- Base Model: MedIT Solutions Llama 3.2 3B Instruct
- Parameters: 3 billion
- License: llama3.2
Intended Use Cases
- General question answering
- Instruction-based generation
- Reasoning and chain-of-thought exploration
Not Recommended For
- Sensitive medical diagnoses without expert verification
- Highly domain-specific or regulated fields outside the model's training scope
Training
The model incorporates <Thought> and <Output> tags for structured text generation. Fine-tuning was performed to enhance exact matching for reasoning tasks, recommending settings like do_sample=False
or temperature=0.0
for consistent output.
Guide: Running Locally
In Ollama or LM Studio
- Load the GGUF File: Follow the specific instructions from Ollama or LM Studio to set up the model.
- Run the Model: Use the CLI command:
ollama run hf.co/mkurman/llama-3.2-MEDIT-3B-o1
- Set Stop Sequences: Ensure stop sequences are set to
</Output>
.
In a Jupyter Notebook or Python Script (Transformers)
- Load Tokenizer and Model:
from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("mkurman/llama-3.2-MEDIT-3B-o1") model = AutoModelForCausalLM.from_pretrained("mkurman/llama-3.2-MEDIT-3B-o1")
- Define and Encode Prompt:
prompt = [{'role': 'user', 'content': 'Write a short instagram post about hypertension in children. Finish with 3 hashtags'}] input_ids = tokenizer(tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True) + '<Thought>\n\n', return_tensors='pt')
- Generate Response:
output = model.generate(input_ids=input_ids, max_new_tokens=256, do_sample=False, temperature=0.0)
- Decode Output:
decoded_output = tokenizer.decode(output[0], skip_special_tokens=True) print(decoded_output)
Suggested Cloud GPUs
To enhance performance and manage computational demands, consider using cloud GPUs from providers like AWS, Google Cloud, or Azure.
License
Refer to the Llama 3.2 Community License Agreement for details on usage rights. For citation, use the following format:
@misc{mkurman2025llama3medit3bo1,
title={{mkurman/llama-3.2-MEDIT-3B-o1}: A fine-tuned Llama 3.2 3B Instruct model for reasoning tasks},
author={Kurman, Mariusz},
year={2025},
howpublished={\url{https://huggingface.co/mkurman/llama-3.2-MEDIT-3B-o1}}
}