llama-3.2-MEDIT-3B-o1

mkurman

Introduction

mkurman/llama-3.2-MEDIT-3B-o1 is a fine-tuned variant of the Llama 3.2 3B Instruct model designed for reasoning tasks. It uses a tagging system (<Thought> and <Output>) for chain-of-thought text generation and is intended to be run with deterministic decoding settings.

Architecture

  • Model Name: mkurman/llama-3.2-MEDIT-3B-o1
  • Type: Small Language Model (SLM)
  • Base Model: MedIT Solutions Llama 3.2 3B Instruct
  • Parameters: 3 billion
  • License: llama3.2

Intended Use Cases

  • General question answering
  • Instruction-based generation
  • Reasoning and chain-of-thought exploration

Not Recommended For

  • Sensitive medical diagnoses without expert verification
  • Highly domain-specific or regulated fields outside the model's training scope

Training

The model incorporates <Thought> and <Output> tags for structured text generation: reasoning is written inside the <Thought> section and the final answer inside the <Output> section. Fine-tuning targeted reasoning tasks where exact, reproducible answers matter, so deterministic decoding settings such as do_sample=False or temperature=0.0 are recommended for consistent output.
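
As a rough illustration (the wording below is invented and the explicit </Thought> closing tag is an assumption; only the overall layout follows the tag description above), a completion is expected to look something like:

    <Thought>
    Step-by-step reasoning about the question...
    </Thought>
    <Output>
    The final, user-facing answer.
    </Output>

Setting the stop sequence to </Output> therefore ends generation once the answer section is complete.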

Guide: Running Locally

In Ollama or LM Studio

  1. Load the GGUF File: Follow the specific instructions from Ollama or LM Studio to set up the model.
  2. Run the Model: Use the CLI command:
    ollama run hf.co/mkurman/llama-3.2-MEDIT-3B-o1
    
  3. Set Stop Sequences: Ensure the stop sequence is set to </Output>, as shown in the sketch below.
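
As one way to apply step 3 programmatically, the sketch below sends a request to a local Ollama server over its REST API with the stop sequence and deterministic settings. It assumes Ollama is running on the default port (11434) and that the request fields match your installed Ollama version; the prompt is only an example.

    import requests

    # Minimal sketch: ask a local Ollama server for a completion and stop at </Output>.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "hf.co/mkurman/llama-3.2-MEDIT-3B-o1",
            "prompt": "Write a short instagram post about hypertension in children. Finish with 3 hashtags",
            "stream": False,
            "options": {
                "temperature": 0.0,      # deterministic output, as recommended
                "stop": ["</Output>"],   # stop sequence from step 3
            },
        },
        timeout=300,
    )
    print(resp.json()["response"])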

In a Jupyter Notebook or Python Script (Transformers)

  1. Load Tokenizer and Model:
    from transformers import AutoTokenizer, AutoModelForCausalLM
    tokenizer = AutoTokenizer.from_pretrained("mkurman/llama-3.2-MEDIT-3B-o1")
    model = AutoModelForCausalLM.from_pretrained("mkurman/llama-3.2-MEDIT-3B-o1")
    
  2. Define and Encode Prompt:
    prompt = [{'role': 'user', 'content': 'Write a short instagram post about hypertension in children. Finish with 3 hashtags'}]
    # Apply the chat template, then prime the response with the <Thought> tag
    text = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True) + '<Thought>\n\n'
    input_ids = tokenizer(text, return_tensors='pt', add_special_tokens=False).input_ids  # the chat template already adds the BOS token
    
  3. Generate Response:
    # Greedy (deterministic) decoding, as recommended for consistent answers
    output = model.generate(input_ids=input_ids, max_new_tokens=256, do_sample=False)
    
  4. Decode Output:
    decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)
    print(decoded_output)
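
Because the decoded string still contains the prompt and the <Thought> section, you will usually want just the final answer. A minimal post-processing sketch, assuming the answer is wrapped in <Output> and </Output> tags as described in the Training section:

    import re

    # Decode only the newly generated tokens (everything after the prompt)...
    generated = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
    # ...then keep the text between <Output> and </Output> (fall back to the full text).
    match = re.search(r"<Output>(.*?)(?:</Output>|$)", generated, flags=re.DOTALL)
    answer = match.group(1).strip() if match else generated.strip()
    print(answer)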
    

Suggested Cloud GPUs

To enhance performance and manage computational demands, consider using cloud GPUs from providers like AWS, Google Cloud, or Azure.
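
On a GPU instance, the model can be loaded in half precision and placed on the accelerator automatically. A minimal sketch, assuming PyTorch with a CUDA device and the accelerate package installed (required for device_map):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("mkurman/llama-3.2-MEDIT-3B-o1")
    model = AutoModelForCausalLM.from_pretrained(
        "mkurman/llama-3.2-MEDIT-3B-o1",
        torch_dtype=torch.bfloat16,  # half precision to reduce memory use
        device_map="auto",           # place layers on the GPU automatically (needs accelerate)
    )
    # Move inputs to the model's device before calling generate:
    # input_ids = input_ids.to(model.device)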

License

Refer to the Llama 3.2 Community License Agreement for details on usage rights. For citation, use the following format:

@misc{mkurman2025llama3medit3bo1,
  title={{mkurman/llama-3.2-MEDIT-3B-o1}: A fine-tuned Llama 3.2 3B Instruct model for reasoning tasks},
  author={Kurman, Mariusz},
  year={2025},
  howpublished={\url{https://huggingface.co/mkurman/llama-3.2-MEDIT-3B-o1}}
}
