Bio Medical Multi Modal Llama 3 8 B V1 LLM Model

Introduction

Bio-Medical-MultiModal-Llama-3-8B-V1 is a specialized large language model designed for applications in the biomedical field. It is fine-tuned from the Llama-3-8B-Instruct model using a custom dataset with over 500,000 entries. This dataset includes both text and image data, combining synthetic and manually curated samples to ensure a comprehensive coverage of biomedical knowledge.

Architecture

Model Name: Bio-Medical-MultiModal-Llama-3-8B-V1
Base Model: Llama-3-8B-Instruct
Parameter Count: 8 billion
Dataset Composition: Custom high-quality biomedical text and image dataset

Training

Bio-Medical-MultiModal-Llama-3-8B-V1 was trained using NVIDIA H100 GPUs to handle large-scale data efficiently. The training process utilized MiniCPM for managing multimodal data. The model was evaluated rigorously to ensure robustness and reliability in real-world biomedical applications.

Training Hyperparameters

Learning Rate: 0.0002
Train Batch Size: 4
Eval Batch Size: 4
Number of Epochs: 3
Optimizer: Adam with betas=(0.9, 0.999)
Mixed Precision Training: Native AMP

Framework Versions

PEFT: 0.11.0
Transformers: 4.40.2
PyTorch: 2.1.2
Datasets: 2.19.1
Tokenizers: 0.19.1

Guide: Running Locally

To run the model locally, follow these basic steps:

Install Dependencies: Ensure you have the necessary libraries such as PyTorch, Transformers, and PIL for image processing.

Load the Model and Tokenizer:

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModel.from_pretrained(
    "ContactDoctor/Bio-Medical-MultiModal-Llama-3-8B-V1",
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)

tokenizer = AutoTokenizer.from_pretrained("ContactDoctor/Bio-Medical-MultiModal-Llama-3-8B-V1", trust_remote_code=True)

Prepare Input Data:

image = Image.open("Path to Your image").convert('RGB')
question = 'Give the modality, organ, analysis, abnormalities (if any), treatment (if abnormalities are present)?'
msgs = [{'role': 'user', 'content': [image, question]}]

Run the Model:

res = model.chat(image=image, msgs=msgs, tokenizer=tokenizer, sampling=True, temperature=0.95, stream=True)
generated_text = ""
for new_text in res:
    generated_text += new_text
    print(new_text, flush=True, end='')

Suggested Cloud GPUs: Consider using cloud services that offer NVIDIA GPUs like the H100 for efficient model execution.

License

The Bio-Medical-MultiModal-Llama-3-8B-V1 model is available under a non-commercial use license. Please review the terms and conditions before using the model.

More Related APIs in Image Text To Text