Bio-Medical-MultiModal-Llama-3-8B-V1

ContactDoctor

Introduction

Bio-Medical-MultiModal-Llama-3-8B-V1 is a specialized large language model for biomedical applications. It is fine-tuned from Llama-3-8B-Instruct on a custom dataset of over 500,000 entries. The dataset contains both text and image data and combines synthetic and manually curated samples to give broad coverage of biomedical knowledge.

Architecture

  • Model Name: Bio-Medical-MultiModal-Llama-3-8B-V1
  • Base Model: Llama-3-8B-Instruct
  • Parameter Count: 8 billion
  • Dataset Composition: Custom high-quality biomedical text and image dataset

Training

Bio-Medical-MultiModal-Llama-3-8B-V1 was trained on NVIDIA H100 GPUs to handle large-scale data efficiently. Training used MiniCPM to manage the multimodal (text and image) data. The model was then evaluated for robustness and reliability in real-world biomedical applications.

Training Hyperparameters

  • Learning Rate: 0.0002
  • Train Batch Size: 4
  • Eval Batch Size: 4
  • Number of Epochs: 3
  • Optimizer: Adam with betas=(0.9, 0.999)
  • Mixed Precision Training: Native AMP
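
To give a sense of scale for these settings, the sketch below works out how many optimizer steps they would imply. It rests on assumptions the model card does not state: that the train batch size of 4 is the effective global batch size (one device, no gradient accumulation) and that exactly 500,000 entries were used for training. The names `ASSUMED_DATASET_SIZE` and `optimizer_steps` are illustrative only.

```python
import math

# Values taken from the hyperparameter list above.
TRAIN_BATCH_SIZE = 4
NUM_EPOCHS = 3

# Assumption (not stated in the model card): exactly 500,000 training entries,
# and the batch size above is the effective global batch size.
ASSUMED_DATASET_SIZE = 500_000


def optimizer_steps(dataset_size: int, batch_size: int, epochs: int) -> int:
    """Return the total number of optimizer steps, counting a final partial batch."""
    steps_per_epoch = math.ceil(dataset_size / batch_size)
    return steps_per_epoch * epochs


steps_per_epoch = math.ceil(ASSUMED_DATASET_SIZE / TRAIN_BATCH_SIZE)
total_steps = optimizer_steps(ASSUMED_DATASET_SIZE, TRAIN_BATCH_SIZE, NUM_EPOCHS)
print(f"steps per epoch: {steps_per_epoch:,}")  # 125,000 under these assumptions
print(f"total steps:     {total_steps:,}")      # 375,000 under these assumptions
```

If training used several GPUs or gradient accumulation, the effective batch size would be larger and the step count correspondingly smaller.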

Framework Versions

  • PEFT: 0.11.0
  • Transformers: 4.40.2
  • PyTorch: 2.1.2
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1
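
When reproducing the setup, it can help to compare the installed packages against the versions listed above. The following is a minimal sketch using only the standard library. The distribution names (for example `torch` for PyTorch) are the usual PyPI names, and the helper `compare_versions` is a hypothetical name introduced here. Other versions may also work; the list above only records what was used.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions listed in the model card, keyed by PyPI distribution name
# ("torch" is the distribution name for PyTorch).
EXPECTED_VERSIONS = {
    "peft": "0.11.0",
    "transformers": "4.40.2",
    "torch": "2.1.2",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def compare_versions(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package to (expected version, installed version or None)."""
    report = {}
    for name, wanted in expected.items():
        try:
            # Drop a local build suffix such as "+cu121" before comparing.
            installed = metadata.version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(EXPECTED_VERSIONS).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: expected {wanted}, installed {installed} ({status})")
```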

Guide: Running Locally

To run the model locally, follow these basic steps:

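  1. Install Dependencies: Install PyTorch, Transformers, and Pillow (PIL) for image processing. The loading code in step 2 also needs bitsandbytes for 4-bit quantization, accelerate for device_map="auto", and flash-attn for attn_implementation="flash_attention_2".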

  2. Load the Model and Tokenizer:

    import torch
    from PIL import Image
    from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig

    model_id = "ContactDoctor/Bio-Medical-MultiModal-Llama-3-8B-V1"

    # 4-bit NF4 quantization with double quantization; compute runs in float16.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.float16,
    )

    # trust_remote_code=True runs the model's custom code from the repository.
    model = AutoModel.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",  # place weights on the available device(s)
        torch_dtype=torch.float16,
        trust_remote_code=True,
        attn_implementation="flash_attention_2",  # requires the flash-attn package
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    
  3. Prepare Input Data:

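    # Replace the placeholder with the path to your image file.
    image = Image.open("Path to Your image").convert('RGB')
    question = 'Give the modality, organ, analysis, abnormalities (if any), treatment (if abnormalities are present)?'
    # A single user turn that carries both the image and the question.
    msgs = [{'role': 'user', 'content': [image, question]}]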
    
  4. Run the Model:

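    # With stream=True, chat() yields the answer piece by piece.
    res = model.chat(image=image, msgs=msgs, tokenizer=tokenizer, sampling=True, temperature=0.95, stream=True)
    generated_text = ""
    for new_text in res:
        generated_text += new_text  # keep the full answer for later use
        print(new_text, flush=True, end='')
    print()  # end the streamed output with a newline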
    
  5. Suggested Cloud GPUs: For efficient inference, consider a cloud service that offers NVIDIA GPUs such as the H100.
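
As a rough aid for choosing a GPU for the steps above, the sketch below estimates the memory taken by the weights alone at float16 and at 4 bits per parameter, using the 8 billion parameter count from the Architecture section. This is only a back-of-the-envelope figure: it assumes every parameter is quantized and ignores quantization constants, any components kept at higher precision, activations, and the KV cache, so actual usage will be higher. The helper name `weight_memory_gib` is illustrative.

```python
PARAMETER_COUNT = 8_000_000_000  # nominal "8 billion" from the Architecture section


def weight_memory_gib(num_params: int, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1024**3


fp16_gib = weight_memory_gib(PARAMETER_COUNT, 16)  # unquantized float16 weights
nf4_gib = weight_memory_gib(PARAMETER_COUNT, 4)    # 4-bit weights, overhead ignored

print(f"float16 weights: ~{fp16_gib:.1f} GiB")  # ~14.9 GiB
print(f"4-bit weights:   ~{nf4_gib:.1f} GiB")   # ~3.7 GiB
```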

License

The Bio-Medical-MultiModal-Llama-3-8B-V1 model is available under a non-commercial use license. Please review the terms and conditions before using the model.
