LLaVA-Med v1.5 (Mistral-7B)

Microsoft

Introduction

LLaVA-Med v1.5, built on mistralai/Mistral-7B-Instruct-v0.2, is a large language and vision model adapted specifically for the biomedical domain. It employs a curriculum learning approach to improve performance on biomedical question-answering tasks, including visual question answering (VQA) benchmarks such as PathVQA and VQA-RAD. The model and its findings are detailed in the paper "LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day" by Chunyuan Li et al.

Architecture

LLaVA-Med v1.5 uses Mistral-7B-Instruct-v0.2 as its language backbone within a vision-language architecture. The model was trained on PMC-15M, a large-scale collection of figure-caption pairs drawn from biomedical research articles in PubMed Central. This dataset spans diverse biomedical image types, including microscopy, radiography, and histology.

Training

The model was trained in April 2024 and is intended to support AI researchers in reproducing and building upon this work. Its training leverages the PMC-15M dataset for broad coverage of biomedical vision-language tasks. Note, however, that the model may carry biases and is limited to English-language corpora.

Guide: Running Locally

  1. Clone the Repository: Ensure you have the necessary permissions and clone the LLaVA-Med repository from GitHub.
  2. Install Dependencies: Use a Python environment manager to install required libraries, typically using a requirements.txt file.
  3. Download Model Weights: Access the model weights from the Hugging Face model hub.
  4. Run the Model: Use the provided scripts in the repository for serving and evaluation purposes.
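The steps above can be sketched as a shell session. The GitHub repository and Hugging Face model ID below are the published Microsoft locations, but exact script names and paths may change between releases, so treat this as a setup sketch and consult the repository's README for the current entry points.

```shell
# 1. Clone the LLaVA-Med repository
git clone https://github.com/microsoft/LLaVA-Med.git
cd LLaVA-Med

# 2. Install dependencies in an isolated Python environment
python -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
pip install -e .   # or: pip install -r requirements.txt, if provided

# 3. Download the model weights from the Hugging Face Hub
#    (requires the huggingface_hub CLI)
huggingface-cli download microsoft/llava-med-v1.5-mistral-7b \
    --local-dir ./checkpoints/llava-med-v1.5-mistral-7b

# 4. Run the repository's serving/evaluation scripts against the
#    downloaded checkpoint, following the instructions in its README
```

Using a virtual environment keeps the repository's pinned dependencies from conflicting with other local Python installations.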

Suggested Cloud GPUs

Running LLaVA-Med V1.5 locally may require significant computational resources. Consider using cloud-based GPU services such as AWS EC2 with NVIDIA GPUs, Google Cloud Platform, or Azure for efficient processing.

License

LLaVA-Med v1.5 is licensed under the Apache 2.0 License. The data, code, and model checkpoints are released for research purposes only, explicitly excluding clinical use cases and commercial deployment.
