Llama 3.2 90 B Vision Instruct
meta-llamaIntroduction
Llama 3.2-Vision is a collection of multimodal large language models (LLMs) designed for image reasoning and text generation tasks. Developed by Meta, these models are optimized for visual recognition, captioning, and answering questions about images, outperforming many existing open-source and closed multimodal models.
Architecture
Llama 3.2-Vision is built on top of the Llama 3.1 text-only model, utilizing an optimized transformer architecture. It incorporates supervised fine-tuning and reinforcement learning with human feedback to align with human preferences. A separately trained vision adapter integrates with the language model using cross-attention layers to handle image recognition tasks.
Training
The models were pretrained on 6 billion image-text pairs, with additional instruction tuning using publicly available datasets and synthetically generated examples. The training process involved 2.02 million GPU hours on Meta's custom-built GPU infrastructure, with a focus on reducing greenhouse gas emissions.
Guide: Running Locally
To run Llama 3.2-Vision locally, you will need the transformers
library version 4.45.0 or later. Here's a basic setup:
-
Install Transformers:
pip install --upgrade transformers
-
Load the Model:
import torch from transformers import MllamaForConditionalGeneration, AutoProcessor model_id = "meta-llama/Llama-3.2-90B-Vision-Instruct" model = MllamaForConditionalGeneration.from_pretrained( model_id, torch_dtype=torch.bfloat16, device_map="auto", ) processor = AutoProcessor.from_pretrained(model_id)
-
Prepare Input and Run Inference:
Use the processor to prepare input data (image and text) and generate output using the model.
For higher performance, consider using cloud GPUs such as NVIDIA's H100, available through cloud providers.
License
Llama 3.2 is governed by the Llama 3.2 Community License. The license grants a non-exclusive, worldwide, non-transferable, and royalty-free limited license to use, reproduce, distribute, and modify the Llama Materials. Redistribution must include the license agreement and proper attribution. Compliance with applicable laws and regulations is mandatory.