llava-1.5-7b-hf

Introduction
LLaVA is a multimodal chatbot designed for instruction-following tasks, built on a transformer-based architecture. It is trained by fine-tuning LLaMA/Vicuna on multimodal instruction-following data generated by GPT. This model version, LLaVA-1.5-7B, was released in September 2023.
Architecture
LLaVA is an auto-regressive language model based on the transformer architecture. It supports multi-image and multi-prompt generation, allowing users to include multiple images in a single prompt.
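As a sketch of what a multi-image prompt can look like, the Hugging Face chat format interleaves `image` placeholders with `text` entries inside a user turn; the small helper below is only an illustration of that structure and is not part of the library:

```python
# A multi-image, multi-prompt conversation in the Hugging Face chat format.
# Each "image" entry is a placeholder slot; apply_chat_template later expands
# the conversation into the model's prompt string, and one actual image must
# be supplied per slot when calling the processor.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "image"},
            {"type": "text", "text": "What differs between these two images?"},
        ],
    },
]

def count_image_slots(conv):
    """Count the image placeholders in a conversation (illustrative helper)."""
    return sum(
        1
        for turn in conv
        for part in turn["content"]
        if part["type"] == "image"
    )
```

Here the conversation declares two image slots, so two images would be passed to the processor alongside the templated prompt.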
Training
The LLaVA model was fine-tuned on the LLaVA-Instruct-150K dataset, which consists of GPT-generated multimodal instruction-following data. This gives the model its capability to handle complex, multimodal conversational tasks.
Guide: Running Locally
- Install Required Packages: Ensure you have transformers >= 4.35.3, and optionally bitsandbytes for 4-bit quantization and flash-attn for faster inference.
- Set Up the Environment: Make sure you have access to a CUDA-compatible GPU.
- Load the Model: Use the Hugging Face Transformers library to load the model, with options for 4-bit quantization or Flash Attention 2 for performance optimization.
- Run a Sample Input: Prepare the input using the AutoProcessor, format it with apply_chat_template, run the model on the prepared input, and decode the output.
- Consider Cloud GPUs: For extensive tasks, consider using cloud-based GPUs from providers like AWS, Google Cloud, or Azure to optimize performance and resource management.
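Putting the steps above together, a minimal run might look like the sketch below. It assumes a CUDA-compatible GPU, the package versions listed above, and bitsandbytes installed for the 4-bit option; the image URL is a placeholder to replace with your own:

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"

# Load in 4-bit to fit on smaller GPUs; drop load_in_4bit for full precision.
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    load_in_4bit=True,  # requires bitsandbytes
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# Build the prompt from a chat-format conversation with one image slot.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is shown in this image?"},
        ],
    },
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

url = "https://example.com/image.jpg"  # placeholder: use your own image
image = Image.open(requests.get(url, stream=True).raw)

# Run generation and decode the response.
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(output[0], skip_special_tokens=True))
```

To use Flash Attention 2 instead of quantization, pass attn_implementation="flash_attention_2" to from_pretrained (with flash-attn installed).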
License
Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.