Llama 3.1 Korean Bllossom Vision 8B
Introduction
The Bllossom team has released Bllossom-Vision, a Korean-English vision-language model based on Llama 3.1. This model serves dual purposes, functioning as both a general language model and a vision-language model. It maintains strong performance in both English and Korean without compromising the capabilities of a traditional language model.
Architecture
Bllossom-Vision supports bilingual functionality and can perform both text generation and image analysis. This model is designed to operate as a language model when no image is provided and as a vision-language model when an image is included. The model's architecture emphasizes maintaining robust language model performance while excelling in vision-language tasks.
Training
The model was developed through collaboration with several institutions:
- Seoultech MLP Lab: Provided pre-training techniques for vision-language and language models.
- Teddysum: Assisted with instruction tuning and RAG technology.
- Euclid Soft: Supplied training data for vision-language tasks.
- AICA: Offered research support and collaboration.
The training process leveraged a comprehensive Korean VQA dataset collected over five years, enhancing the model's capabilities in understanding and generating text related to visual content.
Guide: Running Locally
To run Bllossom-Vision locally, follow these steps:
- Install Dependencies:
pip install torch transformers==4.44.0
- Set Up the Model:
Use the following Python code to load the model and its processor:

from transformers import LlavaNextForConditionalGeneration, LlavaNextProcessor
import torch

# Load the model in bfloat16; device_map='auto' places the weights on available
# GPUs and requires the accelerate package (pip install accelerate).
model = LlavaNextForConditionalGeneration.from_pretrained(
    'Bllossom/llama-3.1-Korean-Bllossom-Vision-8B',
    torch_dtype=torch.bfloat16,
    device_map='auto'
)

# The processor bundles the tokenizer and the image preprocessor.
processor = LlavaNextProcessor.from_pretrained('Bllossom/llama-3.1-Korean-Bllossom-Vision-8B')
- Run Inference:
- Without Image: Use text prompts to interact with the model.
- With Image: Load an image using PIL and process it through the model (both modes are sketched below).
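The following is a minimal sketch of both inference modes, reusing the model and processor loaded above. It assumes the standard Hugging Face LlavaNextProcessor chat-template API and the usual LLaVA-Next <image> placeholder; the exact system prompt, BOS handling, and generation settings may differ from the official model card, and the image path and questions here are only illustrative.

from PIL import Image  # Pillow is required for image loading (pip install pillow)
import torch

# --- Language-model mode: no image, plain chat generation ---
# Build the chat prompt and tokenize it (check the model card for the exact
# BOS/system-prompt handling).
messages = [{'role': 'user', 'content': 'Introduce the city of Seoul in one paragraph.'}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor.tokenizer(prompt, return_tensors='pt').to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256)
print(processor.tokenizer.decode(output[0], skip_special_tokens=True))

# --- Vision-language mode: the <image> placeholder marks where the image goes ---
image = Image.open('example.jpg').convert('RGB')  # hypothetical local image path
messages = [{'role': 'user', 'content': '<image>\n이 이미지에 대해서 설명해주세요.'}]  # "Please describe this image."
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
# Cast floating-point inputs (the pixel values) to bfloat16 to match the model weights.
inputs = processor(images=image, text=prompt, return_tensors='pt').to(model.device, torch.bfloat16)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))

In text-only mode the model responds like a standard Llama 3.1 chat model; supplying an image switches it into vision-language mode without changing the checkpoint.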
Consider using cloud GPUs such as AWS EC2 instances or Google Cloud Compute Engine for enhanced performance and scalability.
License
The Bllossom-Vision model is released under the Llama 3.1 Community License, which permits commercial use. This allows businesses and developers to integrate the model into commercial applications.