llama 3.2 Korean Bllossom A I C A 5 B LLM Model

Introduction

The Bllossom-AICA-5B is a Korean-English vision-language model based on the LLaMA 3.2 architecture. It is developed by the Bllossom team, incorporating contributions from Seoultech, Teddysum, and Yonsei University. This model offers bidirectional usage as both a general language model and a vision-language model, optimized for Korean language processing, including OCR and interpretation of tables and graphs.

Architecture

The Bllossom-AICA-5B model extends the LLaMA 3B architecture, facilitating both language and vision-language tasks. When an image is provided, it functions as a vision-language model, while in the absence of an image, it operates as a language model. The model includes features such as significant improvements in language performance based on visual understanding and selective inference of external knowledge.

Training

The training process involved comprehensive use of publicly available Korean LLM pre-training data from Hugging Face, along with AI-Hub, KISTI AI data, and other Korean vision-language datasets. Additionally, custom Korean vision-language instruction tuning data was used to optimize the model.

Guide: Running Locally

Install Dependencies: Ensure you have Python and the transformers library installed.

Load the Model: Use the following code snippet to load the model:

from transformers import MllamaForConditionalGeneration, MllamaProcessor
import torch

model = MllamaForConditionalGeneration.from_pretrained(
  'Bllossom/llama-3.2-Korean-Bllossom-AICA-5B',
  torch_dtype=torch.bfloat16,
  device_map='auto'
)
processor = MllamaProcessor.from_pretrained('Bllossom/llama-3.2-Korean-Bllossom-AICA-5B')

Run Inference: Use the model for text or vision-language tasks as shown in the example code provided in the documentation.
Suggest Cloud GPUs: For optimal performance, it is recommended to use cloud GPU services such as Google Colab, AWS, or Azure.

License

The Bllossom-AICA-5B model is released under the llama3.2 license, allowing for commercial use.

More Related APIs in Image Text To Text