HaochenWang/ross-qwen2-7b Model Documentation
Introduction
Ross is an open-source multimodal chatbot developed by fine-tuning a Qwen2/Vicuna base language model (this checkpoint, ross-qwen2-7b, builds on Qwen2). It follows multimodal instructions with an auto-regressive, transformer-based architecture and adds an image-reconstruction objective to enhance its multimodal comprehension.
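Conceptually, the image-reconstruction objective can be viewed as an auxiliary loss added to the usual next-token prediction loss. The snippet below is an illustrative sketch of that idea only, not Ross's actual training code; recon_head, visual_targets, and lambda_recon are hypothetical placeholders.
    import torch.nn.functional as F

    # Illustrative sketch only: combine a language-modeling loss with an auxiliary
    # image-reconstruction loss. All names here are hypothetical placeholders and
    # do not correspond to the Ross code base.
    def training_loss(lm_logits, labels, visual_features, visual_targets,
                      recon_head, lambda_recon=1.0):
        # Standard auto-regressive next-token prediction loss over text tokens.
        lm_loss = F.cross_entropy(
            lm_logits.view(-1, lm_logits.size(-1)),
            labels.view(-1),
            ignore_index=-100,
        )
        # Auxiliary objective: reconstruct visual targets from the model's features.
        recon_loss = F.mse_loss(recon_head(visual_features), visual_targets)
        return lm_loss + lambda_recon * recon_loss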
Architecture
Ross builds on Qwen2-7B-Instruct as its language model and google/siglip-so400m-patch14-384 as its vision encoder. The SigLIP encoder converts images into visual features, and the Qwen2 transformer processes those features together with text, providing robust language understanding and image processing capabilities.
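Independent of the Ross code base, the vision tower can be inspected on its own with Hugging Face Transformers. A minimal sketch, assuming a transformers version recent enough to ship SiglipVisionModel:
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, SiglipVisionModel

    # Load only the SigLIP vision encoder that Ross builds on.
    processor = AutoImageProcessor.from_pretrained("google/siglip-so400m-patch14-384")
    vision_tower = SiglipVisionModel.from_pretrained("google/siglip-so400m-patch14-384")

    image = Image.new("RGB", (384, 384))  # placeholder image
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = vision_tower(**inputs)

    # Patch-level features of shape (1, num_patches, hidden_size) that a
    # LLaVA-style model feeds into its language backbone.
    print(outputs.last_hidden_state.shape)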
Training
Ross is trained on data from the lmms-lab/LLaVA-OneVision-Data and nyu-visionx/Cambrian-Alignment datasets. Training also requires additional Python packages, which are installable via pip (see the guide below).
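Both training mixtures are hosted on the Hugging Face Hub and can be pulled with the datasets library. A hedged sketch (the configuration name is left as a placeholder, since lmms-lab/LLaVA-OneVision-Data is organized into many named subsets):
    from datasets import load_dataset

    # "..." is a placeholder: pick one of the named configurations listed on the
    # dataset's Hub page.
    subset = "..."
    ds = load_dataset("lmms-lab/LLaVA-OneVision-Data", subset, split="train")
    print(ds)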
Guide: Running Locally
- Clone the Repository
  Clone the Ross repository from GitHub and enter the project directory:
    git clone https://github.com/Haochen-Wang409/ross.git
    cd ross
- Set Up Environment
  Create a new Conda environment and install the required packages:
    conda create -n ross python=3.10 -y
    conda activate ross
    pip install --upgrade pip  # enable PEP 660 support
    pip install -e .
- Install Training Packages
  For training purposes, install the additional packages:
    pip install -e ".[train]"
    pip install flash-attn --no-build-isolation
- Usage
  Import the necessary modules, load the pretrained model, and run inference (a note on building the prompt follows this guide):
    import torch
    from PIL import Image

    # IMAGE_TOKEN_INDEX follows the LLaVA-style layout of the ross package
    from ross.constants import IMAGE_TOKEN_INDEX
    from ross.model.builder import load_pretrained_model
    from ross.mm_utils import get_model_name_from_path, process_images, tokenizer_image_token

    model_path = "HaochenWang/ross-qwen2-7b"

    tokenizer, model, image_processor, context_len = load_pretrained_model(
        model_path=model_path,
        model_base=None,
        model_name=get_model_name_from_path(model_path),
    )
    model.cuda()
    model.eval()

    image = Image.open("...")   # path to the input image
    prompt = "..."              # prompt text, including the image placeholder token

    # process_images expects a list of PIL images
    images_tensor = process_images(
        [image],
        image_processor,
        model.config,
    ).cuda()
    input_ids = tokenizer_image_token(
        prompt,
        tokenizer,
        IMAGE_TOKEN_INDEX,
        return_tensors="pt",
    ).unsqueeze(0).cuda()

    with torch.inference_mode():
        output_ids = model.generate(
            input_ids,
            images=images_tensor,
            do_sample=True,
            temperature=0.8,
            top_p=0.7,
            top_k=20,
            num_beams=5,
            max_new_tokens=512,
            use_cache=True,
        )

    outputs = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
    print(outputs)
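The prompt passed to tokenizer_image_token is expected to contain the model's image placeholder token and to follow the chat format of the underlying language model. The sketch below is a hedged illustration that assumes Ross keeps LLaVA's conversation utilities (ross.conversation, DEFAULT_IMAGE_TOKEN); the template name "qwen_2" is an assumption and may differ in the actual code base.
    # Hypothetical sketch, assuming Ross mirrors the LLaVA conversation API.
    # The template name "qwen_2" and module paths are assumptions, not verified.
    from ross.constants import DEFAULT_IMAGE_TOKEN
    from ross.conversation import conv_templates

    question = "Describe this image in detail."
    conv = conv_templates["qwen_2"].copy()
    conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + "\n" + question)
    conv.append_message(conv.roles[1], None)
    prompt = conv.get_prompt()  # pass this string to tokenizer_image_token above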
Cloud GPUs
For optimal performance, consider using cloud-based GPU services such as AWS EC2, Google Cloud Platform, or Azure for intensive computation tasks.
License
Ross is distributed under the Apache-2.0 license, which allows for open-source usage and modification.