InternVL2_5-4B-AWQ
OpenGVLab
Introduction
InternVL 2.5 is an advanced series of multimodal large language models (MLLMs) that builds on InternVL 2.0. It introduces significant improvements in training and testing methodology and in data quality.
Architecture
InternVL 2.5 maintains the foundational "ViT-MLP-LLM" paradigm used in InternVL 1.5 and 2.0. The architecture connects a newly incrementally pre-trained InternViT to pre-trained LLMs such as InternLM 2.5 and Qwen 2.5 through a randomly initialized MLP projector. Enhancements include a pixel unshuffle operation that reduces the number of visual tokens, a dynamic resolution strategy, and support for multi-image and video data.
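The token reduction from pixel unshuffle can be illustrated with a minimal pure-Python sketch. It assumes a downscale factor of 2 and a row-major ordering of the merged channels; neither is stated above, and the real model works on tensors and may order channels differently. The function name `pixel_unshuffle` and the toy 4x4 grid are illustrative only.

```python
def pixel_unshuffle(grid, factor=2):
    """Fold each factor x factor block of tokens into one wider token.

    grid is a list of rows, each row a list of tokens, each token a list
    of floats (its channels). The output has fewer tokens, each carrying
    factor * factor times as many channels.
    """
    height, width = len(grid), len(grid[0])
    if height % factor or width % factor:
        raise ValueError("grid height and width must be divisible by factor")
    out = []
    for top in range(0, height, factor):
        out_row = []
        for left in range(0, width, factor):
            merged = []
            # Concatenate the channels of every token in this block.
            for dy in range(factor):
                for dx in range(factor):
                    merged.extend(grid[top + dy][left + dx])
            out_row.append(merged)
        out.append(out_row)
    return out


# Toy input: a 4x4 grid of single-channel tokens numbered 0..15.
grid = [[[float(r * 4 + c)] for c in range(4)] for r in range(4)]
merged = pixel_unshuffle(grid)

tokens_before = len(grid) * len(grid[0])
tokens_after = len(merged) * len(merged[0])
print(tokens_before, tokens_after)
```

Under the assumed factor of 2, the number of tokens passed to the MLP projector drops to one quarter while each token becomes four times wider, so no pixel information is discarded by this step alone.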
Training
Each model in the InternVL 2.5 series pairs a vision component (InternViT) with a language component (such as Qwen). The models are incrementally pre-trained and then fine-tuned to improve performance across a range of tasks.
Guide: Running Locally
- Install LMDeploy (the quotes keep the shell from treating ">" as a redirect):
    pip install "lmdeploy>=0.6.4"
- Run a 'Hello, World' Example:
    from lmdeploy import pipeline
    from lmdeploy.vl import load_image

    model = 'OpenGVLab/InternVL2_5-4B-AWQ'
    # Fetch the sample image from its URL
    image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
    # Build the inference pipeline with default engine settings
    pipe = pipeline(model)
    # A prompt is a (text, image) tuple
    response = pipe(('describe this image', image))
    print(response.text)
- Multi-Image Inference: Load multiple images and pass them to the pipeline together with a single prompt.
- Batch Prompts Inference: Pass a list of prompts to the pipeline to process them as one batch.
- Multi-Turn Conversations: Use the pipeline.chat interface for interactive sessions.
- Service Deployment: Use LMDeploy's api_server to expose the model through a RESTful API compatible with the OpenAI interface:
    lmdeploy serve api_server OpenGVLab/InternVL2_5-4B-AWQ --server-port 23333
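The multi-image, batch, and multi-turn items in the list above can be sketched as three small helpers. This is a hedged illustration, not the official recipe: the helper names are hypothetical, the `'<IMAGE_TOKEN>'` placeholder is assumed to match `lmdeploy.vl.constants.IMAGE_TOKEN`, and the `chat(..., session=...)` call with a `session.response.text` result is assumed to behave as in LMDeploy releases around 0.6.4. The `demo` function needs LMDeploy and a GPU, so it is defined but not called.

```python
# Assumed placeholder; LMDeploy exposes it as lmdeploy.vl.constants.IMAGE_TOKEN.
IMAGE_TOKEN = '<IMAGE_TOKEN>'


def multi_image_prompt(question, images, image_token=IMAGE_TOKEN):
    """Build one (text, images) prompt that numbers every image."""
    header = ''.join(
        f'Image-{i}: {image_token}\n' for i in range(1, len(images) + 1)
    )
    return (header + question, list(images))


def batch_prompts(question, images):
    """Build one (text, image) prompt per image for batched inference."""
    return [(question, image) for image in images]


def run_conversation(pipe, first_prompt, follow_ups):
    """Run a multi-turn chat and return the reply text of every turn."""
    session = pipe.chat(first_prompt)
    replies = [session.response.text]
    for question in follow_ups:
        # Passing the previous session keeps the conversation history.
        session = pipe.chat(question, session=session)
        replies.append(session.response.text)
    return replies


def demo(image_urls):
    """Exercise all three modes; image_urls is supplied by the caller."""
    # Imported lazily so the helpers above work without LMDeploy installed.
    from lmdeploy import pipeline
    from lmdeploy.vl import load_image

    pipe = pipeline('OpenGVLab/InternVL2_5-4B-AWQ')
    images = [load_image(url) for url in image_urls]

    # Multi-image: one prompt that refers to all images at once.
    print(pipe(multi_image_prompt('describe these images', images)).text)

    # Batch: a list of prompts yields a list of responses.
    for response in pipe(batch_prompts('describe this image', images)):
        print(response.text)

    # Multi-turn: each follow-up reuses the running session.
    first = ('describe this image', images[0])
    for reply in run_conversation(pipe, first, ['What stands out most?']):
        print(reply)
```

Keeping prompt construction separate from the pipeline call makes the prompt shapes easy to inspect before any model weights are loaded.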
For cloud GPUs, consider platforms like AWS, Google Cloud, or Azure for enhanced computational power.
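Once the api_server from the service deployment step is running, it can be queried like any OpenAI-style chat endpoint. The sketch below uses only the Python standard library and makes several assumptions: the server is reachable at `localhost` on port 23333, the chat route is `/v1/chat/completions`, and the served model name equals the model path (querying `/v1/models` on the server would confirm this). The helper names `build_chat_request` and `send` are hypothetical.

```python
import json
import urllib.request

# Assumed address: same machine, port taken from the serve command.
BASE_URL = 'http://localhost:23333/v1'


def build_chat_request(model_name, question, image_url, base_url=BASE_URL):
    """Build an OpenAI-style chat request with one text and one image part."""
    payload = {
        'model': model_name,
        'messages': [{
            'role': 'user',
            'content': [
                {'type': 'text', 'text': question},
                {'type': 'image_url', 'image_url': {'url': image_url}},
            ],
        }],
    }
    return urllib.request.Request(
        f'{base_url}/chat/completions',
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def send(request, timeout=60):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        body = json.load(reply)
    return body['choices'][0]['message']['content']


request = build_chat_request(
    'OpenGVLab/InternVL2_5-4B-AWQ',  # assumed to match the served model name
    'describe this image',
    'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg',
)
print(request.full_url)
# print(send(request))  # uncomment once the api_server is running
```

Building the request separately from sending it lets the payload be checked offline; the same payload shape should also work with any OpenAI-compatible client library pointed at the server's base URL.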
License
This project is licensed under the MIT License. It incorporates components such as the pre-trained Qwen2.5-3B-Instruct, which is licensed under the Apache License 2.0.