Aria Chat
Introduction
Aria-Chat is a multimodal model optimized for open-ended, multi-round dialog, aiming to provide a seamless open-source chat experience. It offers enhanced reliability when generating long outputs and improved multilingual capabilities.
Architecture
Aria-Chat has 25.3 billion parameters in total and supports multimodal conversation, handling both text and images within a single dialog, which makes it suitable for diverse applications.
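As a quick sanity check, you can count the parameters of the loaded model yourself; a minimal sketch, assuming the model object created in the inference example below:

total = sum(p.numel() for p in model.parameters())
print(f"{total / 1e9:.1f}B parameters")  # should report roughly 25.3B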
Evaluation
The model was evaluated on WildVision-Bench, a benchmark built from real-world chat scenarios, where it shows significant improvements. The focus is on optimizing for actual use cases rather than solely on benchmark scores.
Guide: Running Locally
Installation
To run Aria-Chat locally, you need to install the following Python packages:
pip install transformers==4.45.0 accelerate==0.34.1 sentencepiece==0.2.0 torchvision requests torch Pillow
pip install flash-attn --no-build-isolation
pip install grouped_gemm==0.1.6
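After installation, a quick import check can confirm the environment is usable; a minimal sketch that only verifies the pinned packages import and a GPU is visible:

import torch
import transformers
import flash_attn     # fails if the CUDA build step above did not succeed
import grouped_gemm

print(transformers.__version__)   # expect 4.45.0
print(torch.cuda.is_available())  # expect True on a GPU machine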
Inference
You can load the model using one A100 (80GB) GPU with bfloat16 precision. Here is a basic usage example:
import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id_or_path = "rhymes-ai/Aria-Chat"

# Load the model and processor; trust_remote_code is required because
# Aria-Chat ships custom modeling code with the checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    model_id_or_path,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(model_id_or_path, trust_remote_code=True)

# Fetch an example image.
image_path = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"
image = Image.open(requests.get(image_path, stream=True).raw)

# A single user turn containing one image followed by a text question.
messages = [
    {
        "role": "user",
        "content": [
            {"text": None, "type": "image"},
            {"text": "what is the image?", "type": "text"},
        ],
    }
]

# Render the chat template and prepare the model inputs.
text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=text, images=image, return_tensors="pt")
inputs["pixel_values"] = inputs["pixel_values"].to(model.dtype)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.inference_mode(), torch.cuda.amp.autocast(dtype=torch.bfloat16):
    output = model.generate(
        **inputs,
        max_new_tokens=500,
        stop_strings=["<|im_end|>"],
        tokenizer=processor.tokenizer,
        do_sample=True,
        temperature=0.9,
    )

# Decode only the newly generated tokens (everything after the prompt).
output_ids = output[0][inputs["input_ids"].shape[1]:]
result = processor.decode(output_ids, skip_special_tokens=True)
print(result)
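Since Aria-Chat is optimized for multi-round dialog, a follow-up turn can be sent by appending the assistant's reply and a new user message, then re-running the same steps. A minimal sketch continuing from the variables above, assuming the assistant turn uses the same content-list format as the user turn:

# Record the assistant's reply, then ask a follow-up question.
messages.append({"role": "assistant", "content": [{"text": result, "type": "text"}]})
messages.append({
    "role": "user",
    "content": [{"text": "Describe it in one sentence.", "type": "text"}],
})

text = processor.apply_chat_template(messages, add_generation_prompt=True)
# The conversation still contains the image placeholder from the first turn,
# so the same image is passed again for the processor to align with it.
inputs = processor(text=text, images=image, return_tensors="pt")
inputs["pixel_values"] = inputs["pixel_values"].to(model.dtype)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.inference_mode(), torch.cuda.amp.autocast(dtype=torch.bfloat16):
    output = model.generate(
        **inputs,
        max_new_tokens=200,
        stop_strings=["<|im_end|>"],
        tokenizer=processor.tokenizer,
    )
print(processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))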
Cloud GPUs
For optimal performance, consider using a cloud service that offers A100 (80GB) GPUs, which are well suited to the model's memory requirements.
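A back-of-the-envelope memory estimate, assuming 2 bytes per parameter for bfloat16 weights and ignoring activation and KV-cache overhead:

params = 25.3e9                # total parameter count
weight_gb = params * 2 / 1e9   # bfloat16 stores each parameter in 2 bytes
print(f"~{weight_gb:.0f} GB of weights")  # ~51 GB, fits on one 80GB A100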
License
The Aria-Chat model is licensed under the Apache 2.0 License, allowing for wide usage and adaptation with proper attribution.