OS-Atlas-Base-7B
OS-Copilot
Introduction
OS-Atlas is a series of models designed specifically for GUI agents, focusing on tasks that require interaction with graphical user interfaces. It provides models for both GUI grounding and generating single-step actions in GUI agent tasks.
Architecture
The OS-Atlas-Base-7B model is finetuned from Qwen2-VL-7B-Instruct and is part of a suite of models, including OS-Atlas-Pro variants, tailored for GUI tasks. These models accept images of any size and report output coordinates normalized to a 0-1000 range.
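Because coordinates are reported on a 0-1000 scale regardless of the input resolution, they need to be rescaled before they can be used as pixel positions. The following is a minimal sketch of that conversion, assuming a box is given as (x1, y1, x2, y2) in the normalized range and that 1000 corresponds to the full image width or height. The function name `to_pixels` is hypothetical and not part of the OS-Atlas code.

```python
def to_pixels(box, width, height, scale=1000):
    """Map a box from the normalized 0-1000 range to pixel coordinates.

    Assumption: `box` is (x1, y1, x2, y2), with x values scaled by the
    image width and y values scaled by the image height.
    """
    x1, y1, x2, y2 = box
    return (
        round(x1 / scale * width),
        round(y1 / scale * height),
        round(x2 / scale * width),
        round(y2 / scale * height),
    )


# A box covering the central region of a 1920x1080 screenshot.
print(to_pixels((250, 250, 750, 750), 1920, 1080))  # (480, 270, 1440, 810)
```

Keeping the scale as a parameter makes the assumption explicit and easy to change if a model variant uses a different range.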
Training
OS-Atlas-Base-7B is finetuned from the Qwen2-VL-7B-Instruct base model. The OS-Atlas models are trained to interpret GUI screenshots and to generate actions from them, which makes them suitable for GUI agent tasks.
Guide: Running Locally
To run the OS-Atlas-Base-7B model locally, follow these steps:
- Install Dependencies:
    pip install transformers
    pip install qwen-vl-utils
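- Download Example Image: Save an example screenshot to your current directory for testing. The inference code expects it at ./web_6f93090a-81f6-489e-bb35-1a2838b18c01.png; adjust the path if your file is named differently.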
- Inference Code: Use the following code template for inference.
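    from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
    from qwen_vl_utils import process_vision_info

    # Load the model and its processor; device_map="auto" places the weights
    # on the available device(s).
    model = Qwen2VLForConditionalGeneration.from_pretrained(
        "OS-Copilot/OS-Atlas-Base-7B", torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained("OS-Copilot/OS-Atlas-Base-7B")

    # One user turn: the screenshot plus a grounding question about it.
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "image": "./web_6f93090a-81f6-489e-bb35-1a2838b18c01.png",
                },
                {"type": "text", "text": "In this UI screenshot, what is the position of the element corresponding to the command \"switch language of current page\" (with bbox)?"},
            ],
        }
    ]

    # Build the chat prompt and the image tensors expected by the model.
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt")
    # Move the inputs to the device the model was loaded on.
    inputs = inputs.to(model.device)

    generated_ids = model.generate(**inputs, max_new_tokens=128)
    # Drop the prompt tokens so that only the newly generated answer is decoded.
    generated_ids_trimmed = [
        out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
    ]
    # Special tokens are kept in the decoded text (skip_special_tokens=False).
    output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=False, clean_up_tokenization_spaces=False)
    print(output_text)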
- Suggested Cloud GPUs: If no suitable local GPU is available, the model can be run on a cloud GPU instance from a provider such as AWS, Google Cloud, or Azure.
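The inference code in the guide prints the raw decoded text, so the bounding box still has to be extracted from it. The sketch below shows one way to do that. It assumes the answer contains two "(x,y)" pairs, top-left corner followed by bottom-right corner, which this guide does not itself confirm. The sample string and the helper names `parse_bbox` and `bbox_center` are hypothetical and only illustrate that assumed format.

```python
import re

# Assumed format: two "(x,y)" pairs, top-left corner then bottom-right corner.
_BOX = re.compile(r"\((\d+),\s*(\d+)\)\s*,\s*\((\d+),\s*(\d+)\)")


def parse_bbox(output_text):
    """Return the first (x1, y1, x2, y2) found in the text, or None."""
    match = _BOX.search(output_text)
    if match is None:
        return None
    return tuple(int(value) for value in match.groups())


def bbox_center(box):
    """Return the center point of a box, e.g. as a click target."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


# Hypothetical sample, not real model output.
sample = "<|box_start|>(576,12),(592,42)<|box_end|>"
box = parse_bbox(sample)
print(box, bbox_center(box))  # (576, 12, 592, 42) (584.0, 27.0)
```

Returning None when no box is found lets the caller distinguish a missing answer from a valid one. The parsed values are still in the normalized 0-1000 range and need to be rescaled to pixels before use.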
License
The OS-Atlas-Base-7B model is licensed under the Apache 2.0 License, allowing for both personal and commercial use.