OS-Genesis-4B-AC
OS-Copilot
Introduction
OS-Genesis is an interaction-driven pipeline that synthesizes high-quality, diverse GUI agent trajectory data without human supervision. Through reverse task synthesis, the pipeline produces data for training GUI agents that perform well on dynamic benchmarks such as AndroidWorld and WebArena.
Architecture
OS-Genesis-4B-AC, a mobile action model, is finetuned from InternVL2-4B. The model is part of a family that includes OS-Genesis-7B-AC and OS-Genesis-8B-AC, which are based on Qwen2-VL-7B-Instruct and InternVL2-8B, respectively.
Training
The model is trained on OS-Genesis-ac-training-data, which prepares it for evaluation on benchmarks like AndroidControl. This data supports the model's ability to generate GUI agent trajectories effectively.
Guide: Running Locally
- Install Dependencies:
  - Install the transformers library using pip install transformers.
  - Refer to the InternVL2 documentation for additional dependencies.
- Set Up Model:
  - Load the model using AutoModel.from_pretrained() with the path OS-Copilot/OS-Genesis-4B-AC.
  - Prepare the tokenizer with AutoTokenizer.from_pretrained().
- Process Images:
  - Use the load_image() function to preprocess images with the specified input_size and max_num.
- Run Inference:
  - Utilize the model's chat function to generate responses based on provided inputs such as high-level instructions, action history, and accessibility tree.
- Suggested Cloud GPUs:
  - Cloud services like AWS, Google Cloud, or Azure offer GPU instances suitable for running the model efficiently.
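The setup and inference steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' reference script: the helper names (build_prompt, load_model_and_tokenizer, predict_next_action) and the prompt wording are hypothetical, trust_remote_code=True and the bfloat16 dtype are assumed to be appropriate for an InternVL2-based checkpoint, and the chat call is assumed to follow the InternVL2 convention of taking the tokenizer, pixel values, question, and a generation config. The pixel_values tensor is expected to come from the load_image() helper described in the InternVL2 documentation, which is not reproduced here.

```python
MODEL_ID = "OS-Copilot/OS-Genesis-4B-AC"  # model path given in the guide


def build_prompt(instruction, action_history, a11y_tree):
    """Combine the three text inputs named in the guide into one prompt.

    The wording and field order are illustrative assumptions, not the
    template the model was trained with.
    """
    # Number the previous actions; fall back to "None" on the first step.
    history = "\n".join(
        f"{i}. {action}" for i, action in enumerate(action_history, 1)
    ) or "None"
    return (
        f"High-level instruction: {instruction}\n"
        f"Action history:\n{history}\n"
        f"Accessibility tree:\n{a11y_tree}\n"
        "Next action:"
    )


def load_model_and_tokenizer(model_id=MODEL_ID):
    """Load the model and tokenizer as described under Set Up Model."""
    # Imported lazily so build_prompt works without torch installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    model = AutoModel.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # assumption: half precision to save memory
        trust_remote_code=True,      # assumption: checkpoint ships custom model code
    ).eval()
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    return model, tokenizer


def predict_next_action(model, tokenizer, pixel_values, prompt, max_new_tokens=256):
    """Call the model's chat function on one screenshot and prompt.

    pixel_values is assumed to be the tensor returned by the InternVL2
    load_image() helper, already on the model's device and dtype.
    """
    generation_config = dict(max_new_tokens=max_new_tokens, do_sample=False)
    return model.chat(tokenizer, pixel_values, prompt, generation_config)
```

In use, one would call load_model_and_tokenizer() once, preprocess each screenshot with load_image(), build the text with build_prompt(), and pass both to predict_next_action(). Keeping the prompt builder free of heavy imports makes it easy to inspect the exact text sent to the model before any GPU is involved.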
License
The model is open-source and distributed under the Apache-2.0 license, which allows for widespread usage and modification.