OS-Genesis-8B-AC
OS-Copilot
Introduction
OS-Genesis is an interaction-driven pipeline designed to synthesize high-quality and diverse GUI agent trajectory data without human supervision. Utilizing reverse task synthesis, it enables effective training of GUI agents to perform exceptionally on dynamic benchmarks like AndroidWorld and WebArena.
Architecture
OS-Genesis-8B-AC is a mobile action model fine-tuned from InternVL2-8B. It belongs to the OS-Genesis AC Family Models, which are evaluated on the AndroidControl Benchmark. The family spans several base models: InternVL2-4B, Qwen2-VL-7B-Instruct, and InternVL2-8B.
Training
The OS-Genesis AC Family Models are trained on a dataset built specifically for mobile scenarios. The training data and model checkpoints are available on Hugging Face. OS-Genesis-8B-AC itself builds on the InternVL2 architecture.
Guide: Running Locally
To run OS-Genesis-8B-AC locally, follow these steps:
- Install Required Libraries:
  - Install the transformers library using pip:

        pip install transformers

  - Refer to the InternVL2 documentation for additional dependencies.
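- Set Up the Model:
  - Load the model using the AutoModel and AutoTokenizer classes from the transformers library:

        import torch
        from transformers import AutoModel, AutoTokenizer

        path = 'OS-Copilot/OS-Genesis-8B-AC'
        # InternVL2-based checkpoints ship custom modeling code, hence trust_remote_code=True;
        # bfloat16 weights match the bfloat16 pixel_values used at inference.
        model = AutoModel.from_pretrained(path, torch_dtype=torch.bfloat16, trust_remote_code=True).eval().cuda()
        tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)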
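- Inference:
  - Use the provided example code to preprocess images and run the model. The load_image helper comes from the InternVL2 example code, and question and generation_config must be defined beforehand:

        # load_image: preprocessing helper from the InternVL2 example code (not part of transformers)
        pixel_values = load_image('./image.png').to(torch.bfloat16).cuda()
        # return_history=True makes chat() return a (response, history) pair
        response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
        print(f'User: {question}\nAssistant: {response}')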
- Evaluation:
  - Refer to the evaluation code for the AndroidControl Benchmark.
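Before working through the steps above, it can help to confirm that the required packages are actually installed. The following is a minimal sketch, not part of the official instructions: the helper name installed_versions is hypothetical, and the package list is an assumption (transformers is named in the installation step; torch is implied by the torch.bfloat16 and .cuda() calls in the example code).

```python
from importlib import metadata

# Assumed package list: "transformers" is named in the installation step;
# "torch" is inferred from the torch.bfloat16 / .cuda() calls in the example.
REQUIRED = ("transformers", "torch")


def installed_versions(packages=REQUIRED):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as not installed can be added with pip; the InternVL2 documentation remains the reference for the full dependency list.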
Cloud GPUs: The example code calls .cuda(), so a CUDA-capable GPU is required. If none is available locally, consider cloud-based GPU services such as AWS, Google Cloud, or Azure.
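When choosing a GPU, a rough estimate of the memory needed for the weights alone can be useful. The sketch below is back-of-the-envelope arithmetic under stated assumptions: it treats "8B" as a nominal 8 billion parameters (the exact count may differ) and uses 2 bytes per parameter, matching the bfloat16 precision in the inference example. Activations, image tiles, and generation caches need additional memory on top of this figure. The function name weight_memory_gib is hypothetical.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    # Assumption: "8B" read as a nominal 8e9 parameters.
    nominal_params = 8e9
    estimate = weight_memory_gib(nominal_params)  # bfloat16 = 2 bytes per parameter
    print(f"bfloat16 weights: roughly {estimate:.1f} GiB")
```

Under these assumptions the weights come to about 15 GiB, so a GPU with noticeably more memory than that leaves headroom for inference.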
License
OS-Genesis is licensed under the Apache-2.0 license, permitting wide usage and modification with proper attribution.