OS-Genesis-4B-AC

OS-Copilot

Introduction
OS-Genesis is an interaction-driven pipeline that synthesizes high-quality, diverse GUI agent trajectory data without human supervision. Using reverse task synthesis, the pipeline produces data for training GUI agents that perform well on dynamic benchmarks such as AndroidWorld and WebArena.

Architecture
OS-Genesis-4B-AC, a mobile action model, is finetuned from InternVL2-4B. The model is part of a family that includes OS-Genesis-7B-AC and OS-Genesis-8B-AC, which are based on Qwen2-VL-7B-Instruct and InternVL2-8B, respectively.

Training
The model is trained on the OS-Genesis-ac-training-data dataset and belongs to the AC family of models used for evaluation on the AndroidControl benchmark. The training data consists of GUI agent trajectories synthesized by the OS-Genesis pipeline.

Guide: Running Locally

  1. Install Dependencies:

    • Install the transformers library using pip install transformers.
    • Refer to the InternVL2 documentation for additional dependencies.
  2. Set Up Model:

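    • Load the model using AutoModel.from_pretrained() with the path OS-Copilot/OS-Genesis-4B-AC, passing trust_remote_code=True because InternVL2 models ship custom modeling code.
    • Prepare the tokenizer with AutoTokenizer.from_pretrained() using the same path.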
  3. Process Images:

    • Use the load_image() helper from the InternVL2 example code to preprocess images with the specified input_size and max_num (the maximum number of image tiles).
  4. Run Inference:

    • Use the model's chat function to generate a response from the provided inputs: a high-level instruction, the action history, and the accessibility tree of the current screen.
  5. Suggested Cloud GPUs:

    • Cloud services like AWS, Google Cloud, or Azure offer GPU instances suitable for running the model efficiently.
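
Steps 2 to 4 above can be sketched roughly as follows. This is a minimal sketch, not the official inference script: the prompt wording in build_prompt() is an assumption made for illustration, the helper names (build_prompt, load_model, predict_next_action) are hypothetical, and the chat() call follows the signature documented for InternVL2 models. The pixel_values argument is assumed to be the tensor returned by the load_image() helper from step 3, cast to the model's dtype and device.

```python
MODEL_PATH = "OS-Copilot/OS-Genesis-4B-AC"  # path given in the guide above


def build_prompt(instruction, action_history, a11y_tree):
    """Assemble a text prompt from the three inputs named in step 4.

    The template wording is an assumption, not the official prompt.
    """
    history = "; ".join(action_history) if action_history else "None"
    return (
        "<image>\n"  # image placeholder used by InternVL2-style chat prompts
        f"High-level instruction: {instruction}\n"
        f"Action history: {history}\n"
        f"Accessibility tree: {a11y_tree}\n"
        "Please generate the next action."
    )


def load_model(path=MODEL_PATH):
    """Load the model and tokenizer (step 2). Downloads weights on first use."""
    # Imported lazily so the prompt helper works without torch installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    model = AutoModel.from_pretrained(
        path,
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,
        trust_remote_code=True,  # InternVL2 ships custom modeling code
    ).eval()
    if torch.cuda.is_available():
        model = model.cuda()
    tokenizer = AutoTokenizer.from_pretrained(
        path, trust_remote_code=True, use_fast=False
    )
    return model, tokenizer


def predict_next_action(model, tokenizer, pixel_values, prompt, max_new_tokens=256):
    """Run one chat turn (step 4) and return the model's text response."""
    generation_config = dict(max_new_tokens=max_new_tokens, do_sample=False)
    return model.chat(tokenizer, pixel_values, prompt, generation_config)
```

A typical call sequence would be load_model(), then load_image() on a screenshot, then build_prompt() and predict_next_action(). Greedy decoding (do_sample=False) is chosen here only to make outputs repeatable; it is not a documented requirement of the model.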

License
The model is open-source and distributed under the Apache-2.0 license, which allows for widespread usage and modification.
