OS-Genesis-8B-AC

OS-Copilot

Introduction

OS-Genesis is an interaction-driven pipeline that synthesizes high-quality, diverse GUI agent trajectory data without human supervision. It relies on reverse task synthesis: instead of collecting trajectories for predefined tasks, it first explores GUI interactions and then derives task instructions from them. The resulting data trains GUI agents that perform well on dynamic benchmarks such as AndroidWorld and WebArena.
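To make the "reverse" direction concrete, here is a minimal, simplified sketch (not the authors' implementation). Explored interactions are recorded first as (screen before, action, screen after) transitions, and instructions are derived from them afterwards. The `Transition` type and the `describe_step` and `compose_task` functions are hypothetical; in the actual pipeline a model writes these instructions, and the pipeline has further stages that this sketch omits.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Transition:
    """One explored interaction: screen before, action taken, screen after."""
    screen_before: str
    action: str
    screen_after: str


def synthesize_tasks(
    transitions: List[Transition],
    describe_step: Callable[[Transition], str],
    compose_task: Callable[[List[str]], str],
) -> dict:
    """Derive instructions only after the interactions have happened."""
    # Turn each observed transition into a low-level, step-sized instruction.
    low_level = [describe_step(t) for t in transitions]
    # Abstract the low-level steps into one high-level task description.
    high_level = compose_task(low_level)
    return {"low_level": low_level, "high_level": high_level}


# Hypothetical stand-ins for the model-based annotator: simple string templates.
def describe_step(t: Transition) -> str:
    return f"{t.action} on '{t.screen_before}' to reach '{t.screen_after}'"


def compose_task(steps: List[str]) -> str:
    return f"Complete a {len(steps)}-step task: " + "; then ".join(steps)


explored = [
    Transition("Home", "tap Settings", "Settings"),
    Transition("Settings", "tap Wi-Fi", "Wi-Fi settings"),
]
result = synthesize_tasks(explored, describe_step, compose_task)
print(result["high_level"])
```

Passing the annotators in as functions keeps the data flow (interactions first, instructions second) separate from whichever model produces the text.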

Architecture

OS-Genesis-8B-AC is a mobile action model fine-tuned from InternVL2-8B. It belongs to the OS-Genesis AC family of models, which are evaluated on the AndroidControl benchmark. The family is built on several base architectures, including InternVL2-4B, Qwen2-VL-7B-Instruct, and InternVL2-8B.

Training

The OS-Genesis AC family models are trained on a dataset constructed specifically for the mobile setting. The training data and model weights are available on Hugging Face. Most of the family, including OS-Genesis-8B-AC, builds on the InternVL2 architecture.

Guide: Running Locally

To run OS-Genesis-8B-AC locally, follow these steps:

  1. Install Required Libraries:

    • Install the transformers library using pip:
      pip install transformers
      
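    • Refer to the InternVL2 documentation for additional dependencies; InternVL2's custom model code typically also needs packages such as torch, torchvision, timm, and einops.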
  2. Set Up the Model:

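    • Load the model with the AutoModel and AutoTokenizer classes from the transformers library. InternVL2-based checkpoints ship their own modeling code, so trust_remote_code=True is needed:
      import torch
      from transformers import AutoModel, AutoTokenizer
      path = 'OS-Copilot/OS-Genesis-8B-AC'
      # bfloat16 halves memory versus float32; trust_remote_code loads InternVL2's model class
      model = AutoModel.from_pretrained(path, torch_dtype=torch.bfloat16, low_cpu_mem_usage=True, trust_remote_code=True).eval().cuda()
      tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)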
      
  3. Inference:

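    • Preprocess the screenshot and query the model through InternVL2's chat interface. load_image is not part of transformers: it is the image-preprocessing helper defined in the InternVL2 model card. The question below is only a placeholder:
      import torch
      pixel_values = load_image('./image.png').to(torch.bfloat16).cuda()
      question = '<image>\nYOUR INSTRUCTION HERE'  # placeholder; use the prompt format OS-Genesis was trained with
      generation_config = dict(max_new_tokens=1024, do_sample=False)
      response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
      print(f'User: {question}\nAssistant: {response}')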
      
  4. Evaluation:
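The load_image helper used in the inference step comes from the InternVL2 model card rather than from transformers. As far as we understand that card, it splits a screenshot into up to 12 tiles of 448×448 pixels, picks the tile grid whose aspect ratio best matches the image, and appends a downscaled thumbnail when the image is split. The pure-Python sketch below reproduces only that grid-selection step (no resizing or normalization); the function names are hypothetical, and the logic should be checked against the model card before being relied on.

```python
from typing import List, Tuple


def candidate_grids(min_num: int = 1, max_num: int = 12) -> List[Tuple[int, int]]:
    """All (cols, rows) grids with min_num <= cols * rows <= max_num, fewest tiles first."""
    grids = {
        (cols, rows)
        for n in range(min_num, max_num + 1)
        for cols in range(1, n + 1)
        for rows in range(1, n + 1)
        if min_num <= cols * rows <= max_num
    }
    # Sort by tile count, then by the grid itself so the order is deterministic.
    return sorted(grids, key=lambda g: (g[0] * g[1], g))


def choose_grid(width: int, height: int, image_size: int = 448,
                min_num: int = 1, max_num: int = 12) -> Tuple[int, int]:
    """Pick the grid whose aspect ratio is closest to the image's."""
    aspect = width / height
    area = width * height
    best, best_diff = (1, 1), float("inf")
    for cols, rows in candidate_grids(min_num, max_num):
        diff = abs(aspect - cols / rows)
        if diff < best_diff:
            best, best_diff = (cols, rows), diff
        elif diff == best_diff and area > 0.5 * image_size * image_size * cols * rows:
            # Equally good fit: take the larger grid only if the image has
            # more than half as many pixels as that grid of tiles.
            best = (cols, rows)
    return best


def num_tiles(width: int, height: int, image_size: int = 448,
              max_num: int = 12, use_thumbnail: bool = True) -> int:
    """Number of 448x448 crops fed to the model, including the optional thumbnail."""
    cols, rows = choose_grid(width, height, image_size, 1, max_num)
    tiles = cols * rows
    # A thumbnail of the whole image is added only when the image was split.
    return tiles + 1 if use_thumbnail and tiles != 1 else tiles


print(choose_grid(896, 448), num_tiles(896, 448))  # (2, 1) 3
```

Because the tile count grows with resolution, tall phone screenshots can produce many crops, which directly affects both memory use and inference time.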

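Cloud GPUs: In bfloat16, an 8B-parameter model needs roughly 16 GB of GPU memory for its weights alone, before activations and image inputs. If no suitable local GPU is available, consider cloud GPU services such as AWS, Google Cloud, or Azure.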
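As a rough sizing aid, weight memory is the parameter count multiplied by the bytes per parameter. The sketch below assumes a nominal 8 billion parameters (the exact count of OS-Genesis-8B-AC differs somewhat) and counts only the weights; activations, the KV cache, and image tiles need additional memory on top of this.

```python
# Bytes per parameter for common weight dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str = "bfloat16") -> float:
    """Memory for the model weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Nominal 8B parameters (an assumption, not the exact count).
for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: {weight_memory_gib(8e9, dtype):.1f} GiB")
```

This prints about 29.8 GiB for float32 and 14.9 GiB for bfloat16, which is why loading in bfloat16 is the practical default for a single GPU.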

License

OS-Genesis is licensed under the Apache-2.0 license, permitting wide usage and modification with proper attribution.
