Aria-UI

Introduction

Aria-UI is a model for grounding instructions in graphical user interfaces (GUIs): given a screenshot and a natural-language instruction, it locates the target element. It handles diverse instruction styles and exploits surrounding context when grounding, and it is lightweight and fast, built on a mixture-of-experts architecture that performs well across varied input types and configurations.

Architecture

Aria-UI is built on a mixture-of-experts architecture that activates 3.9 billion parameters per token. This design lets it efficiently handle GUI inputs of varying sizes and aspect ratios, including ultra-high-resolution screenshots, and it achieves state-of-the-art results on several benchmarks, including first place on AndroidWorld.

Training

Aria-UI is trained on a range of grounding data, including the purpose-built Aria-UI dataset, which teaches it to understand and execute instructions across diverse GUI scenarios. This training yields high task success rates on both offline and online benchmarks.

Guide: Running Locally

To run Aria-UI locally, follow these steps:

  1. Installation
    Install necessary dependencies using pip:

    pip install transformers==4.45.0 accelerate==0.34.1 sentencepiece==0.2.0 torchvision requests torch Pillow
    pip install flash-attn --no-build-isolation
    pip install grouped_gemm==0.1.6
    
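    After installing, a quick sanity check confirms the dependencies are importable. The helper below is an illustrative sketch (not part of the official setup); the package names are taken from the pip commands above:

    ```python
    # Sanity-check that the dependencies installed above are present.
    # Missing packages are reported rather than raising an exception.
    from importlib import metadata

    def check_packages(names):
        """Return a dict mapping package name -> installed version, or None if missing."""
        versions = {}
        for name in names:
            try:
                versions[name] = metadata.version(name)
            except metadata.PackageNotFoundError:
                versions[name] = None
        return versions

    if __name__ == "__main__":
        required = ["transformers", "accelerate", "sentencepiece",
                    "torch", "torchvision", "Pillow", "requests"]
        for pkg, ver in check_packages(required).items():
            print(f"{pkg}: {ver or 'MISSING'}")
    ```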
  2. Inference with vLLM (Recommended)
    Ensure vLLM is installed:

    pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/nightly/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
    

    Use the provided Python script to perform inference with vLLM. This approach is recommended for optimal performance.
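    A minimal sketch of what such a script might look like is shown below. The model ID, prompt template, and the assumption that the model emits a coordinate pair in its text output are illustrative; consult the script provided with the model for the exact format. The heavy model call is guarded so the parsing helper can be reused on its own:

    ```python
    # Hedged sketch of vLLM inference for a GUI-grounding model.
    # Model ID, prompt format, and output format are assumptions.
    import re

    def extract_point(text):
        """Pull the first '(x, y)' or '[x, y]' integer pair out of model output."""
        m = re.search(r"[\[\(]\s*(\d+)\s*,\s*(\d+)\s*[\]\)]", text)
        return (int(m.group(1)), int(m.group(2))) if m else None

    if __name__ == "__main__":
        from PIL import Image
        from vllm import LLM, SamplingParams

        llm = LLM(
            model="Aria-UI/Aria-UI-base",  # assumed model ID; check the model card
            trust_remote_code=True,
        )
        image = Image.open("screenshot.png")
        prompt = "<image>\nGiven the GUI screenshot, locate: 'Click the Submit button'"
        outputs = llm.generate(
            {"prompt": prompt, "multi_modal_data": {"image": image}},
            SamplingParams(max_tokens=64, temperature=0.0),
        )
        text = outputs[0].outputs[0].text
        print(text, "->", extract_point(text))
    ```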

  3. Inference with Transformers (Not Recommended)
    Alternatively, you can use the Transformers library directly via the AutoModelForCausalLM and AutoProcessor classes, though this path is slower than vLLM.
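    A rough sketch of this path follows. It uses the AutoModelForCausalLM and AutoProcessor classes the guide mentions, but the model ID, chat format, and generation settings are assumptions; adapt them from the script provided with the model. The model call is guarded so the message-building helper stands alone:

    ```python
    # Hedged sketch of Transformers-based inference; model ID and chat
    # template are illustrative assumptions, not the official script.

    def build_messages(instruction):
        """Build a single-turn chat message pairing a screenshot with an instruction."""
        return [{
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": instruction},
            ],
        }]

    if __name__ == "__main__":
        import torch
        from PIL import Image
        from transformers import AutoModelForCausalLM, AutoProcessor

        model_id = "Aria-UI/Aria-UI-base"  # assumed; check the model card
        processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
        model = AutoModelForCausalLM.from_pretrained(
            model_id, torch_dtype=torch.bfloat16, device_map="auto",
            trust_remote_code=True,
        )

        image = Image.open("screenshot.png")
        messages = build_messages("Locate the 'Submit' button.")
        text = processor.apply_chat_template(messages, add_generation_prompt=True)
        inputs = processor(text=text, images=image, return_tensors="pt").to(model.device)
        out = model.generate(**inputs, max_new_tokens=64)
        print(processor.decode(out[0][inputs["input_ids"].shape[1]:],
                               skip_special_tokens=True))
    ```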

Cloud GPUs

For enhanced performance and to handle large datasets or models, consider using cloud-based GPU services such as AWS, Google Cloud, or Azure.

License

Aria-UI is released under the Apache-2.0 license, allowing for both commercial and non-commercial use, modification, and distribution.
