Aria U I base
Aria-UIIntroduction
Aria-UI is a model designed for interpreting and executing grounding instructions within graphical user interfaces (GUIs). It excels in various scenarios due to its versatile grounding instruction understanding and context-aware grounding capabilities. The model is notable for its lightweight and fast processing, featuring a mixture-of-expert architecture that enhances its performance across diverse input types and configurations.
Architecture
Aria-UI is built with a mixture-of-expert model architecture, activating 3.9 billion parameters per token. This design allows it to efficiently handle GUI inputs of varying sizes and aspect ratios, supporting ultra-resolution formats. The model's architecture contributes to its state-of-the-art performance on several benchmarks, including a 1st place victory on AndroidWorld.
Training
Aria-UI leverages various datasets, including the Aria-UI dataset, to train its model. This training enables it to understand and execute instructions across a range of GUI scenarios. The model's training data and architecture contribute to its superior performance in both offline and online benchmarks, achieving high task success rates.
Guide: Running Locally
To run Aria-UI locally, follow these steps:
-
Installation
Install necessary dependencies using pip:pip install transformers==4.45.0 accelerate==0.34.1 sentencepiece==0.2.0 torchvision requests torch Pillow pip install flash-attn --no-build-isolation pip install grouped_gemm==0.1.6
-
Inference with vLLM (Recommended)
Ensure vLLM is installed:pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/nightly/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
Use the provided Python script to perform inference with vLLM. This approach is recommended for optimal performance.
-
Inference with Transformers (Not Recommended)
Alternatively, you can use the Transformers library, though it's not recommended for optimal performance. The script provided uses theAutoModelForCausalLM
andAutoProcessor
classes for inference.
Cloud GPUs
For enhanced performance and to handle large datasets or models, consider using cloud-based GPU services like AWS, Google Cloud, or Azure.
License
Aria-UI is released under the Apache-2.0 license, allowing for both commercial and non-commercial use, modification, and distribution.