MLCD-Seg-7B

DeepGlint-AI

Introduction

MLCD-Seg-7B is a referring expression segmentation model built on multi-label cluster discrimination (MLCD). It aims to improve segmentation accuracy on benchmarks such as RefCOCO, RefCOCO+, and RefCOCOg.

Architecture

The model is based on DeepGlint-AI/MLCD-Embodied-7B, which emphasizes cluster discrimination to improve visual segmentation performance. It is tailored to referring expression tasks, in which a natural-language phrase picks out the image region to segment, and targets both efficient processing and accurate segmentation.

Training

MLCD-Seg-7B has been evaluated on RefCOCO, RefCOCO+, and RefCOCOg, where it is reported to outperform other models such as EVF-SAM, GLaMM, VisionLLM v2, and LISA, achieving higher segmentation scores across various test splits.

Guide: Running Locally

To run MLCD-Seg-7B locally, follow these steps:

  1. Clone the Repository:

    git clone https://github.com/deepglint/unicom
    
  2. Navigate to the Directory:

    cd unicom/downstream
    
  3. Execute the Evaluation Script:

    bash ./eval/scripts/eval_refcoco.sh
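
The three steps above can be combined into a single script that stops at the first failure. This is a minimal sketch, not part of the repository: it assumes the repository root can be cloned from https://github.com/deepglint/unicom and that `downstream` sits at the top level of the checkout. The `RUN_EVAL` variable and the function names `print_plan` and `run_eval` are hypothetical. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the clone / cd / evaluate steps in one script.
set -euo pipefail

REPO_URL="https://github.com/deepglint/unicom"   # assumed clone URL (repository root)
REPO_DIR="unicom"                                # directory created by git clone
EVAL_SCRIPT="./eval/scripts/eval_refcoco.sh"     # script path taken from the guide

# Print the commands that run_eval would execute, one per line.
print_plan() {
  echo "git clone $REPO_URL $REPO_DIR"
  echo "cd $REPO_DIR/downstream"
  echo "bash $EVAL_SCRIPT"
}

# Clone (if needed), enter the downstream directory, and run the evaluation.
run_eval() {
  command -v git >/dev/null || { echo "git not found" >&2; return 1; }
  [ -d "$REPO_DIR" ] || git clone "$REPO_URL" "$REPO_DIR"
  cd "$REPO_DIR/downstream"
  bash "$EVAL_SCRIPT"
}

# Nothing is cloned or executed unless RUN_EVAL=1 is set (hypothetical opt-in).
if [ "${RUN_EVAL:-0}" = "1" ]; then
  run_eval
else
  print_plan
fi
```

Saved as, for example, `run_mlcd_eval.sh`, the script prints the plan when run with `bash run_mlcd_eval.sh` and performs the steps when run with `RUN_EVAL=1 bash run_mlcd_eval.sh`.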
    

Evaluation is computationally intensive, so a GPU is recommended. Cloud GPUs from providers such as AWS, Google Cloud, or Azure are one option.

License

MLCD-Seg-7B is licensed under the Apache-2.0 License, which permits use, distribution, and modification under the terms specified in the license agreement.
