MLCD-Seg-7B
DeepGlint-AI
Introduction
MLCD-Seg-7B is a model for referring expression segmentation: given an image and a natural-language expression, it segments the region the expression refers to. It builds on multi-label cluster discrimination (MLCD) and aims to improve segmentation accuracy on datasets such as RefCOCO, RefCOCO+, and RefCOCOg.
Architecture
The model is built on DeepGlint-AI/MLCD-Embodied-7B, whose design emphasizes cluster discrimination to improve visual segmentation. It is tailored to tasks involving referring expressions, with the goal of efficient processing and accurate segmentation.
Training
MLCD-Seg-7B has been evaluated on RefCOCO, RefCOCO+, and RefCOCOg, where it scored higher than other models such as EVF-SAM, GLaMM, VisionLLM v2, and LISA across various test splits.
Guide: Running Locally
To run MLCD-Seg-7B locally, follow these steps:
- Clone the Repository:
  git clone https://github.com/deepglint/unicom
- Navigate to the Directory:
  cd unicom/downstream
- Execute the Evaluation Script:
  bash ./eval/scripts/eval_refcoco.sh
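The three steps can be combined into one script. The following is a minimal sketch, not an official script from the repository. It assumes that `git clone` is given the repository root URL (a `/tree/main` browser URL is not cloneable) and that the clone creates a `unicom` directory containing `downstream`. The `WORK_DIR` and `DRY_RUN` variables and the `run` helper are hypothetical names introduced here for illustration.

```shell
#!/usr/bin/env bash
# Sketch: clone the repository, enter downstream/, and run the RefCOCO evaluation.
set -eu

REPO_URL="https://github.com/deepglint/unicom"  # repository root, without /tree/main
WORK_DIR="${WORK_DIR:-unicom}"                   # hypothetical: clone target directory
EVAL_SCRIPT="./eval/scripts/eval_refcoco.sh"     # path from the guide, relative to downstream/
DRY_RUN="${DRY_RUN:-1}"                          # hypothetical: 1 prints commands, 0 runs them

# Print the command in dry-run mode; otherwise execute it in the current shell.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

main() {
  # Skip the clone when the target directory already exists.
  if [ ! -d "$WORK_DIR" ]; then
    run git clone "$REPO_URL" "$WORK_DIR"
  fi
  run cd "$WORK_DIR/downstream"
  run bash "$EVAL_SCRIPT"
}

main
```

By default the script only prints the commands it would run, which makes it safe to inspect first. Setting `DRY_RUN=0` executes them for real.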
For best performance, run the evaluation on a GPU. Cloud GPU instances such as those offered by AWS, Google Cloud, or Azure are recommended, especially for compute-intensive workloads.
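Before starting a long evaluation, it can help to confirm that the machine actually exposes a GPU. The sketch below assumes an NVIDIA GPU and uses the `nvidia-smi` tool that ships with the NVIDIA driver. The `gpu_status` function name is hypothetical, and other accelerator vendors would need a different check.

```shell
#!/usr/bin/env bash
# Sketch: report whether an NVIDIA GPU is visible on this machine.
set -eu

gpu_status() {
  # nvidia-smi is installed with the NVIDIA driver; without it there is nothing to query.
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: no NVIDIA driver tools on PATH"
    return 0
  fi
  # List each GPU's name and total memory as CSV, one GPU per line.
  if gpus="$(nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null)" \
      && [ -n "$gpus" ]; then
    echo "GPUs detected:"
    echo "$gpus"
  else
    echo "nvidia-smi is installed but reported no usable GPU"
  fi
}

gpu_status
```

The function always exits successfully and only prints a status line, so it can be called at the top of a larger script without aborting it.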
License
MLCD-Seg-7B is licensed under the Apache-2.0 License, which permits use, distribution, and modification under the terms specified in the license agreement.