llava-llama-3-8b-v1_1-imat-gguf
Introduction
The llava-llama-3-8b-v1_1-imat-gguf model is an image-text-to-text model, distributed in the GGUF format, intended primarily as the text encoder for Hunyuan Video, with potential use in general vision tasks. It is a conversion by city96 of the xtuner/llava-llama-3-8b-v1_1-transformers model.
Architecture
This model is a LLaVA-style image-text-to-text encoder distributed as GGUF files for llama.cpp-compatible runtimes. Its quantized variants were produced with an importance matrix (imatrix) computed from the calibration_datav3.txt dataset, which improves quantization quality at low bit widths. Unlike some other GGUF conversions, it keeps the vocabulary size of the original transformers repository, which is what the Hunyuan Video code expects.
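Because the vocabulary size is what sets this conversion apart, it is worth verifying after download. Below is a minimal sketch, assuming llama-cpp-python as the runtime; the local filename is a hypothetical example of one of the quants in this repository.

```python
# Quick metadata check: confirm the tokenizer vocabulary size of a quant
# without loading the full weights. Assumes llama-cpp-python is installed
# (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="llava-llama-3-8b-v1_1-Q6_K.gguf",  # hypothetical local path
    vocab_only=True,  # load tokenizer and metadata only, skip the weights
)
print("vocab size:", llm.n_vocab())
```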
Training
The model itself was not retrained; the quantized files were produced using an importance matrix computed from Bartowski's calibration_datav3.txt. At Q6_K, this imatrix was benchmarked against a wikitext-based imatrix and against quantization with no imatrix at all, and gave the best results. The vocabulary size is kept consistent with the original transformers repository, which is essential for its intended application.
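For readers who want to reproduce a comparable quantization, the sketch below drives llama.cpp's imatrix and quantize tools from Python. This illustrates the general flow, not the exact procedure used for this repository: the file names are hypothetical, and the binary names (llama-imatrix, llama-quantize) vary between llama.cpp versions.

```python
# Sketch of the imatrix quantization flow, assuming a llama.cpp build whose
# tools are named llama-imatrix and llama-quantize. File names are hypothetical.
import subprocess

# 1. Compute an importance matrix from the calibration text.
subprocess.run([
    "llama-imatrix",
    "-m", "llava-llama-3-8b-v1_1-F16.gguf",  # hypothetical full-precision GGUF
    "-f", "calibration_datav3.txt",          # Bartowski's calibration data
    "-o", "imatrix.dat",
], check=True)

# 2. Quantize to Q6_K using that importance matrix.
subprocess.run([
    "llama-quantize",
    "--imatrix", "imatrix.dat",
    "llava-llama-3-8b-v1_1-F16.gguf",
    "llava-llama-3-8b-v1_1-Q6_K.gguf",
    "Q6_K",
], check=True)
```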
Guide: Running Locally
- Install Dependencies: Ensure Python and a GGUF-capable runtime (e.g., llama-cpp-python) are installed.
- Download the Model: Obtain the .gguf files from the Hugging Face repository.
- Setup Environment: Point your runtime or workflow at the downloaded files.
- Run Inference: Use the model as a text encoder for Hunyuan Video, or for general vision-language tasks (see the sketch after this list).
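A minimal end-to-end sketch of these steps, again assuming llama-cpp-python as the runtime; the repo id follows the uploader and model name above, while the quant filename is a hypothetical example of the files it contains.

```python
# Download one quant from the Hugging Face repository and load it as a plain
# text encoder. Note that Hunyuan Video pipelines consume this model through
# their own loaders (e.g. ComfyUI-GGUF) rather than this pooled embedding API.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="city96/llava-llama-3-8b-v1_1-imat-gguf",
    filename="llava-llama-3-8b-v1_1-Q6_K.gguf",  # hypothetical quant name
)

llm = Llama(model_path=model_path, embedding=True)
result = llm.create_embedding("A cat walking through tall grass")
print("embedding length:", len(result["data"][0]["embedding"]))
```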
For optimal performance, consider using cloud GPU services such as AWS, Google Cloud, or Azure.
License
Please refer to the Hugging Face model repository for specific license information regarding the llava-llama-3-8b-v1_1-imat-gguf model.