Janus-1.3B-ONNX
onnx-community
Introduction
Janus-1.3B-ONNX, published by the ONNX Community, is a multi-modal transformer model for text-to-image, image-to-text, and image-text-to-text tasks. It ships with ONNX weights so that it can be used with the Transformers.js library.
Architecture
The model is based on the deepseek-ai/Janus-1.3B architecture and uses ONNX weights so that it can run across different platforms through the Transformers.js library. It accepts both text and image inputs and can produce either text or images as output.
Training
The specific training process is not documented. The model belongs to a suite of multi-modal models, which suggests, but does not confirm, that it was trained on diverse datasets covering any-to-any transformations.
Guide: Running Locally
To run Janus-1.3B-ONNX locally, follow these steps:
- Install Transformers.js: Use npm to install the library:
    npm i @huggingface/transformers
- Load the Model and Processor:
    import { AutoProcessor, MultiModalityCausalLM } from "@huggingface/transformers";

    // Load the processor and model from the Hugging Face Hub
    const model_id = "onnx-community/Janus-1.3B-ONNX";
    const processor = await AutoProcessor.from_pretrained(model_id);
    const model = await MultiModalityCausalLM.from_pretrained(model_id);
- Prepare Inputs and Generate Outputs: For image+text to text conversion, build a conversation object and pass it to the processor to produce model inputs. Then use the model to generate outputs and decode them.
- Generate Images: With the same processor and model, pass a text prompt and use the model's image generation capabilities to produce an image.
- Cloud GPU Suggestion: Consider using cloud GPU services like AWS, GCP, or Azure for improved performance, especially for intensive tasks such as image generation.
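The image+text-to-text and image-generation steps above can be sketched as follows. This is a minimal sketch rather than a definitive implementation. The helper names (buildImageToTextConversation, buildTextToImageConversation, loadJanus, describeImage, generateImage) are hypothetical. The conversation layout, the "<image_placeholder>" tag, the "text_to_image" chat template, processor.num_image_tokens, and model.generate_images are assumptions based on the usage shown on the upstream model card, so they should be checked against the current Transformers.js release.

```javascript
// Sketch only: helper names are hypothetical, and the Transformers.js calls
// are assumptions taken from the upstream model card usage.

// Build a single-turn conversation for image+text -> text.
// Assumption: the processor expects "<image_placeholder>" in `content`
// and a list of image URLs or paths in `images`.
function buildImageToTextConversation(imageUrl, prompt) {
  return [
    {
      role: "User",
      content: `<image_placeholder>\n${prompt}`,
      images: [imageUrl],
    },
  ];
}

// Build a single-turn conversation for text -> image.
function buildTextToImageConversation(prompt) {
  return [{ role: "User", content: prompt }];
}

// Load the processor and model (same calls as the loading step).
// The dynamic import keeps this file loadable where the package is absent.
async function loadJanus(modelId = "onnx-community/Janus-1.3B-ONNX") {
  const { AutoProcessor, MultiModalityCausalLM } = await import(
    "@huggingface/transformers"
  );
  const processor = await AutoProcessor.from_pretrained(modelId);
  const model = await MultiModalityCausalLM.from_pretrained(modelId);
  return { processor, model };
}

// Image + text -> text.
async function describeImage(processor, model, imageUrl, prompt, maxNewTokens = 150) {
  const conversation = buildImageToTextConversation(imageUrl, prompt);
  const inputs = await processor(conversation);
  const outputs = await model.generate({
    ...inputs,
    max_new_tokens: maxNewTokens,
    do_sample: false,
  });
  // Drop the prompt tokens so only newly generated tokens are decoded.
  const promptLength = inputs.input_ids.dims.at(-1);
  const newTokens = outputs.slice(null, [promptLength, null]);
  return processor.batch_decode(newTokens, { skip_special_tokens: true })[0];
}

// Text -> image, saved to `outputPath`.
async function generateImage(processor, model, prompt, outputPath = "output.png") {
  const conversation = buildTextToImageConversation(prompt);
  const inputs = await processor(conversation, { chat_template: "text_to_image" });
  // Assumption: the processor exposes the fixed number of image tokens.
  const numImageTokens = processor.num_image_tokens;
  const images = await model.generate_images({
    ...inputs,
    min_new_tokens: numImageTokens,
    max_new_tokens: numImageTokens,
    do_sample: true,
  });
  await images[0].save(outputPath);
  return outputPath;
}
```

In an ES module or another async context, these helpers might be used by first awaiting loadJanus() to obtain the processor and model, then awaiting describeImage(processor, model, imageUrl, prompt) or generateImage(processor, model, prompt). Passing the processor and model as arguments means they are loaded once and reused across both tasks.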
License
The model is distributed under the "other" license category, indicating specific terms that may differ from standard open-source licenses. Users should review the license details on the Hugging Face model page.