JanusFlow-1.3B
deepseek-ai
Introduction
JanusFlow is a framework that unifies image understanding and generation in a single model. It combines an autoregressive language model with rectified flow, a generative modeling method that learns a velocity field transporting noise to data along near-straight paths. Rectified flow can be trained directly within the large language model framework, so no complex architectural changes are required.
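To make the rectified-flow idea concrete, here is a minimal pure-Python sketch. It is not JanusFlow's code. It assumes the common convention that t = 0 is noise and t = 1 is data, and it replaces the learned network with a hypothetical oracle that already knows the target. All function names are illustrative.

```python
from typing import Callable, List

Vector = List[float]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Point on the straight path from x0 (noise, t=0) to x1 (data, t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def velocity_target(x0: Vector, x1: Vector) -> Vector:
    """Regression target for a velocity network: constant along the path."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(velocity_fn: Callable[[Vector, float], Vector],
                 x0: Vector, num_steps: int) -> Vector:
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt  # always < 1, so the oracle below never divides by zero
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_oracle_velocity(x1: Vector) -> Callable[[Vector, float], Vector]:
    """Hypothetical stand-in for a trained model: points straight at x1."""
    def oracle_velocity(x: Vector, t: float) -> Vector:
        return [(b - xi) / (1.0 - t) for xi, b in zip(x, x1)]
    return oracle_velocity


noise = [0.5, -1.0, 2.0]
data = [1.0, 3.0, -2.0]
sample = euler_sample(make_oracle_velocity(data), noise, num_steps=10)
```

With an exact velocity field, even a coarse Euler schedule ends at the target, which is why straight paths make few-step sampling attractive. A trained model only approximates this field, so real samplers typically use more steps.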
Architecture
JanusFlow is a unified multimodal model that decouples visual encoding for understanding from visual encoding for generation. It is built on DeepSeek-LLM-1.3b-base. For multimodal understanding, it uses SigLIP-L as the vision encoder, which accepts 384 x 384 image input. For image generation, it uses rectified flow together with SDXL-VAE to produce 384 x 384 images. The released checkpoint is the Exponential Moving Average (EMA) checkpoint taken after pre-training and supervised fine-tuning.
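As a rough illustration of what the 384 x 384 resolution implies, the sketch below computes the token and latent grid sizes. The patch size of 16 for SigLIP-L, and the 8x spatial downsampling with 4 latent channels for SDXL-VAE, are assumptions based on the usual public configurations of those components. They are not stated on this page, so verify them against the checkpoint's config files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisionShapes:
    image_size: int = 384      # from the model description
    patch_size: int = 16       # assumption: SigLIP-L patch size
    vae_downsample: int = 8    # assumption: SDXL-VAE spatial factor
    latent_channels: int = 4   # assumption: SDXL-VAE latent channels

    def understanding_tokens(self) -> int:
        """Number of image patches seen by the understanding encoder."""
        side = self.image_size // self.patch_size
        return side * side

    def latent_shape(self) -> tuple:
        """(channels, height, width) of the VAE latent used for generation."""
        side = self.image_size // self.vae_downsample
        return (self.latent_channels, side, side)


shapes = VisionShapes()
print(shapes.understanding_tokens())  # 576 under the assumptions above
print(shapes.latent_shape())          # (4, 48, 48) under the assumptions above
```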
Training
The documentation does not describe the training process in detail. It does state that the model builds on pre-trained components (the SigLIP-L vision encoder and SDXL-VAE) and that the released checkpoint was produced by pre-training followed by supervised fine-tuning.
Guide: Running Locally
- Clone the Repository: Access the JanusFlow code on GitHub.
git clone https://github.com/deepseek-ai/Janus
cd Janus
- Install Dependencies: Ensure all necessary libraries and dependencies are installed, particularly the transformers library.
pip install -r requirements.txt
- Download Pre-trained Models: Acquire the pre-trained models/checkpoints from the Hugging Face repository if not included.
- Run Inference: Use provided scripts or notebook examples to test the model locally. Modify any input paths as needed.
- Cloud GPUs: For optimal performance, especially during inference, consider using cloud services like AWS, Google Cloud, or Azure offering GPU instances.
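The download step above can be sketched in Python as follows. This assumes the `huggingface_hub` package is installed and that the checkpoint is published under the repository id `deepseek-ai/JanusFlow-1.3B`. The repository id and the local `checkpoints` folder are assumptions to adjust for your setup, and the helper names are hypothetical.

```python
from pathlib import Path

REPO_ID = "deepseek-ai/JanusFlow-1.3B"  # assumed Hugging Face repository id


def local_checkpoint_dir(repo_id: str, root: str = "checkpoints") -> Path:
    """Map a Hub repo id to a local folder, e.g. checkpoints/JanusFlow-1.3B."""
    return Path(root) / repo_id.split("/")[-1]


def download_checkpoint(repo_id: str = REPO_ID, root: str = "checkpoints") -> Path:
    """Fetch all files of the model repo into a local folder and return it."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_checkpoint_dir(repo_id, root)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target
```

Calling `download_checkpoint()` once from the cloned repository places the files in a predictable folder. The returned path can then be passed to whichever inference script or notebook you run.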
License
The JanusFlow code is released under the MIT License. Usage of the JanusFlow models is governed by the DeepSeek Model License. Refer to the respective license files for detailed terms.