JanusFlow-1.3B

deepseek-ai

Introduction

JanusFlow is a framework that unifies image understanding and image generation in a single model. It combines an autoregressive language model with rectified flow, a generative modeling method that learns a velocity field carrying noise to data along nearly straight paths. Rectified flow can be trained directly within the large language model framework, so no complex architectural changes are required.

Architecture

JanusFlow is a unified multimodal model that decouples visual encoding for understanding from visual encoding for generation. It is built on DeepSeek-LLM-1.3b-base. For multimodal understanding, it uses SigLIP-L as the vision encoder, which accepts 384 x 384 image input. For image generation, it uses rectified flow together with SDXL-VAE to produce 384 x 384 images. The released checkpoint is the Exponential Moving Average (EMA) checkpoint taken after pre-training and supervised fine-tuning.
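To make the rectified flow idea concrete, here is a minimal toy sketch of sampling by Euler integration of a velocity field. It is not JanusFlow's implementation: in the real model the velocity is predicted by the network in SDXL-VAE latent space, whereas here `make_point_velocity` is a hypothetical closed-form field for a single target point. The convention that t = 0 is noise and t = 1 is data is also an assumption of this sketch.

```python
import random


def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1 inside the loop
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_point_velocity(target):
    """Toy stand-in for a learned velocity network (hypothetical helper)."""
    def velocity(x, t):
        # On the straight path x_t = (1 - t) * x_0 + t * x_1 with x_1 = target,
        # the velocity x_1 - x_0 equals (target - x_t) / (1 - t).
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4)]  # starting point at t=0
target = [1.0, -2.0, 0.5, 3.0]                   # toy "data" point at t=1
sample = euler_sample(make_point_velocity(target), noise, num_steps=8)
print(sample)
```

Because the paths in this toy case are exactly straight, a handful of Euler steps is enough to reach the target. A learned velocity field is only approximately straight, so the step count becomes a quality versus speed trade-off.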

Training

The documentation does not describe the training process in detail. It does state that the model builds on pre-trained components (the SigLIP-L vision encoder and SDXL-VAE) and that the released checkpoint was produced after pre-training followed by supervised fine-tuning.

Guide: Running Locally

  1. Clone the Repository: Access the JanusFlow code on GitHub.
    git clone https://github.com/deepseek-ai/Janus
    cd Janus
    
  2. Install Dependencies: Install the required libraries and dependencies, including the transformers library.
    pip install -r requirements.txt
    
  3. Download Pre-trained Models: If the checkpoints are not included in the repository, download them from the Hugging Face repository.
  4. Run Inference: Use the provided scripts or notebook examples to test the model locally, adjusting input paths as needed.
  5. Cloud GPUs: For faster inference, consider a GPU instance from a cloud provider such as AWS, Google Cloud, or Azure.
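The dependency check and checkpoint download can be sketched in Python as follows. This is a hedged illustration, not the repository's own tooling. The package list is an assumption (the guide only names transformers), and the repo ID `deepseek-ai/JanusFlow-1.3B` is inferred from the model name and organization rather than stated in the guide. `snapshot_download` is the `huggingface_hub` function for fetching a full model repository.

```python
import importlib.util

# Assumed package list: only "transformers" is named in the guide;
# "torch" and "huggingface_hub" are assumptions for this sketch.
REQUIRED_PACKAGES = ["torch", "transformers", "huggingface_hub"]

# Assumed Hugging Face repo ID, inferred from the model name and organization.
MODEL_REPO_ID = "deepseek-ai/JanusFlow-1.3B"


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def download_checkpoint(repo_id=MODEL_REPO_ID):
    """Download the model snapshot into the local Hugging Face cache.

    Returns the local directory path. Requires network access.
    """
    # Imported lazily so the dependency check above works without it.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id)


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All packages found; call download_checkpoint() to fetch the weights.")
```

Running the check before the download avoids starting a multi-gigabyte transfer in an environment that cannot load the model afterwards.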

License

The JanusFlow code is released under the MIT License. Usage of the JanusFlow models is governed by the DeepSeek Model License. Refer to the respective license files for detailed terms.
