nova d48w1536 sdxl1024
BAAI
Introduction
NOVA (D48W1536-SDXL1024) is a text-to-image generation model developed by the Beijing Academy of Artificial Intelligence (BAAI). It is designed to create and modify images based on text prompts using a diffusion model.
Architecture
- Model Type: Non-quantized Autoregressive Text-to-Image Generation Model
- Model Size: 1.4 billion parameters
- Precision: torch.float16 (FP16)
- Resolution: 1024x1024
- Components: Utilizes a pretrained text encoder (Phi-2) and a VAE image tokenizer (SDXL-VAE).
- License: Apache 2.0
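The figures above allow a rough estimate of the memory needed just to hold the weights. The sketch below is a back-of-the-envelope calculation, not a measured requirement: it assumes the stated 1.4 billion parameters are all stored at 2 bytes each (FP16), and it ignores activations, the text encoder and VAE if they are not included in that count, and framework overhead. The latent grid size assumes the 8x spatial downsampling of SDXL-VAE; how NOVA groups those latents into tokens is not covered here. The helper name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


NUM_PARAMS = 1.4e9          # from the model card
FP16_BYTES = 2              # torch.float16 stores 2 bytes per value
FP32_BYTES = 4              # for comparison only

fp16_gib = weight_memory_gib(NUM_PARAMS, FP16_BYTES)
fp32_gib = weight_memory_gib(NUM_PARAMS, FP32_BYTES)

# Assumption: SDXL-VAE downsamples each spatial dimension by a factor of 8.
VAE_DOWNSAMPLE = 8
latent_side = 1024 // VAE_DOWNSAMPLE

print(f"FP16 weights: ~{fp16_gib:.2f} GiB")
print(f"FP32 weights: ~{fp32_gib:.2f} GiB")
print(f"Latent grid for 1024x1024: {latent_side}x{latent_side}")
```

Actual peak usage during generation will be higher than the weight-only figure, so treat it as a lower bound when choosing a GPU.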
Training
NOVA was trained on a subset of web datasets, including LAION-5B and COYO-700M. These datasets contain a variety of content, including adult, violent, and sexual material. The model's training process involves learning from these datasets to generate images based on text inputs.
Guide: Running Locally
- Install Required Packages:

    pip install diffusers transformers accelerate
    pip install git+ssh://git@github.com/baaivision/NOVA.git
- Run the Pipeline:

    import torch
    from diffnext.pipelines import NOVAPipeline

    # Load the pretrained pipeline in FP16.
    model_id = "BAAI/nova-d48w1536-sdxl1024"
    model_args = {"torch_dtype": torch.float16, "trust_remote_code": True}
    pipe = NOVAPipeline.from_pretrained(model_id, **model_args)
    pipe = pipe.to("cuda")

    # Generate one image from a text prompt and save it.
    prompt = "a shiba inu wearing a beret and black turtleneck."
    image = pipe(prompt).images[0]
    image.save("shiba_inu.jpg")
- Hardware Recommendation: The pipeline code moves the model to a CUDA GPU (`pipe.to("cuda")`). If no suitable local GPU is available, a cloud GPU service such as AWS, Google Cloud, or Azure is recommended.
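Before loading the model, it can help to check whether a CUDA GPU is actually visible to PyTorch. The following is a minimal sketch, assuming PyTorch may or may not be installed. The helper `pick_device` is hypothetical and not part of NOVA or diffusers, and the CPU fallback is only an illustration: whether NOVA runs acceptably on CPU is not stated in the source.

```python
import importlib.util


def pick_device():
    """Return ("cuda", "float16") if a CUDA GPU is visible, else ("cpu", "float32")."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed, so no GPU can be used from Python here.
        return "cpu", "float32"
    import torch

    if torch.cuda.is_available():
        return "cuda", "float16"
    # Assumption: fall back to full precision on CPU, where FP16 is often slow.
    return "cpu", "float32"


device, dtype_name = pick_device()
print(f"device={device}, dtype={dtype_name}")
```

With PyTorch installed, `getattr(torch, dtype_name)` gives the dtype object to pass as `torch_dtype`, and `device` can be passed to `pipe.to(...)`.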
License
The NOVA model is released under the Apache 2.0 License, allowing for wide usage and modification, provided that the terms of the license are followed.