NOVA D48W1536-SDXL1024 Text-to-Image Model

Introduction

NOVA (D48W1536-SDXL1024) is a text-to-image generation model developed by the Beijing Academy of Artificial Intelligence (BAAI). It generates and modifies images from text prompts using a non-quantized autoregressive diffusion model.

Architecture

Model Type: Non-quantized Autoregressive Text-to-Image Generation Model
Model Size: 1.4 billion parameters
Precision: torch.float16 (FP16)
Resolution: 1024x1024
Components: Pretrained text encoder (Phi-2) and VAE image tokenizer (SDXL-VAE)
License: Apache 2.0
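
As a rough check on what the figures above imply for memory, the sketch below multiplies the stated parameter count by the size of an FP16 value. It covers weights only. Whether the 1.4 billion figure includes the Phi-2 text encoder and the SDXL-VAE is not stated in this card, and activations at 1024x1024 need additional memory, so treat the result as a lower bound rather than a requirement.

```python
# Figures taken from the specification list above.
PARAMS = 1.4e9          # 1.4 billion parameters
BYTES_PER_PARAM = 2     # torch.float16 stores each value in 2 bytes


def weight_bytes(params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in bytes."""
    return params * bytes_per_param


total = weight_bytes(PARAMS, BYTES_PER_PARAM)
print(f"{total / 1e9:.1f} GB")      # decimal gigabytes
print(f"{total / 2**30:.2f} GiB")   # binary gibibytes
```

This works out to about 2.8 GB of FP16 weights for the parameters counted in the 1.4 billion figure.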

Training

NOVA was trained on a subset of web-scale datasets, including LAION-5B and COYO-700M, to generate images from text inputs. These datasets contain a wide variety of content, including adult, violent, and sexual material.

Guide: Running Locally

Install Required Packages:

pip install diffusers transformers accelerate
pip install git+ssh://git@github.com/baaivision/NOVA.git
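
After installing, it can help to confirm that the modules are importable before loading any weights. The following is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical, and the list assumes that `torch` is needed (it is imported by the pipeline code in this guide) and that `diffnext` is the module provided by the NOVA repository.

```python
import importlib.util

# Top-level module names this guide relies on. "torch" and "diffnext" are
# assumptions based on the imports used in the pipeline code.
REQUIRED = ["torch", "diffusers", "transformers", "accelerate", "diffnext"]


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The check only locates the modules and does not import them, so it runs quickly and does not touch the GPU.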

Run the Pipeline:

import torch
from diffnext.pipelines import NOVAPipeline

# Load the FP16 weights by model ID and move the pipeline to the GPU.
model_id = "BAAI/nova-d48w1536-sdxl1024"
model_args = {"torch_dtype": torch.float16, "trust_remote_code": True}
pipe = NOVAPipeline.from_pretrained(model_id, **model_args)
pipe = pipe.to("cuda")

# Generate one image from the prompt and save it to disk.
prompt = "a shiba inu wearing a beret and black turtleneck."
image = pipe(prompt).images[0]
image.save("shiba_inu.jpg")

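Hardware Recommendation: The example above moves the pipeline to a CUDA device, so it needs a CUDA-capable GPU to run as written. If none is available locally, a cloud GPU instance from a provider such as AWS, Google Cloud, or Azure is a practical alternative.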
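
For machines where a GPU may or may not be present, one possible adaptation is to choose the device and precision at load time. The sketch below is an assumption, not part of the official instructions: the helpers `pick_runtime` and `load_pipeline` are hypothetical, the float32 fallback on CPU follows common PyTorch practice, and this card does not state whether the model runs acceptably on CPU.

```python
def pick_runtime(cuda_available: bool) -> dict:
    """Return device and dtype names: FP16 on GPU, FP32 as a CPU fallback."""
    if cuda_available:
        return {"device": "cuda", "dtype": "float16"}
    return {"device": "cpu", "dtype": "float32"}


def load_pipeline(model_id: str = "BAAI/nova-d48w1536-sdxl1024"):
    """Load the pipeline on whichever device is available (hypothetical helper)."""
    # Heavy imports stay local so pick_runtime can be used without them.
    import torch
    from diffnext.pipelines import NOVAPipeline

    runtime = pick_runtime(torch.cuda.is_available())
    pipe = NOVAPipeline.from_pretrained(
        model_id,
        torch_dtype=getattr(torch, runtime["dtype"]),  # e.g. torch.float16
        trust_remote_code=True,
    )
    return pipe.to(runtime["device"])
```

Calling `load_pipeline()` returns a pipeline that can be used the same way as `pipe` in the example above. Generation on CPU, if it works at all, is likely to be far slower than on a GPU.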

License

The NOVA model is released under the Apache 2.0 License, which permits use, modification, and redistribution, provided that the terms of the license (such as retaining the license text and copyright notices) are followed.
