Florence 2 Flux Large
gokaygokay
Introduction
Florence-2-Flux-Large is an image-text-to-text model: given an image and a text prompt, it generates text such as a detailed description of the image. It runs on the transformers library and works in English. It is geared toward art-related text generation and ships custom modeling code, so it must be loaded with trust_remote_code=True.
Architecture
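The model is built on the base model microsoft/Florence-2-large and runs through the transformers library. It is loaded with the AutoModelForCausalLM class and generates text conditioned on both an image and a text prompt. Despite that loader name, the underlying Florence-2 design is not a decoder-only causal language model: it pairs a vision encoder with a transformer encoder-decoder that produces the output text.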
Training
The model was fine-tuned on the kadirnar/fluxdev_controlnet_16k dataset, which is meant to strengthen its handling of tasks that require detailed image-to-text descriptions.
Guide: Running Locally
To run Florence-2-Flux-Large locally, follow these steps:
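- Install Dependencies:
pip install -q datasets flash_attn timm einops
The steps below also import torch, transformers, Pillow (PIL), and requests; install them as well if they are not already present.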
- Set Up Model and Processor:
from transformers import AutoModelForCausalLM, AutoProcessor
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# trust_remote_code=True is required because Florence-2 ships its own modeling code.
model = AutoModelForCausalLM.from_pretrained(
    "gokaygokay/Florence-2-Flux-Large", trust_remote_code=True
).to(device).eval()
processor = AutoProcessor.from_pretrained(
    "gokaygokay/Florence-2-Flux-Large", trust_remote_code=True
)
- Run Example:
from PIL import Image
import requests

def run_example(task_prompt, text_input, image):
    # The prompt is the task token followed by the free-text instruction.
    prompt = task_prompt + text_input
    # Make sure the image is RGB before handing it to the processor.
    if image.mode != "RGB":
        image = image.convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
        repetition_penalty=1.10,
    )
    # Decode with special tokens kept; post_process_generation then turns the
    # raw string into a dict keyed by the task prompt.
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    parsed_answer = processor.post_process_generation(
        generated_text, task=task_prompt, image_size=(image.width, image.height)
    )
    return parsed_answer

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
image = Image.open(requests.get(url, stream=True).raw)

answer = run_example("<DESCRIPTION>", "Describe this image in great detail.", image)
final_answer = answer["<DESCRIPTION>"]
print(final_answer)
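- Consider Using Cloud GPUs: Generation is far faster on a GPU than on a CPU, especially with beam search and long outputs (max_new_tokens=1024 in the example). If you don't have a suitable local GPU, cloud GPU services such as AWS, Google Cloud, or Azure are an option.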
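The Run Example step calls model.generate with num_beams=3 and repetition_penalty=1.10. As a rough illustration of what those two settings do, the following is a minimal pure-Python sketch that runs beam search over a hypothetical toy model (toy_logits) instead of Florence-2. The penalty rule (divide positive logits and multiply negative ones for tokens already generated) follows the rule used by transformers' RepetitionPenaltyLogitsProcessor. The beam search itself is simplified (summed log-probabilities, no length normalization), and transformers' own implementation differs in details. All function names here are illustrative and are not part of transformers.

```python
import math
from typing import Callable, Dict, List, Tuple

Prefix = Tuple[str, ...]
# Hypothetical stand-in for the model: maps a generated prefix to raw logits.
LogitsFn = Callable[[Prefix], Dict[str, float]]


def apply_repetition_penalty(logits: Dict[str, float], prefix: Prefix, penalty: float) -> Dict[str, float]:
    """Lower the logit of every token that already appears in the prefix."""
    seen = set(prefix)
    penalized = {}
    for tok, score in logits.items():
        if tok in seen:
            # Positive logits are divided, negative ones multiplied, so both get smaller.
            score = score / penalty if score > 0 else score * penalty
        penalized[tok] = score
    return penalized


def log_softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable log-softmax over a dict of logits."""
    m = max(logits.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return {tok: v - log_z for tok, v in logits.items()}


def beam_search(
    logits_fn: LogitsFn,
    num_beams: int = 3,
    max_new_tokens: int = 10,
    repetition_penalty: float = 1.10,
    eos: str = "</s>",
) -> List[str]:
    """Simplified beam search: summed log-probs, no length normalization."""
    beams: List[Tuple[Prefix, float]] = [((), 0.0)]
    finished: List[Tuple[Prefix, float]] = []
    for _ in range(max_new_tokens):
        # Expand every live beam by every token in the vocabulary.
        candidates = []
        for prefix, score in beams:
            logits = apply_repetition_penalty(logits_fn(prefix), prefix, repetition_penalty)
            for tok, logp in log_softmax(logits).items():
                candidates.append((prefix + (tok,), score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        # Keep the best num_beams unfinished sequences; set aside those ending in EOS.
        beams = []
        for seq, score in candidates:
            if seq[-1] == eos:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
            if len(beams) == num_beams:
                break
        if not beams:
            break
    # Sequences still running at max_new_tokens compete with the finished ones.
    best_seq, _ = max(finished + beams, key=lambda c: c[1])
    return list(best_seq)


def toy_logits(prefix: Prefix) -> Dict[str, float]:
    """Hypothetical toy model that prefers "red" and wants to stop after two tokens."""
    if len(prefix) < 2:
        return {"red": 3.0, "car": 2.0, "</s>": -5.0}
    return {"red": -5.0, "car": -5.0, "</s>": 3.0}


if __name__ == "__main__":
    print(beam_search(toy_logits, repetition_penalty=1.0))  # ['red', 'red', '</s>']
    print(beam_search(toy_logits, repetition_penalty=2.0))  # ['red', 'car', '</s>']
```

With no penalty (1.0) the toy model repeats "red"; with a strong penalty (2.0) a second "red" becomes less attractive than "car". The example's value of 1.10 is a much milder nudge against repetition.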
License
The Florence-2-Flux-Large model is distributed under the Apache-2.0 license. This allows for both personal and commercial use, modifications, and distribution of the software, provided that the original license terms are met.