MN-12B-Celeste-V1.9

nothingiisreal

Introduction

MN-12B-Celeste-V1.9 is a story-writing and roleplaying model built on Mistral NeMo 12B. It is designed for text generation, with a focus on story creation and roleplay scenarios. The model is tuned for NSFW content and active narration, and uses ChatML tokens to avoid end-of-sequence issues.
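Because the model uses ChatML tokens, prompts follow the standard ChatML layout. A minimal template (the text in braces is placeholder content, not part of the format):

```
<|im_start|>system
{system prompt}<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant
```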

Architecture

The model is based on the Mistral NeMo 12B architecture and was trained with a focus on storytelling and roleplay, using a context length of 8K tokens. Both dynamic and static quantizations are available, including FP8, EXL2, and GGUF options.
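As one example, a GGUF quant can be run locally with llama-cpp-python; a minimal sketch, where the quant filename is an assumption (pick whichever quant level fits your hardware):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="MN-12B-Celeste-V1.9.Q5_K_M.gguf",  # assumed filename
    n_ctx=8192,        # matches the model's 8K training context
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)
```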

Training

The training data is a mixture of datasets, including Reddit Writing Prompts, Kalo's Opus Instruct 25K, and cleaned c2 logs, filtered to keep sample lengths manageable. System prompts were included in the training data to strengthen roleplay capabilities. Training ran for about 3 hours on a single H100 SXM GPU, using LoRA+ for improved efficiency.
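The card does not publish its exact hyperparameters, but a LoRA-style fine-tune of this kind is typically configured with the peft library. An illustrative sketch with assumed values, not the authors' actual settings:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,            # assumed rank
    lora_alpha=64,   # assumed scaling factor
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    task_type="CAUSAL_LM",
)
# LoRA+ differs from plain LoRA by training the adapter's B matrices with a
# larger learning rate than the A matrices, which is an optimizer-level
# setting rather than part of this config object.
```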

Guide: Running Locally

  1. Installation: Clone the repository and set up an environment with the required libraries, primarily the Transformers library.
  2. Quantization: Choose between dynamic and static quantization methods according to your hardware and computational needs.
  3. Usage: Run the model with the recommended sampler settings (Stable or Creative) and a system prompt for best results (see the sketch after this list).
  4. Experimentation: Steer the model's responses with OOC (out-of-character) instructions and few-shot examples to align outputs with your intent.
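A minimal inference sketch with the Transformers library, assuming the Hugging Face repo ID nothingiisreal/MN-12B-Celeste-V1.9; the system prompt and sampler values below are placeholders, not the card's exact Stable/Creative presets:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nothingiisreal/MN-12B-Celeste-V1.9"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a skilled roleplay narrator."},  # placeholder prompt
    {"role": "user", "content": "Continue the scene: the lighthouse keeper hears a knock."},
]
# The tokenizer's chat template renders the messages into ChatML tokens.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=1.0,  # placeholder; tune per the Stable/Creative presets
    top_p=0.95,       # placeholder
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```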

Cloud GPUs

For enhanced performance, especially for extensive training and inference tasks, consider using cloud-based GPUs like those from AWS, Google Cloud, or Azure.

License

The model is released under the Apache-2.0 license, allowing for both personal and commercial use, provided the terms of the license are adhered to.
