LongWriter-llm-jp-3-3.7b-instruct

Kendamarron

Introduction

The LongWriter-llm-jp-3-3.7b-instruct model is designed to generate long-form text in Japanese. It is based on the LLaMA architecture and has been fine-tuned with a specific focus on long text generation.

Architecture

The model is built with the Transformers library as a variant of llm-jp/llm-jp-3-3.7b-instruct, produced by full fine-tuning with the LLaMA-Factory framework. Weights are distributed in the Safetensors format, and the model is intended for conversational and text-generation inference.
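
As a quick sanity check, the architecture can be inspected with the Transformers library. A minimal sketch follows; the repository id Kendamarron/LongWriter-llm-jp-3-3.7b-instruct is an assumption based on the author and model names above.

  # Inspect the model configuration without downloading the full weights.
  # Assumes the repository id Kendamarron/LongWriter-llm-jp-3-3.7b-instruct.
  from transformers import AutoConfig

  config = AutoConfig.from_pretrained("Kendamarron/LongWriter-llm-jp-3-3.7b-instruct")
  print(config.architectures)                          # LLaMA-style causal LM class
  print(config.hidden_size, config.num_hidden_layers)  # core size parameters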

Training

The model was fine-tuned using SFT (supervised fine-tuning) on a dataset crafted specifically for Japanese long-text generation. Training employed a multi-GPU setup with the following hyperparameters (an illustrative mapping to Transformers follows the list):

  • Learning rate: 1e-05
  • Training batch size: 2
  • Evaluation batch size: 1
  • Total training batch size: 8
  • Total evaluation batch size: 4
  • Optimizer: ADAMW_BNB (8-bit AdamW via bitsandbytes)
  • Learning rate scheduler: Cosine with a warmup ratio of 0.1
  • Number of epochs: 2.0
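
As referenced above, the sketch below shows how these hyperparameters would map onto transformers.TrainingArguments. It assumes a 4-GPU setup, so that the per-device batch sizes yield the reported totals; the actual run used LLaMA-Factory, so this is illustrative only.

  # Illustrative mapping of the reported hyperparameters onto Transformers.
  # Assumes 4 GPUs so that per-device sizes match the reported total batch sizes.
  from transformers import TrainingArguments

  args = TrainingArguments(
      output_dir="longwriter-sft",        # hypothetical output directory
      learning_rate=1e-5,
      per_device_train_batch_size=2,      # 2 x 4 GPUs = total train batch size 8
      per_device_eval_batch_size=1,       # 1 x 4 GPUs = total eval batch size 4
      num_train_epochs=2.0,
      optim="adamw_bnb_8bit",             # ADAMW_BNB: 8-bit AdamW via bitsandbytes
      lr_scheduler_type="cosine",
      warmup_ratio=0.1,
  )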

The reported final training loss was 0.7184, with a validation loss of 0.7673.

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install the required libraries: Transformers (v4.46.1), PyTorch (v2.5.1+cu124), Datasets (v3.1.0), and Tokenizers (v0.20.3).
  2. Download the model from the Hugging Face repository.
  3. Set up your environment with a compatible GPU. It is recommended to use cloud GPU services such as AWS EC2 with NVIDIA GPUs for efficient processing.
  4. Load the model using the Transformers library and prepare your input data.
  5. Run inference and adjust the generation parameters for your specific use case, as shown in the sketch after this list.
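
The following is a minimal inference sketch covering steps 4 and 5. It assumes the repository id Kendamarron/LongWriter-llm-jp-3-3.7b-instruct and that the tokenizer ships a chat template; adjust the generation parameters to taste.

  # Minimal inference sketch. Assumes the repository id
  # Kendamarron/LongWriter-llm-jp-3-3.7b-instruct and a tokenizer chat template.
  # Install dependencies first, e.g.: pip install "transformers==4.46.1" torch
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_id = "Kendamarron/LongWriter-llm-jp-3-3.7b-instruct"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(
      model_id, torch_dtype=torch.bfloat16, device_map="auto"
  )

  # Prompt for a long-form essay in Japanese ("Please write a long essay
  # about Japan's four seasons.").
  messages = [{"role": "user", "content": "日本の四季について長い随筆を書いてください。"}]
  inputs = tokenizer.apply_chat_template(
      messages, add_generation_prompt=True, return_tensors="pt"
  ).to(model.device)

  outputs = model.generate(
      inputs,
      max_new_tokens=4096,  # generous budget for long-form output
      do_sample=True,
      temperature=0.7,
      top_p=0.95,
  )
  print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))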

License

The LongWriter-llm-jp-3-3.7b-instruct model is released under the Apache-2.0 license, which permits both personal and commercial use provided the license and copyright notices are retained.
