Kendamarron/LongWriter-llm-jp-3-3.7b-instruct
Introduction
The LongWriter-llm-jp-3-3.7b-instruct model is designed to generate long-form text in Japanese. It is based on the Llama architecture and has been fine-tuned specifically to strengthen long text generation.
Architecture
The model is built with the Transformers library and is a fine-tuned variant of llm-jp/llm-jp-3-3.7b-instruct. It was trained with LLaMA-Factory using full (non-LoRA) fine-tuning, is distributed in the Safetensors format, and supports conversational and text-generation inference.
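As a quick sanity check, the base architecture can be inspected from the model configuration. The repository id below is an assumption derived from the model name; adjust it to the actual Hugging Face repository.

```python
from transformers import AutoConfig

# Assumed repository id; replace with the actual Hugging Face repo if it differs.
MODEL_ID = "Kendamarron/LongWriter-llm-jp-3-3.7b-instruct"

config = AutoConfig.from_pretrained(MODEL_ID)
print(config.architectures)  # expected to report a Llama-style causal LM class
print(config.model_type)     # expected: "llama"
```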
Training
The model was fine-tuned using the SFT (Supervised Fine-Tuning) method with a dataset specifically crafted for Japanese long text generation. Training employed a multi-GPU setup with the following hyperparameters:
- Learning rate: 1e-05
- Training batch size: 2
- Evaluation batch size: 1
- Total training batch size: 8
- Total evaluation batch size: 4
- Optimizer: AdamW (bitsandbytes 8-bit, ADAMW_BNB)
- Learning rate scheduler: Cosine with a warmup ratio of 0.1
- Number of epochs: 2.0
Training results included a training loss of 0.7184 and a validation loss of 0.7673.
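For reference, the hyperparameters above map roughly onto Hugging Face TrainingArguments as sketched below. The original run used LLaMA-Factory, so this is only an approximation; the GPU count and gradient-accumulation split behind the total batch size of 8, as well as the precision setting, are assumptions rather than details from the original recipe.

```python
from transformers import TrainingArguments

# Approximate reconstruction of the reported SFT hyperparameters.
training_args = TrainingArguments(
    output_dir="longwriter-llm-jp-3-3.7b-instruct-sft",
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=1,   # assumed: 4 GPUs x batch 2 = total batch 8
    num_train_epochs=2.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_bnb_8bit",          # bitsandbytes AdamW (ADAMW_BNB)
    bf16=True,                       # assumed precision setting
)
```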
Guide: Running Locally
To run the model locally, follow these steps:
- Install the required libraries: Transformers (v4.46.1), PyTorch (v2.5.1+cu124), Datasets (v3.1.0), and Tokenizers (v0.20.3).
- Download the model from the Hugging Face repository.
- Set up your environment with a compatible GPU. It is recommended to use cloud GPU services such as AWS EC2 with NVIDIA GPUs for efficient processing.
- Load the model using the Transformers library and prepare your input data (see the example sketch after this list).
- Run inference tasks and adjust parameters as needed for your specific use case.
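A minimal inference sketch, assuming the repository id Kendamarron/LongWriter-llm-jp-3-3.7b-instruct and a chat template exposed through the tokenizer; the dtype and generation parameters are assumptions to adjust for your hardware and use case.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; replace with the actual Hugging Face repo if it differs.
MODEL_ID = "Kendamarron/LongWriter-llm-jp-3-3.7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # assumed dtype; use float16 if bf16 is unsupported
    device_map="auto",
)

# Chat-style prompt (Japanese: "Please write a short story of about 5,000 characters.")
messages = [{"role": "user", "content": "5000字程度の短編小説を書いてください。"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=4096,  # long-form output; adjust to your use case
    temperature=0.7,
    do_sample=True,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Because the model targets long-form generation, the main parameter to tune is max_new_tokens; long outputs require a correspondingly large value.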
License
The LongWriter-llm-jp-3-3.7b-instruct model is licensed under the Apache-2.0 license, allowing for both personal and commercial use with appropriate attribution.