Creative-Writer-V0.1-Alfa-35B
by jukofyork

Introduction
Creative-Writer-V0.1-Alfa-35B is an experimental text-generation model fine-tuned using the "multiplicative-LoRA" method and designed to encourage diverse, creative text generation. It has been tested against sibling variants, Bravo and Charlie, which employ different scaling techniques.
Architecture
The model uses a "multiplicative-LoRA" approach that adjusts the down-projection matrices of the network. It differs from traditional "additive-LoRA" in that the low-rank transformation is applied to a layer's outputs rather than its inputs. This method is closely related to control vectors and admits more complex transformations, such as orthogonal and non-orthogonal projections.
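The PyTorch sketch below illustrates the output-side idea under stated assumptions: the class name, rank, and initialisation are hypothetical, and this is not the author's actual implementation. Where additive LoRA computes y = (W + BA)x, a multiplicative update computes y = (I + BA)Wx, i.e. it transforms the frozen layer's output.

```python
import torch
import torch.nn as nn

class MultiplicativeLoRALinear(nn.Module):
    """Minimal sketch of a multiplicative-LoRA wrapper: the low-rank update
    acts on the *output* of the frozen linear map, y = (I + B A) W x,
    rather than the usual additive form y = (W + B A) x."""

    def __init__(self, base: nn.Linear, rank: int = 16):  # rank is an assumption
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the pre-trained weight frozen
        d = base.out_features
        # Low-rank factors operating in the layer's output space.
        self.A = nn.Parameter(torch.randn(rank, d) * 0.01)  # hypothetical init
        self.B = nn.Parameter(torch.zeros(d, rank))  # zero => identity map at start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.base(x)                      # h = W x (e.g. a down-projection output)
        return h + (h @ self.A.T) @ self.B.T  # y = h + B (A h)
```

Because B starts at zero, the wrapped layer initially reproduces the base model exactly, and training only has to learn the output-space transformation.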
Training
The model was trained over four days on dual A6000 GPUs. The dataset consisted of approximately 1,000 pre-2012 books converted to Markdown, totalling around 180 million tokens. Training used a sequence length of 8192 with a batch size of 8192 tokens, a learning rate of 5e-6, and early stopping as a form of regularization. Fine-tuning at the full 8k-token context length preserves the model's capacity for extended user-AI interactions.
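For reference, the reported hyperparameters can be collected into a plain Python dict (illustrative only; the field names are not taken from any particular training framework):

```python
# Hypothetical summary of the reported training setup; not a real config file.
train_config = {
    "sequence_length": 8192,        # tokens per training sequence
    "batch_size_tokens": 8192,      # i.e. one full-length sequence per step
    "learning_rate": 5e-6,
    "regularization": "early stopping",
    "dataset": "~1,000 pre-2012 books converted to Markdown",
    "dataset_tokens": 180_000_000,  # approximate
    "hardware": "2x NVIDIA A6000",
    "wall_clock": "~4 days",
}
```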
Guide: Running Locally
- Setup Environment: Ensure you have Python and the necessary libraries, such as transformers, installed.
- Download Model: Obtain the model files from Hugging Face.
- Run Model: Load the model from a script or Jupyter Notebook and generate text (see the sketch after this list).
- Configuration: Adjust sampling parameters such as temperature and min-p to tune output quality.
- Hardware: Use cloud GPUs, such as AWS EC2 instances with NVIDIA GPUs, for efficient performance.
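A minimal sketch of the load-and-generate step is shown below. The repo id is an assumption (check the actual model page), min_p sampling requires a recent transformers release, and a 35B model needs substantial GPU memory (device_map="auto" also requires the accelerate package):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jukofyork/creative-writer-v0.1-alfa-35b"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halve memory vs. fp32; still large at 35B
    device_map="auto",           # shard across available GPUs (needs accelerate)
)

prompt = "Write the opening paragraph of a gothic short story."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=1.0,  # tune temperature and min_p for desired output quality
    min_p=0.05,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```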
License
This model is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). This license allows for sharing and adaptation for non-commercial purposes, with appropriate credit given.