UTENA-7B-NSFW-V2

Introduction
UTENA-7B-NSFW-V2 is a text-generation model created by merging pre-trained language models with the MergeKit tool. It is designed for general text-generation tasks and has been evaluated on multiple benchmark datasets, with results reported on the Open LLM Leaderboard.
Architecture
The model is a blend of two parent models, AI-B/UTENA-7B-NSFW and AI-B/UTENA-7B-BAGEL, combined with the SLERP merge method. The configuration slices layers from each parent and applies per-component parameters and filters, and the merged model operates in bfloat16 precision.
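For context, a SLERP merge of this kind is typically described in MergeKit's YAML format. The sketch below is a hypothetical configuration: the layer ranges, base model, and interpolation weights are illustrative assumptions, not the model's published recipe.

```yaml
# Hypothetical MergeKit SLERP config; values are illustrative assumptions.
slices:
  - sources:
      - model: AI-B/UTENA-7B-NSFW
        layer_range: [0, 32]          # assumed range for a 32-layer 7B model
      - model: AI-B/UTENA-7B-BAGEL
        layer_range: [0, 32]
merge_method: slerp
base_model: AI-B/UTENA-7B-NSFW        # assumed base; could be either parent
parameters:
  t:
    - filter: self_attn               # interpolation curve for attention blocks
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp                     # separate curve for MLP blocks
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5                      # default weight for all other tensors
dtype: bfloat16                       # matches the precision noted above
```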
Evaluation
The model was evaluated across several datasets with different few-shot settings:
- AI2 Reasoning Challenge (25-shot): Achieved a normalized accuracy of 63.31.
- HellaSwag (10-shot): Achieved a normalized accuracy of 84.54.
- MMLU (5-shot): Achieved an accuracy of 63.97.
- TruthfulQA (0-shot): Scored 47.81.
- Winogrande (5-shot): Achieved an accuracy of 78.69.
- GSM8k (5-shot): Achieved an accuracy of 42.38.
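For reference, the leaderboard's headline figure is derived from these six results. Assuming it is the unweighted mean of the scores above, a few lines of Python reproduce it:

```python
# Average of the six benchmark scores listed above,
# assuming the leaderboard headline is a simple unweighted mean.
scores = {
    "ARC (25-shot)": 63.31,
    "HellaSwag (10-shot)": 84.54,
    "MMLU (5-shot)": 63.97,
    "TruthfulQA (0-shot)": 47.81,
    "Winogrande (5-shot)": 78.69,
    "GSM8k (5-shot)": 42.38,
}
average = sum(scores.values()) / len(scores)
print(f"Average: {average:.2f}")  # -> Average: 63.45
```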
Guide: Running Locally
- Clone the Repository:

```bash
git clone https://huggingface.co/AI-B/UTENA-7B-NSFW-V2
cd UTENA-7B-NSFW-V2
```
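Note that Hugging Face repositories store large weight files with Git LFS, so make sure it is set up before cloning; otherwise you will only get small pointer files:

```bash
git lfs install
```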
- Install Dependencies:

Ensure you have Python and the necessary libraries installed. Besides transformers, the model needs a PyTorch backend to run:

```bash
pip install transformers torch
```
- Load the Model:

Use the following Python code to load and interact with the model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("AI-B/UTENA-7B-NSFW-V2")
model = AutoModelForCausalLM.from_pretrained("AI-B/UTENA-7B-NSFW-V2")

input_text = "Your input text here"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0]))
```
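Since the merged weights are in bfloat16, loading them in that precision on a GPU avoids upcasting everything to float32. The variant below is a sketch assuming a CUDA device and the accelerate package (required for device_map="auto"); the generation settings are illustrative, not recommended defaults:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("AI-B/UTENA-7B-NSFW-V2")
# Load in bfloat16 and let accelerate place layers on the available devices.
model = AutoModelForCausalLM.from_pretrained(
    "AI-B/UTENA-7B-NSFW-V2",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires `pip install accelerate`
)

inputs = tokenizer("Your input text here", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,  # illustrative generation settings
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```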
- Consider Cloud GPUs:

For efficient computation, consider using cloud GPU services like AWS EC2, Google Cloud Platform, or Azure.
License
The model is released under the Unlicense, allowing for free use, modification, and distribution of the model and associated components.