NB-GPT-J-6B (NbAiLab)
Introduction
NB-GPT-J-6B is a Norwegian fine-tuned version of GPT-J 6B, a transformer model with 6 billion parameters. It is designed for text generation and has been fine-tuned on Norwegian-language datasets.
Architecture
The model consists of 28 layers with a model dimension of 4096 and a feedforward dimension of 16384. It employs 16 attention heads, each with a dimension of 256, and uses Rotary Position Embedding (RoPE) for positional encoding. The tokenizer is the same BPE tokenizer used by GPT-2 and GPT-3, with a vocabulary of 50257 entries.
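These hyperparameters account for roughly the advertised 6 billion parameters. The back-of-envelope estimate below is an illustration only; it ignores biases, layer norms, and RoPE details, so it is not an exact count.

# Rough parameter estimate from the hyperparameters above (illustrative only).
vocab_size = 50257
d_model = 4096
d_ff = 16384
n_layers = 28

embedding = vocab_size * d_model          # token embedding matrix
attention = 4 * d_model * d_model         # Q, K, V and output projections
mlp = 2 * d_model * d_ff                  # up- and down-projection
per_layer = attention + mlp
lm_head = vocab_size * d_model            # separate output head

total = embedding + n_layers * per_layer + lm_head
print(f"~{total / 1e9:.1f}B parameters")  # ≈ 6.0B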
Training
NB-GPT-J-6B was fine-tuned on the Norwegian Colossal Corpus (NCC) and additional internet sources like Wikipedia, mC4, and OSCAR. It was trained for 130 billion tokens over 1,000,000 steps on a TPU v3-8 VM, using cross-entropy loss to optimize next-token predictions.
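For reference, the next-token objective amounts to shifting the inputs by one position and applying cross-entropy between the model's logits and the shifted labels; when labels are passed to a Hugging Face causal language model, this shifting is handled internally. The snippet below is a minimal toy illustration of that loss, not the actual training code.

import torch
import torch.nn.functional as F

# Toy next-token cross-entropy: predictions at position i are scored
# against the token at position i + 1.
logits = torch.randn(1, 8, 50257)            # (batch, sequence, vocab)
input_ids = torch.randint(0, 50257, (1, 8))

shift_logits = logits[:, :-1, :]             # predictions for positions 0..n-2
shift_labels = input_ids[:, 1:]              # targets are the next tokens
loss = F.cross_entropy(shift_logits.reshape(-1, 50257), shift_labels.reshape(-1))
print(loss.item())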
Guide: Running Locally
To use the model, load it with the following code:
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("NbAiLab/nb-gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("NbAiLab/nb-gpt-j-6B")
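Once loaded, text can be produced with the standard generate API. The prompt below is an arbitrary placeholder, and the sampling settings are just an example.

# Example generation (prompt and settings are illustrative).
prompt = "Norge er et land"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))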
A 6-billion-parameter model is slow to run on CPU and requires substantial memory; for practical inference, consider using cloud GPUs from providers such as Google Cloud or AWS.
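On a GPU, loading the weights in half precision roughly halves the memory footprint compared with the default float32. A minimal sketch, assuming PyTorch and a CUDA device are available:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load in float16 to reduce GPU memory use (exact requirements may vary).
tokenizer = AutoTokenizer.from_pretrained("NbAiLab/nb-gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("NbAiLab/nb-gpt-j-6B", torch_dtype=torch.float16)
model = model.to("cuda")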
License
NB-GPT-J-6B is released under the Apache 2.0 license. This permissive license allows broad use, modification, and redistribution, provided that the license and copyright notices are retained and significant changes are noted.