NB-GPT-J-6B

NbAiLab

Introduction

NB-GPT-J-6B is a Norwegian fine-tuned version of GPT-J 6B, a transformer model with 6 billion parameters. It is designed for text generation tasks and has been fine-tuned on Norwegian-language datasets.

Architecture

The model consists of 28 layers with a model dimension of 4096 and a feedforward dimension of 16384. It employs 16 attention heads, each with a dimension of 256, and uses Rotary Position Embedding (RoPE) for positional encoding. The tokenization vocabulary includes 50257 entries, using the same tokenizer as GPT-2 and GPT-3.
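As an illustrative check (not part of the original documentation), these dimensions can be read directly from the model's configuration with the transformers library; the attribute names below follow the standard GPT-J configuration class:

from transformers import AutoConfig

# Load only the configuration (no weights) and print the key dimensions.
config = AutoConfig.from_pretrained("NbAiLab/nb-gpt-j-6B")
print(config.n_layer)     # number of transformer layers, expected 28
print(config.n_embd)      # model (hidden) dimension, expected 4096
print(config.n_head)      # attention heads, expected 16 (head dim = n_embd // n_head)
print(config.rotary_dim)  # number of dimensions RoPE is applied to
print(config.vocab_size)  # tokenizer vocabulary size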

Training

NB-GPT-J-6B was fine-tuned on the Norwegian Colossal Corpus (NCC) together with additional internet sources such as Wikipedia, mC4, and OSCAR. It was trained for 130 billion tokens over 1,000,000 steps on a TPU v3-8 VM, using the cross-entropy loss to optimize next-token prediction.
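The cross-entropy objective is the standard causal language-modelling loss: logits at each position are scored against the token that actually follows. A minimal toy sketch with random tensors (illustrative only; the actual fine-tuning used a TPU training pipeline, not this snippet):

import torch
import torch.nn.functional as F

# Toy illustration of the next-token cross-entropy objective.
vocab_size = 50257
logits = torch.randn(1, 8, vocab_size)             # (batch, sequence, vocab) model outputs
token_ids = torch.randint(0, vocab_size, (1, 8))   # the input token ids
loss = F.cross_entropy(
    logits[:, :-1, :].reshape(-1, vocab_size),     # predictions for positions 0..n-2
    token_ids[:, 1:].reshape(-1),                  # targets are the tokens shifted by one
)
print(loss)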

Guide: Running Locally

To use the model, load it with the following code:

from transformers import AutoTokenizer, AutoModelForCausalLM

# Download the tokenizer and the fine-tuned weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("NbAiLab/nb-gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("NbAiLab/nb-gpt-j-6B")
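Once loaded, the model can be used for open-ended generation. The prompt and sampling settings below are illustrative choices, not values prescribed by the model card:

# Generate a continuation for a Norwegian prompt (example settings).
prompt = "I dag er det fint vær i Bergen, og"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))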

For faster inference, consider using cloud GPUs from providers such as Google Cloud or AWS.
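If a GPU is available, loading the weights in half precision roughly halves the memory footprint of the 6-billion-parameter model. This is a sketch using standard transformers/PyTorch options, not an officially documented recipe:

import torch
from transformers import AutoModelForCausalLM

# Load in float16 and move to the GPU; the weights alone need roughly
# 12 GB in float16 (6B parameters x 2 bytes) versus ~24 GB in float32.
model = AutoModelForCausalLM.from_pretrained(
    "NbAiLab/nb-gpt-j-6B", torch_dtype=torch.float16
).to("cuda")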

License

NB-GPT-J-6B is released under the Apache 2.0 license. This allows wide usage with minimal restrictions, provided that the license and copyright notices are retained and significant modifications are stated.
