NOMIC-BERT-2048
Introduction
NOMIC-BERT-2048 is a pretrained BERT model developed by Nomic AI with a maximum sequence length of 2048 tokens. It incorporates several enhancements over the original BERT architecture: Rotary Position Embeddings (RoPE), which allow the context length to be extrapolated beyond what was seen during training; SwiGLU activations, which improve performance; and training with dropout disabled.
Architecture
The model employs Rotary Position Embeddings and SwiGLU activations, two changes inspired by MosaicBERT. RoPE encodes token positions as rotations applied inside the attention computation, which lets the model extrapolate to context lengths beyond those seen in training, while SwiGLU replaces the standard feed-forward activation to improve performance. Together these modifications let the model handle long sequences effectively, making it suitable for applications requiring extensive context.
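For intuition, here is a minimal, self-contained PyTorch sketch of these two components. The layer names, shapes, and hyperparameters are illustrative assumptions, not the model's actual implementation (the real code ships with the checkpoint via trust_remote_code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def rotary_embed(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Illustrative RoPE: rotate interleaved channel pairs of x (..., seq_len, dim)
    by position-dependent angles so attention scores depend on relative offsets."""
    seq_len, dim = x.shape[-2], x.shape[-1]
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

class SwiGLU(nn.Module):
    """Illustrative gated feed-forward block: down-project silu(x W_gate) * (x W_up)."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```

Because RoPE encodes position as rotations applied inside attention rather than as learned absolute embeddings, nothing in the sketch is tied to a fixed maximum length, which is what makes context-length extrapolation possible.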
Training
NOMIC-BERT-2048 was trained on BookCorpus and a 2023 Wikipedia dump. Documents were tokenized and packed into fixed-length sequences of 2048 tokens: when a document fell short of 2048 tokens, further documents were appended to fill the sequence, and documents longer than 2048 tokens were split across sequences. Evaluated on the GLUE benchmark, the model performs comparably to other BERT models while offering the added benefit of handling much longer sequences.
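The packing scheme just described can be summarized in a short sketch. This is an illustration of the procedure, not Nomic's training code, and `tokenized_docs` is a hypothetical iterable of token-id lists:

```python
def pack_documents(tokenized_docs, seq_len=2048):
    """Concatenate tokenized documents and cut the stream into fixed-length
    training sequences: short docs are appended together, long docs are split."""
    buffer = []
    for doc in tokenized_docs:          # each doc: a list of token ids
        buffer.extend(doc)              # append docs until the buffer is full
        while len(buffer) >= seq_len:
            yield buffer[:seq_len]      # emit one full 2048-token sequence
            buffer = buffer[seq_len:]   # remainder of a long doc carries over
    if buffer:
        yield buffer                    # trailing partial sequence (padded in practice)
```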
Guide: Running Locally
To use NOMIC-BERT-2048 for masked language modeling:
- Install the transformers library.
- Load the model and tokenizer:
```python
from transformers import AutoModelForMaskedLM, AutoConfig, AutoTokenizer, pipeline

# nomic-bert-2048 reuses the bert-base-uncased tokenizer.
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
config = AutoConfig.from_pretrained('nomic-ai/nomic-bert-2048', trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained('nomic-ai/nomic-bert-2048', config=config, trust_remote_code=True)

classifier = pipeline('fill-mask', model=model, tokenizer=tokenizer, device="cpu")
print(classifier("I [MASK] to the store yesterday."))
```
- To fine-tune for a sequence classification task (a usage sketch follows this list):
```python
from transformers import AutoConfig, AutoModelForSequenceClassification

model_path = "nomic-ai/nomic-bert-2048"
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
# strict=False tolerates the classification-head weights missing from the checkpoint.
model = AutoModelForSequenceClassification.from_pretrained(
    model_path, config=config, trust_remote_code=True, strict=False
)
```
- Consider using cloud GPUs such as those offered by AWS, Google Cloud, or Azure for more efficient training and inference.
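As a usage sketch for the fine-tuning setup above, the snippet below runs a single forward pass through the freshly loaded classification head. It assumes the remote-code model accepts standard BERT inputs; the example text is arbitrary, and the head's logits are meaningless until the model is actually fine-tuned:

```python
import torch
from transformers import AutoTokenizer

# Same tokenizer as in the masked language modeling example above.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Long inputs are supported up to the model's 2048-token limit.
inputs = tokenizer(
    "A long document to classify ...",
    truncation=True,
    max_length=2048,
    return_tensors="pt",
)

with torch.no_grad():
    logits = model(**inputs).logits   # shape: (1, num_labels); head is untrained
print(logits)
```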
License
NOMIC-BERT-2048 is released under the Apache 2.0 license, which allows for both personal and commercial use, modification, and distribution, provided that the license terms are met.