SteelBERT

MGE-LLMs

Introduction

SteelBERT is a pre-trained model developed by MGE-LLMs, built on the DeBERTa architecture and tailored to materials science literature. It was pre-trained on a corpus of 4.2 million materials science abstracts and 55,000 full-text articles on steel to enhance its understanding of the steel domain.

Architecture

SteelBERT consists of 188 million parameters structured across 12 Transformer encoders, each with 12 attention heads. The model supports a maximum input length of 512 tokens and employs a specialized tokenizer with a 128,100-entry vocabulary adapted to the steel domain.
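
As a quick check, these figures can be read back from the published configuration with the transformers library. The following is a minimal sketch, assuming the checkpoint exposes standard DeBERTa configuration attributes; note that AutoModel loads the encoder without the masked-language-modeling head, so the parameter count is approximate.

    from transformers import AutoConfig, AutoModel

    # Inspect the SteelBERT configuration (standard DeBERTa attributes).
    config = AutoConfig.from_pretrained("MGE-LLMs/SteelBERT")
    print(config.num_hidden_layers)        # 12 Transformer encoder layers
    print(config.num_attention_heads)      # 12 attention heads per layer
    print(config.vocab_size)               # 128,100-entry vocabulary
    print(config.max_position_embeddings)  # maximum input length in tokens

    # Approximate total parameter count (~188 million).
    model = AutoModel.from_pretrained("MGE-LLMs/SteelBERT")
    print(sum(p.numel() for p in model.parameters()))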

Training

The model was pre-trained in a self-supervised fashion using Masked Language Modeling, masking 15% of the tokens. Training was executed on 8 NVIDIA A100 40GB GPUs over 840 hours, with 95% of the data used for training and 5% for validation, reaching a validation loss of 1.158.
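
Because the checkpoint was pre-trained with Masked Language Modeling, its pre-training objective can be probed directly through the fill-mask pipeline. The following is a minimal sketch, assuming the published checkpoint includes the MLM head and that the tokenizer's mask token is [MASK], as in standard DeBERTa; the example sentence is an illustrative placeholder.

    from transformers import pipeline

    # Load SteelBERT together with its masked-language-modeling head.
    fill = pipeline("fill-mask", model="MGE-LLMs/SteelBERT")

    # Ask the model to complete a masked steel-related sentence.
    predictions = fill("Increasing the carbon content raises the [MASK] of the steel.")
    for p in predictions:
        print(p["token_str"], p["score"])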

Guide: Running Locally

To run SteelBERT locally, follow these steps:

  1. Environment Setup:

    • Ensure Python and PyTorch are installed in your environment.
    • Install the transformers library from Hugging Face.
  2. Load the Model:

    from transformers import AutoTokenizer, AutoModel
    import torch

    # Select the GPU if one is available, otherwise fall back to the CPU.
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Download the tokenizer and encoder weights from the Hugging Face Hub.
    model_path = "MGE-LLMs/SteelBERT"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModel.from_pretrained(model_path).to(device)

  3. Prepare Data and Perform Inference:

    • Tokenize your input texts and pass them through the model to obtain embeddings (a minimal sketch follows this list).
  4. Hardware Recommendation:

    • Utilize cloud GPU services such as AWS EC2, Google Cloud Platform, or Azure with NVIDIA A100 GPUs to achieve optimal performance.
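
The following sketch expands step 3, reusing the tokenizer, model, and device defined in step 2. The input sentences are illustrative placeholders, and mean pooling over non-padding tokens is one common (assumed, not prescribed) way to turn token embeddings into sentence-level embeddings.

    # Tokenize a small batch of steel-related texts.
    texts = [
        "The yield strength of the quenched and tempered steel increased.",
        "Austenite transforms to martensite during rapid cooling.",
    ]
    inputs = tokenizer(texts, padding=True, truncation=True,
                       max_length=512, return_tensors="pt").to(device)

    # Forward pass without gradients to obtain token-level embeddings.
    with torch.no_grad():
        outputs = model(**inputs)

    # Mean-pool the last hidden state over non-padding tokens.
    mask = inputs["attention_mask"].unsqueeze(-1)
    embeddings = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)
    print(embeddings.shape)  # (batch_size, hidden_size)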

License

SteelBERT is released under the Apache 2.0 License, allowing for both commercial and non-commercial use.
