Bioformer-8L
Introduction
Bioformer-8L is a lightweight BERT model for biomedical text mining. It uses a biomedical-specific vocabulary and is pre-trained solely on biomedical domain corpora. It is roughly three times faster than BERT-base while delivering comparable or superior performance to BioBERT and PubMedBERT on a range of biomedical NLP tasks. The model has 8 transformer layers, a hidden embedding size of 512, and 8 self-attention heads, for a total of 42,820,610 parameters.
Architecture
Bioformer-8L uses a cased WordPiece vocabulary trained on a biomedical corpus of 33 million PubMed abstracts and 1 million PMC full-text articles. The vocabulary size is 32,768, similar in size to the original BERT vocabulary. Pre-training uses whole-word masking with a 15% masking rate and retains the Next Sentence Prediction (NSP) objective in case downstream tasks require it.
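As a quick check on the vocabulary described above, the following sketch loads the tokenizer from the Hugging Face Hub and inspects it. It assumes the transformers library is installed and uses the bioformers/bioformer-8L model id from the guide below.

```python
from transformers import AutoTokenizer

# Load the cased WordPiece tokenizer shipped with Bioformer-8L.
tokenizer = AutoTokenizer.from_pretrained("bioformers/bioformer-8L")

# Expected to print 32768 (the vocabulary size described above),
# barring any added special tokens.
print(len(tokenizer))

# Tokenize a biomedical sentence to see domain-specific wordpieces.
print(tokenizer.tokenize("Metformin is used to treat type 2 diabetes."))
```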
Training
Bioformer-8L was pre-trained on a single Cloud TPU device with a maximum sequence length of 512 and a batch size of 256 for 2 million steps, taking approximately 8.3 days. Sentence segmentation of the pre-training corpus was performed with SciSpacy.
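For context, here is a minimal sketch of SciSpacy-style sentence segmentation. It assumes the scispacy package and its small biomedical pipeline en_core_sci_sm are installed; the exact pipeline used for pre-training is not specified here.

```python
import spacy

# Assumes: pip install scispacy, plus the en_core_sci_sm model package.
nlp = spacy.load("en_core_sci_sm")

text = ("Diabetes mellitus is a group of metabolic disorders. "
        "It is characterized by high blood sugar over a prolonged period.")

# Emit one sentence per line, the usual input format for
# BERT-style pre-training corpora.
for sent in nlp(text).sents:
    print(sent.text)
```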
Guide: Running Locally
Prerequisites:
- Python 3
- PyTorch
- Transformers
- Datasets
Installation Steps:
- Install PyTorch by following the official installation instructions at pytorch.org.
- Install the transformers and datasets libraries:

  ```
  pip install transformers
  pip install datasets
  ```
- Use the model with the transformers library:

  ```python
  from transformers import pipeline

  unmasker8L = pipeline('fill-mask', model='bioformers/bioformer-8L')
  unmasker8L("[MASK] refers to a group of diseases that affect how the body uses blood sugar (glucose)")
  ```
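The fill-mask call returns the top-scoring completions for the [MASK] token as a list of dictionaries (score, token, and filled sequence). Beyond fill-mask, the same checkpoint can also serve as a plain encoder; the sketch below is an assumption, not part of the original guide, and extracts a sentence embedding with AutoModel.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bioformers/bioformer-8L")
model = AutoModel.from_pretrained("bioformers/bioformer-8L")

inputs = tokenizer("Insulin regulates glucose metabolism.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Take the [CLS] token's final hidden state as a sentence embedding;
# its width (512) matches the hidden size noted in the Architecture section.
cls_embedding = outputs.last_hidden_state[:, 0, :]
print(cls_embedding.shape)  # torch.Size([1, 512])
```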
Cloud GPUs:
Inference with this lightweight model runs comfortably on CPU, but fine-tuning and large-scale inference are faster on a GPU; consider a cloud GPU service if local hardware is limited.
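If a GPU is available, the pipeline API accepts a device argument; the sketch below (an illustration, assuming a CUDA-capable machine) places the model on the first GPU and falls back to CPU otherwise.

```python
import torch
from transformers import pipeline

# device=0 selects the first CUDA GPU; -1 keeps the pipeline on CPU.
device = 0 if torch.cuda.is_available() else -1
unmasker8L = pipeline('fill-mask', model='bioformers/bioformer-8L', device=device)
```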
License
Bioformer-8L is licensed under the Apache-2.0 License.