GALACTICA 120B
Introduction
GALACTICA is a large language model designed for scientific applications, developed by Meta AI's Papers with Code team. It is optimized for tasks such as citation prediction, scientific QA, mathematical reasoning, summarization, document generation, molecular property prediction, and entity extraction. The model is available in various sizes, with the largest being 120 billion parameters. It aims to assist researchers and developers in organizing and utilizing scientific literature.
Architecture
GALACTICA is based on a Transformer architecture in a decoder-only setup with some modifications. This design lets the model handle language understanding and generation efficiently, and it is tailored specifically to scientific text, providing a natural language interface for the scientific tasks described above.
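Because the released checkpoints are served through the standard OPT decoder-only implementation in Transformers, the published configuration can be inspected directly. The snippet below is a minimal sketch; it assumes the `transformers` library is installed and that the `facebook/galactica-120b` checkpoint is reachable on the Hugging Face Hub.

```python
from transformers import AutoConfig

# GALACTICA reuses the OPT decoder-only implementation in Transformers,
# so the configuration follows the OPTConfig field names.
config = AutoConfig.from_pretrained("facebook/galactica-120b")

print(config.model_type)           # expected: "opt"
print(config.num_hidden_layers)    # number of decoder layers
print(config.hidden_size)          # model (embedding) dimension
print(config.num_attention_heads)  # attention heads per layer
```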
Training
The GALACTICA models are trained on 106 billion tokens of open-access scientific text and data. This includes a diverse range of sources such as papers, textbooks, scientific websites, encyclopedias, and knowledge bases. The training process focuses on enabling the models to perform well on knowledge-intensive tasks and general NLP tasks. Despite its extensive training, GALACTICA can still exhibit hallucinations and biases, particularly with less popular scientific concepts.
Guide: Running Locally
To run GALACTICA locally, you need to install the necessary libraries and set up your environment:
- Install the Transformers library and dependencies:
  - Use `pip install transformers` to install the library.
  - For GPU support, install `accelerate` using `pip install accelerate`.
  - For INT8 precision, also install `bitsandbytes` with `pip install bitsandbytes` (a reduced-precision loading sketch follows this guide).
- Load the Model:
  - Import the necessary classes from Transformers, then load the tokenizer and the model:

    ```python
    from transformers import AutoTokenizer, OPTForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-120b")
    model = OPTForCausalLM.from_pretrained("facebook/galactica-120b", device_map="auto")
    ```
- Run Inference:
  - Prepare your input text and tokenize it:

    ```python
    input_text = "The Transformer architecture [START_REF]"
    input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
    ```

  - Generate outputs from the model and decode them:

    ```python
    outputs = model.generate(input_ids)
    print(tokenizer.decode(outputs[0]))
    ```
- Cloud GPU Options:
  - Consider using cloud GPU services like AWS, Google Cloud, or Azure to handle the computational requirements efficiently.
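Loading the full 120-billion-parameter checkpoint in FP32 requires several hundred gigabytes of memory, so reduced precision is usually necessary in practice. The sketch below shows half-precision (FP16) and INT8 loading variants; it is a minimal sketch assuming recent `transformers`, `accelerate`, and `bitsandbytes` releases, and note that the `load_in_8bit` flag has been superseded by a quantization config object in newer library versions.

```python
import torch
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-120b")

# Half-precision (FP16): roughly halves memory use relative to FP32,
# with weights spread across available GPUs by accelerate.
model = OPTForCausalLM.from_pretrained(
    "facebook/galactica-120b",
    device_map="auto",
    torch_dtype=torch.float16,
)

# INT8 alternative (requires bitsandbytes); use this instead of the
# FP16 load above to reduce memory further.
# model = OPTForCausalLM.from_pretrained(
#     "facebook/galactica-120b",
#     device_map="auto",
#     load_in_8bit=True,
# )

input_ids = tokenizer(
    "The Transformer architecture [START_REF]", return_tensors="pt"
).input_ids.to("cuda")
outputs = model.generate(input_ids, max_new_tokens=60)
print(tokenizer.decode(outputs[0]))
```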
License
GALACTICA is released under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. This license allows for modification and distribution of the model for non-commercial purposes, with proper attribution.