GALACTICA 120B
Introduction
GALACTICA is a large language model designed for scientific applications, developed by Meta AI's Papers with Code team. It is optimized for tasks such as citation prediction, scientific QA, mathematical reasoning, summarization, document generation, molecular property prediction, and entity extraction. The model is available in various sizes, with the largest being 120 billion parameters. It aims to assist researchers and developers in organizing and utilizing scientific literature.
Architecture
GALACTICA is based on a Transformer architecture in a decoder-only setup with some modifications. This design lets the model handle language understanding and generation efficiently, and it is tailored specifically to scientific text, providing a natural language interface for the scientific tasks described above.
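Because the released checkpoints are served through the standard OPT decoder-only implementation in Transformers, the published configuration can be inspected directly. The snippet below is a minimal sketch; it assumes the `transformers` library is installed and that the `facebook/galactica-120b` checkpoint is reachable on the Hugging Face Hub.

```python
from transformers import AutoConfig

# GALACTICA reuses the OPT decoder-only implementation in Transformers,
# so the configuration follows the OPTConfig field names.
config = AutoConfig.from_pretrained("facebook/galactica-120b")

print(config.model_type)           # expected: "opt"
print(config.num_hidden_layers)    # number of decoder layers
print(config.hidden_size)          # model (embedding) dimension
print(config.num_attention_heads)  # attention heads per layer
```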
Training
The GALACTICA models are trained on 106 billion tokens of open-access scientific text and data. This includes a diverse range of sources such as papers, textbooks, scientific websites, encyclopedias, and knowledge bases. The training process focuses on enabling the models to perform well on knowledge-intensive tasks and general NLP tasks. Despite its extensive training, GALACTICA can still exhibit hallucinations and biases, particularly with less popular scientific concepts.
Guide: Running Locally
To run GALACTICA locally, you need to install the necessary libraries and set up your environment:
- Install the Transformers library and dependencies:
  - Use `pip install transformers` to install the library.
  - For GPU support, install `accelerate` using `pip install accelerate`.
  - For INT8 precision, also install `bitsandbytes` with `pip install bitsandbytes` (a reduced-precision loading sketch follows this guide).
- Load the Model:
  - Import the necessary classes from Transformers, then load the tokenizer and the model:

    ```python
    from transformers import AutoTokenizer, OPTForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-120b")
    model = OPTForCausalLM.from_pretrained("facebook/galactica-120b", device_map="auto")
    ```
- Run Inference:
  - Prepare your input text and tokenize it:

    ```python
    input_text = "The Transformer architecture [START_REF]"
    input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
    ```

  - Generate outputs from the model and decode them:

    ```python
    outputs = model.generate(input_ids)
    print(tokenizer.decode(outputs[0]))
    ```
- Cloud GPU Options:
  - Consider using cloud GPU services like AWS, Google Cloud, or Azure to handle the computational requirements efficiently.
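Loading the full 120-billion-parameter checkpoint in FP32 requires several hundred gigabytes of memory, so reduced precision is usually necessary in practice. The sketch below shows half-precision (FP16) and INT8 loading variants; it is a minimal sketch assuming recent `transformers`, `accelerate`, and `bitsandbytes` releases, and note that the `load_in_8bit` flag has been superseded by a quantization config object in newer library versions.

```python
import torch
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-120b")

# Half-precision (FP16): roughly halves memory use relative to FP32,
# with weights spread across available GPUs by accelerate.
model = OPTForCausalLM.from_pretrained(
    "facebook/galactica-120b",
    device_map="auto",
    torch_dtype=torch.float16,
)

# INT8 alternative (requires bitsandbytes); use this instead of the
# FP16 load above to reduce memory further.
# model = OPTForCausalLM.from_pretrained(
#     "facebook/galactica-120b",
#     device_map="auto",
#     load_in_8bit=True,
# )

input_ids = tokenizer(
    "The Transformer architecture [START_REF]", return_tensors="pt"
).input_ids.to("cuda")
outputs = model.generate(input_ids, max_new_tokens=60)
print(tokenizer.decode(outputs[0]))
```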
License
GALACTICA is released under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. This license allows for modification and distribution of the model for non-commercial purposes, with proper attribution.