XGLM-1.7B
Introduction
XGLM-1.7B is a multilingual autoregressive language model with 1.7 billion parameters, developed by Meta AI. It is trained on a balanced corpus covering 31 languages and totaling 500 billion sub-tokens, and it is designed for few-shot learning tasks across diverse languages.
Architecture
XGLM-1.7B employs a transformer architecture tailored for autoregressive text generation. It supports 31 languages, including English, Russian, Chinese, and others from various language families. The model is built to handle multilingual tasks effectively, leveraging a balanced dataset to ensure broad linguistic coverage.
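Rather than hard-coding model dimensions, the checkpoint's configuration can be inspected directly through the Transformers API. This is a minimal sketch; the attribute names (num_layers, d_model, attention_heads, vocab_size) follow XGLMConfig, and the printed values come from the downloaded checkpoint rather than anything asserted here.
# Sketch: inspect the checkpoint's configuration instead of assuming sizes.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("facebook/xglm-1.7B")
print("layers:", config.num_layers)
print("hidden size:", config.d_model)
print("attention heads:", config.attention_heads)
print("vocab size:", config.vocab_size)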
Training
The model is trained on a diverse set of languages, with a significant share of the data coming from widely used languages such as English, Russian, and Chinese. The corpus is balanced using per-language token ratios, and low-resource languages are upsampled so that they remain meaningfully represented.
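The exact per-language token ratios are not reproduced here, but the upsampling idea can be illustrated with exponent-smoothed sampling probabilities. In the sketch below, the token counts and the alpha exponent are purely illustrative assumptions, not the values used for XGLM-1.7B.
# Illustrative sketch of upsampling: languages are sampled in proportion to
# (token_count ** alpha), which boosts low-resource languages relative to
# plain proportional sampling. The counts and alpha are made-up examples.
token_counts = {"en": 300e9, "ru": 60e9, "sw": 1e9}  # hypothetical token counts
alpha = 0.3  # illustrative smoothing exponent, not the value used for XGLM

weights = {lang: count ** alpha for lang, count in token_counts.items()}
total = sum(weights.values())
sampling_probs = {lang: w / total for lang, w in weights.items()}
print(sampling_probs)  # low-resource languages get a larger share than their raw proportion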
Guide: Running Locally
Basic Steps
- Install the Transformers library:
pip install transformers torch
- Load the model and tokenizer:
from transformers import XGLMTokenizer, XGLMForCausalLM

tokenizer = XGLMTokenizer.from_pretrained("facebook/xglm-1.7B")
model = XGLMForCausalLM.from_pretrained("facebook/xglm-1.7B")
- Run a sample task, e.g. evaluating on the Choice of Plausible Alternatives (COPA) benchmark (a usage sketch follows this list):
import torch
import torch.nn.functional as F

def get_logprobs(prompt):
    # Score the prompt with the model and keep the per-token logits.
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model(**inputs, labels=inputs["input_ids"])
    logits = outputs.logits
    # Log-probability of each actual next token under the model's prediction.
    return torch.gather(F.log_softmax(logits, dim=2), 2, inputs["input_ids"][:, 1:].unsqueeze(2))

def COPA_eval(prompt, alternative1, alternative2):
    # Choose the alternative with the higher total log-probability given the prompt.
    lprob1 = get_logprobs(prompt + "\n" + alternative1).sum()
    lprob2 = get_logprobs(prompt + "\n" + alternative2).sum()
    return 0 if lprob1 > lprob2 else 1
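As a usage sketch, the helper above can be applied to a COPA-style premise with two candidate continuations; the strings below are illustrative examples, not items taken from the benchmark files.
# Illustrative COPA-style query; the premise and alternatives are example strings.
premise = "The pond froze over for the winter."
choice_skate = "People skated on the pond."
choice_boat = "People brought a boat to the pond."
print(COPA_eval(premise, choice_skate, choice_boat))  # 0 selects the first alternative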
Cloud GPUs
For optimal performance, consider using cloud GPU services like AWS EC2, Google Cloud Platform, or Azure, which provide the necessary computational power to run large models efficiently.
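When a GPU is available, loading the weights in half precision and moving the model to the device keeps memory use modest. This is a minimal sketch assuming a CUDA-capable machine and recent Transformers/PyTorch versions; the prompt is illustrative.
# Minimal sketch, assuming a CUDA GPU is available.
import torch
from transformers import XGLMTokenizer, XGLMForCausalLM

tokenizer = XGLMTokenizer.from_pretrained("facebook/xglm-1.7B")
model = XGLMForCausalLM.from_pretrained("facebook/xglm-1.7B", torch_dtype=torch.float16)
model = model.to("cuda").eval()

inputs = tokenizer("The capital of France is", return_tensors="pt").to("cuda")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))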
License
XGLM-1.7B is released under the MIT License, allowing for broad usage and modification with appropriate attribution.