Introduction

XGLM-1.7B is a multilingual autoregressive language model with 1.7 billion parameters, developed by Meta AI. It is trained on a balanced corpus spanning a diverse set of 30 languages and totaling 500 billion sub-tokens, and it is designed for few-shot learning tasks across those languages.
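As a quick illustration of few-shot use, the model can be prompted with a handful of in-context demonstrations. The sketch below is illustrative: the prompt, language pair, and decoding settings are arbitrary choices, not an official recipe.

    import torch
    from transformers import XGLMTokenizer, XGLMForCausalLM

    tokenizer = XGLMTokenizer.from_pretrained("facebook/xglm-1.7B")
    model = XGLMForCausalLM.from_pretrained("facebook/xglm-1.7B")

    # Two English-to-French demonstrations followed by a query.
    prompt = (
        "English: Hello. French: Bonjour.\n"
        "English: Thank you. French: Merci.\n"
        "English: Good night. French:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=5)
    print(tokenizer.decode(output[0], skip_special_tokens=True))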

Architecture

XGLM-1.7B uses a decoder-only transformer architecture for autoregressive text generation. It covers 30 languages drawn from a range of language families, including English, Russian, and Chinese, and relies on a balanced training corpus to provide broad linguistic coverage.
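The concrete hyperparameters behind this description can be inspected by loading only the model configuration, which avoids downloading the full checkpoint (a minimal sketch; the printed fields are whatever the hosted XGLMConfig exposes, such as layer count and hidden size):

    from transformers import AutoConfig

    # Fetches only the small configuration file, not the 1.7B-parameter weights.
    config = AutoConfig.from_pretrained("facebook/xglm-1.7B")
    print(config)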

Training

The training corpus spans all 30 languages, with the largest shares going to high-resource languages such as English, Russian, and Chinese. To keep the dataset balanced, each language is assigned a target token ratio, and low-resource languages are upsampled so that they remain meaningfully represented.
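The upsampling idea can be illustrated with temperature-based sampling, a common recipe for balancing multilingual corpora. The counts and the exponent below are fictional placeholders, not the ratios actually used for XGLM:

    # Raising corpus shares to a power alpha < 1 flattens the distribution,
    # so low-resource languages are sampled more often than their raw share.
    token_counts = {"en": 800.0, "ru": 150.0, "zh": 130.0, "sw": 1.0}  # fictional, in billions
    alpha = 0.7  # assumed smoothing exponent, not XGLM's documented value
    weights = {lang: count ** alpha for lang, count in token_counts.items()}
    total = sum(weights.values())
    sampling_probs = {lang: w / total for lang, w in weights.items()}
    print(sampling_probs)  # "sw" rises from ~0.1% of raw tokens to ~0.6%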

Guide: Running Locally

Basic Steps

  1. Install the Transformers and PyTorch libraries:

    pip install transformers torch
    
  2. Load the model and tokenizer:

    from transformers import XGLMTokenizer, XGLMForCausalLM
    
    tokenizer = XGLMTokenizer.from_pretrained("facebook/xglm-1.7B")
    model = XGLMForCausalLM.from_pretrained("facebook/xglm-1.7B")
    
  3. Run a sample task, such as zero-shot scoring for the Choice of Plausible Alternatives (COPA) benchmark (a usage example follows the code):

    import torch
    import torch.nn.functional as F

    def get_logprobs(prompt):
        # Sum of log-probabilities the model assigns to each token of the
        # prompt given its preceding context.
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            outputs = model(**inputs)
        # Log-probability of each actual next token under the model.
        logprobs = F.log_softmax(outputs.logits, dim=2)
        return torch.gather(logprobs, 2, inputs["input_ids"][:, 1:].unsqueeze(2))

    def COPA_eval(prompt, alternative1, alternative2):
        # Return the index (0 or 1) of the alternative the model scores as
        # the more plausible continuation of the prompt.
        lprob1 = get_logprobs(prompt + "\n" + alternative1).sum()
        lprob2 = get_logprobs(prompt + "\n" + alternative2).sum()
        return 0 if lprob1 > lprob2 else 1
    
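To sanity-check the setup, apply these functions to a COPA-style example; the premise and alternatives below are illustrative:

    premise = "I wanted to conserve energy."
    alt1 = "I swept the floor in the unoccupied room."
    alt2 = "I turned off the light in the unoccupied room."
    # Prints the index (0 or 1) of the alternative the model prefers.
    print(COPA_eval(premise, alt1, alt2))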

Cloud GPUs

For optimal performance, consider using cloud GPU services like AWS EC2, Google Cloud Platform, or Azure, which provide the necessary computational power to run large models efficiently.

License

XGLM-1.7B is released under the MIT License, which permits broad use, modification, and redistribution as long as the copyright and license notice are retained.
