Introduction

XGLM-564M is a multilingual autoregressive language model created by Meta AI, with 564 million parameters. It was trained on a balanced corpus covering 30 languages and totaling 500 billion sub-tokens, and supports text generation in languages including English, Russian, Chinese, German, Spanish, and French. The model was introduced in the paper "Few-shot Learning with Multilingual Language Models" and originally implemented in the Fairseq library.

Architecture

XGLM-564M uses a decoder-only transformer architecture, the same basic design as other autoregressive language models such as GPT-3. A single vocabulary and set of parameters is shared across all 30 languages, which lets the model process and generate text in any of them, and this multilingual pretraining is what enables its strong few-shot and zero-shot performance on cross-lingual tasks.
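
As a concrete illustration, the architecture hyperparameters can be inspected through the checkpoint's configuration in the Hugging Face Transformers library. This is a minimal sketch; the attribute names follow Transformers' XGLMConfig, and the values printed are whatever the published checkpoint declares.

    from transformers import AutoConfig

    # Fetch only the configuration (no model weights) for the checkpoint.
    config = AutoConfig.from_pretrained("facebook/xglm-564M")

    # Decoder-only transformer hyperparameters as exposed by XGLMConfig.
    print("layers:", config.num_layers)
    print("hidden size:", config.d_model)
    print("attention heads:", config.attention_heads)
    print("vocab size:", config.vocab_size)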

Training

The training corpus for XGLM-564M covers 30 languages and totals 500 billion sub-tokens. High-resource languages such as English, Russian, and Chinese account for the largest shares of tokens, while lower-resource languages were upsampled to balance the distribution and ensure robust multilingual capabilities. The model card details the per-language token counts and sampling ratios in a table.

Guide: Running Locally

To run XGLM-564M locally, follow these basic steps:

  1. Install Python and PyTorch: Ensure you have Python and PyTorch installed on your machine.
  2. Install Transformers Library: Use pip to install the transformers library from Hugging Face, along with sentencepiece, which the XGLM tokenizer depends on.
    pip install transformers sentencepiece
    
  3. Load the Model and Tokenizer: Utilize the XGLMTokenizer and XGLMForCausalLM classes.
    from transformers import XGLMTokenizer, XGLMForCausalLM
    
    # Download the tokenizer and model weights from the Hugging Face Hub
    # (cached locally after the first call).
    tokenizer = XGLMTokenizer.from_pretrained("facebook/xglm-564M")
    model = XGLMForCausalLM.from_pretrained("facebook/xglm-564M")
    
  4. Prepare Input Data: Create input data in the desired languages for text generation or evaluation.
  5. Run Model Inference: Use the model to generate text or to score candidates on evaluation tasks, such as the zero-shot COPA-style scoring sketched below.
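
The following is a minimal sketch of zero-shot COPA-style evaluation: each candidate continuation is scored by its total token log-probability under the model, and the higher-scoring one is chosen. The premise and choices here are illustrative examples, not items from the actual COPA dataset.

    import torch
    import torch.nn.functional as F
    from transformers import XGLMTokenizer, XGLMForCausalLM

    tokenizer = XGLMTokenizer.from_pretrained("facebook/xglm-564M")
    model = XGLMForCausalLM.from_pretrained("facebook/xglm-564M")
    model.eval()

    def sequence_logprob(text):
        # Score a sequence by the sum of its per-token log-probabilities.
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        # Shift: logits at position i predict the token at position i + 1.
        log_probs = F.log_softmax(logits[0, :-1], dim=-1)
        targets = inputs.input_ids[0, 1:]
        return log_probs.gather(1, targets.unsqueeze(-1)).sum().item()

    # Illustrative COPA-style item: pick the more plausible continuation.
    premise = "The man turned on the faucet, so"
    choices = ["water flowed from the tap.", "the lights went out."]
    scores = [sequence_logprob(premise + " " + c) for c in choices]
    print(choices[scores.index(max(scores))])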

For optimal performance, especially with the larger XGLM variants, consider using cloud GPUs such as those offered by AWS, Google Cloud, or Azure.
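
As a minimal sketch, placing the model and inputs on a GPU when one is available looks like this (the French prompt and sampling parameters are arbitrary illustrative choices):

    import torch
    from transformers import XGLMTokenizer, XGLMForCausalLM

    # Use a GPU if available (e.g., on a cloud instance), else fall back to CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    tokenizer = XGLMTokenizer.from_pretrained("facebook/xglm-564M")
    model = XGLMForCausalLM.from_pretrained("facebook/xglm-564M").to(device)

    inputs = tokenizer("La vie est belle parce que", return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))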

License

XGLM-564M is licensed under the MIT License, allowing for broad use and modification with minimal restrictions.
