NeMo Megatron-GPT 1.3B
Introduction
Megatron-GPT 1.3B is a transformer-based language model with 1.3 billion parameters, developed by NVIDIA. It is designed for text generation tasks and was trained with NVIDIA's NeMo Megatron framework.
Architecture
Megatron-GPT 1.3B uses a transformer decoder-only architecture, like the GPT-2 and GPT-3 models. The released checkpoint has Tensor Parallelism (TP) of 1 and Pipeline Parallelism (PP) of 1, so it can run on a single NVIDIA GPU.
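As a rough sanity check (my arithmetic, not from the model card), the fp16 weights alone occupy only a few gigabytes, which is why a single modern GPU suffices:

```python
# Back-of-the-envelope estimate of fp16 weight memory.
# 1.3e9 parameters at 2 bytes each; activations and the KV cache
# add further overhead at inference time.
n_params = 1.3e9
bytes_per_param = 2  # fp16
print(f"~{n_params * bytes_per_param / 2**30:.1f} GiB of weights")  # ~2.4 GiB
```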
Training
The model was trained on "The Pile," a dataset curated by EleutherAI. It has been evaluated in a zero-shot setting with AI21's LM Evaluation Test Suite across a range of benchmarks. Note that the model may produce biased or toxic outputs, reflecting the nature of its training data.
Guide: Running Locally
- Install NeMo and Dependencies:
  - Clone the NVIDIA Apex repository and install it:

    ```bash
    git clone https://github.com/ericharper/apex.git
    cd apex
    git checkout nm_v1.11.0
    pip install -v --disable-pip-version-check --no-cache-dir \
      --global-option="--cpp_ext" --global-option="--cuda_ext" \
      --global-option="--fast_layer_norm" --global-option="--distributed_adam" \
      --global-option="--deprecated_fused_adam" ./
    ```

  - Install the NeMo Toolkit:

    ```bash
    pip install nemo_toolkit['nlp']==1.11.0
    ```

  - Alternatively, use the NeMo Megatron training Docker container.
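Before moving on, a quick sanity check (my addition, not part of the original guide) confirms the toolkit installed and a GPU is visible:

```python
# Verify the NeMo installation and GPU availability before launching the server.
import torch
import nemo

print(nemo.__version__)           # expect 1.11.0
print(torch.cuda.is_available())  # expect True on a GPU machine
```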
- Launch Eval Server:
  - Clone the NeMo repository and run the evaluation script:

    ```bash
    git clone https://github.com/NVIDIA/NeMo.git
    cd NeMo/examples/nlp/language_modeling
    git checkout v1.11.0
    python megatron_gpt_eval.py \
      gpt_model_file=nemo_gpt1.3B_fp16.nemo \
      server=True \
      tensor_model_parallel_size=1 \
      trainer.devices=1
    ```
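Loading the checkpoint can take a few minutes, so it helps to wait until the server's port (5555, as used in the next step) accepts connections. This polling helper is my own sketch, not part of the original guide:

```python
# Poll the eval server's port until it accepts TCP connections,
# since loading the checkpoint can take a while.
import socket
import time

def wait_for_server(host="localhost", port=5555, timeout=600):
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=5):
                return True
        except OSError:
            time.sleep(5)
    return False

print(wait_for_server())  # True once the server is ready
```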
- Send Prompts to Your Model:
  - Use Python requests to send data to the model:

    ```python
    import json

    import requests

    port_num = 5555
    headers = {"Content-Type": "application/json"}

    def request_data(data):
        resp = requests.put(
            'http://localhost:{}/generate'.format(port_num),
            data=json.dumps(data),
            headers=headers,
        )
        sentences = resp.json()['sentences']
        return sentences

    data = {
        "sentences": ["Tell me an interesting fact about space travel."],
        "tokens_to_generate": 50,
        "temperature": 1.0,
        "add_BOS": True,
        "top_k": 0,
        "top_p": 0.9,
        "greedy": False,
        "all_probs": False,
        "repetition_penalty": 1.2,
        "min_tokens_to_generate": 2,
    }

    sentences = request_data(data)
    print(sentences)
    ```
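As a small variation on the request above, reusing only the fields already shown, greedy decoding should make the output deterministic (whether the greedy flag overrides the sampling knobs is an assumption on my part):

```python
# Deterministic decoding: ask the server for the most likely continuation
# instead of sampling. (Assumes 'greedy' takes precedence over the
# sampling parameters shown above.)
data_greedy = dict(data, greedy=True)
print(request_data(data_greedy))
```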
For optimal performance, consider using cloud GPUs from providers such as AWS or Google Cloud.
License
The model is released under the CC-BY-4.0 license, which permits sharing and adaptation with appropriate credit. For the full terms, see https://creativecommons.org/licenses/by/4.0/.