NeMo Megatron GPT 1.3B

NVIDIA

Introduction

Megatron-GPT 1.3B is a transformer-based language model developed by NVIDIA. It is designed for text-to-text generation tasks and has 1.3 billion parameters. The model was trained with NVIDIA's NeMo Megatron framework.

Architecture

Megatron-GPT 1.3B uses a transformer decoder-only architecture, similar to the GPT-2 and GPT-3 models. The checkpoint is configured with Tensor Parallelism (TP) and Pipeline Parallelism (PP) both set to 1, so the model runs on a single NVIDIA GPU.
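
For orientation, the decoder-only pattern can be sketched in a few lines of PyTorch: each layer applies causally masked self-attention followed by a position-wise MLP, with residual connections around both. This is an illustrative sketch rather than NVIDIA's Megatron implementation, and the hidden size and head count below are assumptions chosen only so the example runs.

  import torch
  import torch.nn as nn

  class DecoderBlock(nn.Module):
      """One GPT-style decoder layer (illustrative, not Megatron's code)."""

      def __init__(self, d_model=2048, n_heads=16):
          super().__init__()
          self.ln1 = nn.LayerNorm(d_model)
          self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
          self.ln2 = nn.LayerNorm(d_model)
          self.mlp = nn.Sequential(
              nn.Linear(d_model, 4 * d_model),
              nn.GELU(),
              nn.Linear(4 * d_model, d_model),
          )

      def forward(self, x):
          # Causal mask: True entries are blocked, so position i can only
          # attend to positions <= i.
          n = x.size(1)
          mask = torch.ones(n, n, dtype=torch.bool, device=x.device).triu(1)
          h = self.ln1(x)
          attn_out, _ = self.attn(h, h, h, attn_mask=mask)
          x = x + attn_out                # residual around attention
          x = x + self.mlp(self.ln2(x))   # residual around the MLP
          return x

  x = torch.randn(1, 8, 2048)             # (batch, sequence, hidden)
  print(DecoderBlock()(x).shape)          # torch.Size([1, 8, 2048])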

Training

The model was trained on "The Pile," a dataset curated by EleutherAI, and was evaluated in a zero-shot setting across a range of benchmarks using AI21's LM Evaluation Test Suite. Due to the nature of the training data, the model may produce biased or toxic outputs.

Guide: Running Locally

  1. Install NeMo and Dependencies:

    • Clone the NeMo-pinned fork of the NVIDIA Apex repository and install it:
      git clone https://github.com/ericharper/apex.git
      cd apex
      git checkout nm_v1.11.0
      pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_layer_norm" --global-option="--distributed_adam" --global-option="--deprecated_fused_adam" ./
      
    • Install NeMo Toolkit:
      pip install "nemo_toolkit[nlp]==1.11.0"
      
    • Alternatively, use the NeMo Megatron training Docker container.
  2. Launch Eval Server:

    • Clone the NeMo repository, check out the matching release, and launch the evaluation script (this expects the downloaded nemo_gpt1.3B_fp16.nemo checkpoint at the path given by gpt_model_file):
      git clone https://github.com/NVIDIA/NeMo.git
      cd NeMo
      git checkout v1.11.0
      cd examples/nlp/language_modeling
      python megatron_gpt_eval.py gpt_model_file=nemo_gpt1.3B_fp16.nemo server=True tensor_model_parallel_size=1 trainer.devices=1
      
  3. Send Prompts to Your Model:

    • Use Python requests to send prompts to the running server (a greedy-decoding variation follows this guide):
      import json
      import requests
      
      # Port used by the eval server started in step 2.
      port_num = 5555
      headers = {"Content-Type": "application/json"}
      
      def request_data(data):
          # The server serves generation requests via HTTP PUT on /generate.
          resp = requests.put('http://localhost:{}/generate'.format(port_num),
                              data=json.dumps(data),
                              headers=headers)
          sentences = resp.json()['sentences']
          return sentences
      
      data = {
          "sentences": ["Tell me an interesting fact about space travel."],
          "tokens_to_generate": 50,     # length of the completion
          "temperature": 1.0,           # sampling temperature
          "add_BOS": True,              # prepend a beginning-of-sequence token
          "top_k": 0,                   # 0 disables top-k filtering
          "top_p": 0.9,                 # nucleus-sampling threshold
          "greedy": False,              # sample instead of taking the argmax
          "all_probs": False,
          "repetition_penalty": 1.2,    # discourage repeated tokens
          "min_tokens_to_generate": 2,  # lower bound on completion length
      }
      
      sentences = request_data(data)
      print(sentences)
      
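The same request_data helper can be reused for deterministic output by flipping the greedy flag in the request above. This is a small sketch under two assumptions: request_data and data from step 3 are still in scope, and the endpoint accepts more than one prompt per "sentences" batch (not confirmed by the steps above).

      # Variation on step 3: greedy (argmax) decoding instead of sampling.
      # Assumes request_data/data from step 3 are in scope; batching several
      # prompts in one request is an unverified assumption.
      greedy_data = dict(data)
      greedy_data["greedy"] = True
      greedy_data["sentences"] = [
          "Deep learning is",
          "Space travel is interesting because",
      ]
      print(request_data(greedy_data))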

For optimal performance, consider using cloud GPUs from providers such as AWS or Google Cloud.

License

The model is released under the CC-BY-4.0 license, which allows sharing and adaptation with appropriate credit. For the full terms, refer to the CC-BY-4.0 license text.
