DialoGPT-large
microsoft
Introduction
DialoGPT is a state-of-the-art (SOTA) large-scale pretrained dialogue response generation model designed for multi-turn conversations. Its responses are comparable in quality to human responses, as demonstrated in a single-turn conversational Turing test. The model was trained on 147 million multi-turn dialogues from Reddit discussion threads.
Architecture
DialoGPT is built on the Transformer architecture, following the autoregressive GPT-2 decoder design for text generation. Pretraining on a large-scale conversational dataset equips it to generate coherent, contextually relevant dialogue responses.
Training
The model was trained on an extensive dataset of 147 million multi-turn dialogues sourced from Reddit. This breadth of conversational data underpins the model's ability to simulate human-like exchanges. For details on preprocessing and training, refer to the original DialoGPT repository.
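As an illustration of the preprocessing idea, multi-turn dialogue sessions are flattened into single training sequences in which each turn is terminated by GPT-2's end-of-text token. A minimal sketch in plain Python (the `flatten_session` helper is a hypothetical stand-in for the actual pipeline; the separator corresponds to the tokenizer's `eos_token`):

```python
# Hypothetical sketch of DialoGPT-style session flattening: each
# multi-turn dialogue becomes one training sequence in which every
# turn is terminated by the end-of-text token.
EOS = "<|endoftext|>"  # GPT-2's end-of-text token, used as the turn separator

def flatten_session(turns):
    """Join the turns of one dialogue session into a single string,
    appending the EOS separator after every turn."""
    return "".join(turn + EOS for turn in turns)

session = ["Hi, how are you?", "Doing well, thanks!", "Glad to hear it."]
print(flatten_session(session))
```

This is why the inference snippets below append `tokenizer.eos_token` to each user input: the model learned to treat that token as the boundary between turns.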
Guide: Running Locally
To run DialoGPT locally:
- Install Transformers and PyTorch: ensure you have the `transformers` library and `torch` installed.
- Load the Model and Tokenizer:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-large")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-large")
```
- Interact with the Model: use the following snippet to engage in a conversation with the model:

```python
# Let's chat for 5 lines
for step in range(5):
    # encode the new user input, append the eos_token, and return a tensor
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt')

    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    # generate a response, capping the total sequence length at 1000 tokens
    chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)

    # print only the newly generated tokens
    print("DialoGPT: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))
```
- Utilize Cloud GPUs: for efficient processing and larger-scale interactions, consider cloud GPU services such as AWS, Google Cloud, or Microsoft Azure.
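On such an instance, the standard PyTorch pattern is to select the CUDA device when it is available and move the model and input tensors onto it. A minimal sketch (the `model` and `bot_input_ids` names refer to the snippets above and are not redefined here):

```python
import torch

# Use a GPU when one is available (e.g. on a cloud instance), otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical usage with the objects from the snippets above:
#   model = model.to(device)
#   bot_input_ids = bot_input_ids.to(device)
print(f"Running on: {device}")
```

Both the model and every input tensor passed to `generate` must live on the same device, or PyTorch will raise a device-mismatch error.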
License
DialoGPT is released under the MIT License, allowing for broad use and modification of the model within the guidelines specified by the license.