dialogpt_afriwoz_pidgin
tosin
Introduction
DialoGPT_AFRIWOZ_PIDGIN is a fine-tuned version of DialoGPT (small) specifically designed for conversational tasks in Nigerian Pidgin English. Trained on the AfriWOZ dataset, it focuses on domains like restaurants, hotels, taxis, and bookings. The model achieves a perplexity of 38.52 on its validation set.
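Perplexity is conventionally the exponential of the average per-token negative log-likelihood (cross-entropy, in nats). The card does not say how the 38.52 figure was computed. The sketch below only illustrates that standard definition. The helper name `perplexity` and the sample loss values are hypothetical.

```python
import math

def perplexity(token_nlls):
    """Return exp(mean negative log-likelihood) for a list of per-token NLLs in nats."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))

# Hypothetical per-token losses for a short validation sequence.
print(perplexity([2.0, 4.0, 3.0]))  # exp(3.0), about 20.09

# Conversely, a perplexity of 38.52 corresponds to a mean loss of about 3.65 nats per token.
print(math.log(38.52))
```

With a Hugging Face causal language model, passing `labels=input_ids` makes the forward pass return this mean token loss, so `math.exp(loss.item())` gives the perplexity of that sequence.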
Architecture
The model uses the DialoGPT architecture, a GPT-2-style decoder-only transformer, and is loaded through the Hugging Face transformers library with PyTorch. Fine-tuning adapts it to conversations in the specific domains covered by the AfriWOZ dataset.
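DialoGPT models treat a conversation as a single token stream. Each turn is followed by the end-of-text (EOS) token, and the model continues the stream to produce the next reply. The chat snippet in the guide below builds this kind of stream token by token. The string-level sketch here shows the resulting layout. The helper name `format_dialogue` is hypothetical. `<|endoftext|>` is the GPT-2 EOS token used by DialoGPT tokenizers. The Pidgin turns are illustrative and are not taken from AfriWOZ.

```python
EOS = "<|endoftext|>"  # GPT-2 end-of-text token, used by DialoGPT as the turn separator

def format_dialogue(turns, eos=EOS):
    """Join dialogue turns into one string, appending the EOS token after every turn."""
    return "".join(turn + eos for turn in turns)

# Illustrative (hypothetical) turns, not taken from AfriWOZ
history = ["Abeg, I wan book hotel for Lagos.", "Which day you go reach?"]
print(format_dialogue(history))
# Abeg, I wan book hotel for Lagos.<|endoftext|>Which day you go reach?<|endoftext|>
```

At inference time, the model receives this stream in tokenized form. It generates tokens until it emits the EOS token, which marks the end of its reply.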
Training
DialoGPT_AFRIWOZ_PIDGIN was fine-tuned on the AfriWOZ dataset, whose conversations cover domains such as dining and transportation services. The goal of training was to make the model generate coherent, contextually relevant responses in Nigerian Pidgin English.
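The card does not describe how AfriWOZ dialogues were turned into training examples. A common recipe for fine-tuning DialoGPT pairs each response with a fixed number of preceding turns as context. The sketch below illustrates that idea, assuming a window of two context turns. The helper name `make_training_pairs`, the `n_context` value, and the sample dialogue are all hypothetical.

```python
def make_training_pairs(turns, n_context=2):
    """Pair every turn after the first with up to n_context preceding turns.

    Returns a list of (context_turns, response) tuples. The window size is an
    assumption for illustration, not a documented setting of this model.
    """
    pairs = []
    for i in range(1, len(turns)):
        context = turns[max(0, i - n_context):i]
        pairs.append((context, turns[i]))
    return pairs

# Hypothetical taxi-booking dialogue in Nigerian Pidgin English
dialogue = [
    "I need taxi go airport.",
    "What time you wan commot?",
    "By six for morning.",
    "Okay, I don book am.",
]
for context, response in make_training_pairs(dialogue):
    print(context, "->", response)
```

Under this assumption, each pair would then be flattened into one EOS-separated sequence, with the context turns followed by the response, and trained with the usual causal language-modeling loss.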
Guide: Running Locally
To run the model locally, you need the `transformers` library and PyTorch. Follow these basic steps:
- Install Transformers: ensure that the `transformers` and `torch` libraries are installed:
```shell
pip install transformers torch
```
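- Load the model and tokenizer:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch  # used by the chat loop to concatenate token tensors

# Download (on first use) and load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("tosin/dialogpt_afriwoz_pidgin")
model = AutoModelForCausalLM.from_pretrained("tosin/dialogpt_afriwoz_pidgin")
```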
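- Chat with the model: the snippet below holds a five-turn conversation in the terminal. On every turn, it feeds the accumulated chat history back into the model.
```python
# Chat for 5 turns
for step in range(5):
    # Encode the user's message and append the EOS token that separates turns
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors="pt")

    # Append the new message to the chat history (there is no history on the first turn)
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    # Generate a reply; max_length caps history plus reply at 1000 tokens
    chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)

    # Decode only the newly generated tokens, i.e. the bot's reply
    reply = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
    print("DialoGPT_pidgin_Bot: {}".format(reply))
```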
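- Consider cloud GPUs: the snippets above run on the CPU unless the model and input tensors are moved to a GPU (for example with `.to("cuda")`). For faster generation, especially with many or long conversations, consider GPU instances from cloud providers such as AWS, Google Cloud, or Azure.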
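In the chat loop above, `chat_history_ids` grows on every turn. Meanwhile, `max_length=1000` caps the combined length of history and reply, and GPT-2-based DialoGPT models accept at most 1024 positions. Once the history approaches that cap, little or no room is left for a reply. One workaround is to keep only the most recent complete turns that fit within a token budget. This is an assumption, not part of the original guide. The helper `trim_history` and the toy token IDs are hypothetical. The function works on a plain list of token IDs, where each turn ends with the EOS token ID.

```python
def trim_history(token_ids, eos_id, max_tokens):
    """Keep the most recent complete turns whose total length fits in max_tokens.

    token_ids is a flat list of token IDs in which every turn ends with eos_id.
    Returns an empty list if even the latest turn is longer than max_tokens.
    """
    # Split the flat stream into turns, each including its trailing EOS token.
    turns, current = [], []
    for tok in token_ids:
        current.append(tok)
        if tok == eos_id:
            turns.append(current)
            current = []
    if current:  # tolerate a final turn without an EOS token
        turns.append(current)

    # Walk backwards from the newest turn, keeping turns while they fit.
    kept, total = [], 0
    for turn in reversed(turns):
        if total + len(turn) > max_tokens:
            break
        kept.append(turn)
        total += len(turn)

    # Restore chronological order and flatten back into one list.
    return [tok for turn in reversed(kept) for tok in turn]


EOS_ID = 0  # hypothetical EOS ID for this toy example
history = [5, 6, EOS_ID, 7, 8, 9, EOS_ID, 10, EOS_ID]
print(trim_history(history, EOS_ID, max_tokens=6))  # [7, 8, 9, 0, 10, 0]
```

Inside the loop, one way to apply this is to trim the history before concatenating the new user input. For example: `chat_history_ids = torch.tensor([trim_history(chat_history_ids[0].tolist(), tokenizer.eos_token_id, 600)], dtype=torch.long)`. The budget of 600 tokens is an arbitrary choice that leaves room for the new input and the reply under `max_length=1000`. Setting `dtype=torch.long` keeps the IDs as integers even if trimming returns an empty list.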
License
The model is licensed under the Creative Commons Attribution 4.0 International License (cc-by-4.0), allowing for sharing and adaptation with appropriate credit.