dialogpt_afriwoz_pidgin LLM Model

Introduction

DialoGPT_AFRIWOZ_PIDGIN is a fine-tuned version of DialoGPT (small) specifically designed for conversational tasks in Nigerian Pidgin English. Trained on the AfriWOZ dataset, it focuses on domains like restaurants, hotels, taxis, and bookings. The model achieves a perplexity of 38.52 on its validation set.
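
To put the perplexity figure in context: under the standard definition, perplexity is the exponential of the mean per-token negative log-likelihood, so 38.52 corresponds to an average loss of roughly 3.65 nats per token. The sketch below illustrates that relationship. It assumes the standard natural-log definition, since the source does not state how the metric was computed, and the `perplexity` helper is a hypothetical name used only for illustration.

```python
import math

def perplexity(token_nlls):
    """Return exp(mean negative log-likelihood), with losses in nats."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Assumption: the reported 38.52 follows the standard definition above.
# Under that assumption, this is the matching mean per-token loss.
mean_loss = math.log(38.52)
print(round(mean_loss, 2))

# Feeding that loss back in recovers the reported perplexity.
print(round(perplexity([mean_loss] * 4), 2))
```

Lower perplexity means the model assigns higher probability to the validation text; a model that predicted every token with certainty would score 1.0.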

Architecture

The model is based on the DialoGPT architecture, a GPT-2-style causal language model, and is loaded through the Hugging Face transformers library with PyTorch. It is fine-tuned for conversation in the domains covered by the AfriWOZ dataset.

Training

DialoGPT_AFRIWOZ_PIDGIN was trained on the AfriWOZ dataset, which contains conversations in domains such as dining and transportation services. Training aimed to produce coherent, contextually relevant responses in Nigerian Pidgin English.

Guide: Running Locally

To run the model locally, you need the transformers and torch libraries. Follow these basic steps:

Install Transformers: Ensure that the transformers and torch libraries are installed.
```
pip install transformers torch
```

Load the Model and Tokenizer:

```
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("tosin/dialogpt_afriwoz_pidgin")
model = AutoModelForCausalLM.from_pretrained("tosin/dialogpt_afriwoz_pidgin")
```

Chat with the Model: You can interact with the model using this snippet.

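```
# Let's chat for 5 turns
for step in range(5):
    # Encode the user's input and append the end-of-sequence token
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt')
    # Append the new input to the chat history (there is no history on the first turn)
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids
    # Generate a reply; the output holds the full history plus the new tokens
    chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)
    # Decode and print only the newly generated tokens
    print("DialoGPT_pidgin_Bot: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))
```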
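
The loop above feeds the entire conversation back to the model on every turn and caps generation at max_length=1000 tokens. For sessions longer than the five turns shown, one possible safeguard is to drop the oldest turns before building the next input. The following is a minimal sketch under that assumption; `trim_history`, its `max_tokens` budget, and the list-of-turns representation are hypothetical and not part of the model's published usage.

```python
def trim_history(turns, max_tokens):
    """Drop the oldest turns until the total token count fits max_tokens.

    `turns` is a list of token-id lists, oldest first. The most recent
    turn is always kept, even if it alone exceeds the budget.
    """
    kept = list(turns)
    while len(kept) > 1 and sum(len(t) for t in kept) > max_tokens:
        kept.pop(0)  # discard the oldest turn
    return kept

# Illustrative token ids only; real ids would come from tokenizer.encode.
history = [[1, 2, 3], [4, 5], [6]]
print(trim_history(history, 3))
```

In practice the budget would be set below the generation limit so the model still has room to produce a reply.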

Consider Cloud GPUs: For improved performance, especially with large-scale interactions, consider using cloud GPUs from platforms like AWS, Google Cloud, or Azure to expedite processing.
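
Whether the GPU is local or rented, PyTorch only uses it if the model and its inputs are moved there explicitly. A minimal sketch of the device selection follows; `pick_device` is a hypothetical helper name, and the fallback to CPU when PyTorch is not importable is an assumption made so the snippet runs anywhere.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so default to CPU
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
print(device)

# With the model loaded as in the earlier steps, both the model and the
# input tensors need to live on the same device before calling generate:
#   model = model.to(device)
#   bot_input_ids = bot_input_ids.to(device)
```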

License

The model is licensed under the Creative Commons Attribution 4.0 International License (cc-by-4.0), allowing for sharing and adaptation with appropriate credit.
