T5-Large Word Sense Disambiguation
jpwahle/T5-LARGE-WORD-SENSE-DISAMBIGUATION
Introduction
The T5-LARGE model for Word Sense Disambiguation (WSD) is a T5-large checkpoint fine-tuned on the SemCor 3.0 dataset. It augments a neural language model with word sense disambiguation capability: given a word in context together with a set of candidate sense descriptions, it selects the description that best matches the word's intended meaning. This makes it useful for distinguishing between the different senses of a word based on context.
Architecture
The model is based on the T5 (Text-to-Text Transfer Transformer) architecture, a transformer encoder-decoder that casts every task as text-to-text: both inputs and outputs are plain strings. This checkpoint has been fine-tuned specifically for word sense disambiguation.
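Concretely, the text-to-text framing poses WSD as string generation: the prompt packs the question, the candidate sense descriptions, and the context into one string, and the model generates the chosen description as text. A minimal sketch of this framing (the prompt layout follows the usage example in the guide below; the output string is illustrative):

# Text-to-text framing of WSD: one prompt string in, one description string out.
wsd_input = ('question: which description describes the word "java" best '
             'in the following context? descriptions:[ ... ] context: ...')
wsd_output = 'A drink consisting of an infusion of ground coffee beans'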
Training
The T5-LARGE-WORD-SENSE-DISAMBIGUATION model was fine-tuned on the SemCor 3.0 dataset, a large corpus in which content words are annotated with their senses and a standard training resource for WSD. Fine-tuning the T5-large checkpoint leverages its text-to-text capabilities to learn word senses across diverse contexts.
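As an illustration of how sense-annotated data can be cast into text-to-text pairs, here is a minimal sketch; the make_example helper and the exact prompt layout are assumptions modeled on the inference example below, not the author's published training code.

# Hypothetical sketch: turning one sense-annotated word into a T5 training pair.
def make_example(word, context, candidate_senses, correct_sense):
    # Build an (input_text, target_text) pair for seq2seq fine-tuning.
    descriptions = ", ".join(f'"{s}"' for s in candidate_senses)
    input_text = (f'question: which description describes the word "{word}" '
                  f'best in the following context? '
                  f'descriptions:[ {descriptions} ] '
                  f'context: {context}')
    return input_text, correct_sense

inp, target = make_example(
    word="java",
    context='I like to drink "java" in the morning.',
    candidate_senses=["A drink consisting of an infusion of ground coffee beans",
                      "a platform-independent programming language",
                      "an island in Indonesia to the south of Borneo"],
    correct_sense="A drink consisting of an infusion of ground coffee beans")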
Guide: Running Locally
To use this model locally, follow these steps (a complete end-to-end script is sketched after the list):
- Install Transformers Library: Ensure you have the Hugging Face Transformers library installed; depending on your environment, you may also need torch and sentencepiece:
pip install transformers torch sentencepiece
- Load Model and Tokenizer:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("jpelhaw/t5-word-sense-disambiguation")
tokenizer = AutoTokenizer.from_pretrained("jpelhaw/t5-word-sense-disambiguation")
- Prepare Input: Build a single prompt string containing the question, the candidate sense descriptions, and the context:
input_text = '''question: which description describes the word "java" best in the following context? descriptions:[ "A drink consisting of an infusion of ground coffee beans", "a platform-independent programming language", "an island in Indonesia to the south of Borneo" ] context: I like to drink "java" in the morning.'''
- Tokenize and Generate Answer:
example = tokenizer(input_text, return_tensors="pt")
answer = model.generate(input_ids=example["input_ids"], attention_mask=example["attention_mask"], max_length=135)
- Decode Output: Decode the generated tokens to obtain the description that best fits the word "java" in the given context:
result = tokenizer.decode(answer[0], skip_special_tokens=True)
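Putting the steps together, here is a minimal end-to-end sketch (the output noted in the final comment is illustrative; the exact generated string may differ):

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("jpelhaw/t5-word-sense-disambiguation")
tokenizer = AutoTokenizer.from_pretrained("jpelhaw/t5-word-sense-disambiguation")

input_text = '''question: which description describes the word "java" best in the following context? descriptions:[ "A drink consisting of an infusion of ground coffee beans", "a platform-independent programming language", "an island in Indonesia to the south of Borneo" ] context: I like to drink "java" in the morning.'''

# Tokenize the prompt into tensors and generate the answer.
example = tokenizer(input_text, return_tensors="pt")
answer = model.generate(input_ids=example["input_ids"],
                        attention_mask=example["attention_mask"],
                        max_length=135)

# Decode the generated token IDs back into a sense description.
print(tokenizer.decode(answer[0], skip_special_tokens=True))
# Illustrative output: the coffee-drink description.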
For enhanced performance, run the model on a GPU, for example via a cloud service such as AWS EC2, Google Cloud, or Azure; a minimal sketch of GPU usage follows.
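Assuming a CUDA-capable GPU and the variables from the guide above, moving the model and its inputs onto the device is a small change (a sketch, not a requirement):

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)  # move the model weights onto the GPU when available

# Inputs must live on the same device as the model.
example = tokenizer(input_text, return_tensors="pt").to(device)
answer = model.generate(input_ids=example["input_ids"],
                        attention_mask=example["attention_mask"],
                        max_length=135)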
License
For details regarding the licensing of this model, please refer to its official repository or documentation on Hugging Face.