subject_classifier_extended LLM Model

Introduction

The Subject Classifier Extended is a text classification model designed to categorize text into six academic subjects: Biology, Physics, Chemistry, Maths, Social Science, and English. This model utilizes the RoBERTa architecture and is implemented in PyTorch, offering compatibility with Hugging Face's Inference Endpoints.

Architecture

The model is based on the RoBERTa transformer architecture, which performs well across natural language processing tasks. RoBERTa keeps the architecture of BERT (Bidirectional Encoder Representations from Transformers) but changes the pretraining recipe: it trains longer on more data with larger batches, uses dynamic masking, and drops the next-sentence-prediction objective. The resulting encoder is well suited to fine-tuning for classification tasks such as this one.

Training

The model was trained on a dataset with the following distribution across subjects:

Physics: 7000 samples
Maths: 7000 samples
Biology: 7000 samples
Chemistry: 7000 samples
English: 5254 samples
Social Science: 7000 samples

The dataset is balanced across five of the six categories at 7000 samples each. English is the exception with 5254 samples, about 25% fewer than the other subjects, for a total of 40254 samples.
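
To make the imbalance concrete, the short sketch below computes each subject's share of the dataset from the counts listed above. It also derives inverse-frequency class weights, which are one common way to compensate for an under-represented class; whether the model's author used any such weighting is not stated, so treat that part as an illustrative assumption.

```python
# Sample counts per subject, as listed in the Training section.
counts = {
    "Physics": 7000,
    "Maths": 7000,
    "Biology": 7000,
    "Chemistry": 7000,
    "English": 5254,
    "Social Science": 7000,
}

total = sum(counts.values())

# Fraction of the dataset that belongs to each subject.
shares = {subject: n / total for subject, n in counts.items()}

# Inverse-frequency weights: total / (num_classes * class_count).
# Assumption for illustration only; not confirmed as part of the original training.
weights = {subject: total / (len(counts) * n) for subject, n in counts.items()}

for subject in counts:
    print(f"{subject:15s} share={shares[subject]:.3f} weight={weights[subject]:.3f}")
```

Under this weighting scheme English receives a weight above 1 and the five larger classes a weight slightly below 1, so errors on English samples would count somewhat more during training.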

Guide: Running Locally

To run the model locally, follow these steps:

Clone the Repository: Clone the model repository from Hugging Face's model hub.
Set Up Environment: Ensure you have Python and PyTorch installed. Use virtual environments for better dependency management.
Install Hugging Face Transformers: Install the Transformers library using pip:
```
pip install transformers
```
Load the Model: Use the Transformers library to load the model and tokenizer.
Inference: Run inference on your text data to classify it into one of the six subjects.
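
The last two steps can be sketched as follows, assuming the repository is a standard Transformers sequence-classification checkpoint. `MODEL_ID` is a placeholder, since the repository owner is not given here; replace it with the actual Hub repo id or the path to your local clone. The helper names `pick_label` and `classify` are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Tuple

# Placeholder (assumption): substitute the real Hub repo id or a local clone path.
MODEL_ID = "<owner>/subject_classifier_extended"


def pick_label(probs: List[float], id2label: Dict[int, str]) -> Tuple[str, float]:
    """Return the highest-probability label and its score."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return id2label[best], probs[best]


def classify(text: str, model_id: str = MODEL_ID) -> Tuple[str, float]:
    """Load the tokenizer and model, then classify a single text."""
    # Heavy imports are kept local so pick_label can be used without them.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Tokenize, run a forward pass without gradients, and convert logits to probabilities.
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0].tolist()

    # id2label comes from the checkpoint's config; names may be generic (e.g. LABEL_0)
    # if the author did not set subject names there.
    return pick_label(probs, model.config.id2label)
```

Calling `classify("Mitochondria produce most of the cell's ATP.")` returns a `(label, score)` pair. For repeated calls, load the tokenizer and model once and reuse them instead of reloading on every call as this sketch does.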

Suggested Cloud GPUs

For more efficient computation, consider using cloud GPUs such as AWS EC2 with NVIDIA GPUs, Google Cloud's GPU offerings, or Azure's GPU instances. These platforms provide scalable solutions for faster model training and inference.

License

The licensing terms for this model are not specified here. Refer to the model's page on the Hugging Face model hub or contact the model creator for licensing information.
