AmbatronBERTa
Introduction
AmbatronBERTa is a Thai language model fine-tuned for text classification tasks. It builds on the WangchanBERTa architecture and was fine-tuned on a dataset of more than 3,000 research papers to improve classification accuracy on Thai text.
Architecture
AmbatronBERTa is based on the transformer-based WangchanBERTa model. It effectively captures the nuances of the Thai language, making it suitable for a variety of document classification tasks.
Training
The model was fine-tuned on a dataset of more than 3,000 research papers, with the goal of improving its ability to classify Thai text across different domains.
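The exact training setup (hyperparameters, label set, and data splits) is not documented here. As an illustrative sketch only, a fine-tuning run of this kind with the Hugging Face `Trainer` might look like the following; the CSV file names, the `num_labels=5` value, and the `airesearch/wangchanberta-base-att-spm-uncased` base checkpoint are assumptions, not details taken from this model card.

```python
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import load_dataset

# Hypothetical labelled corpus with "text" and "label" columns;
# the actual research-paper dataset is not public.
dataset = load_dataset("csv", data_files={"train": "papers_train.csv",
                                          "validation": "papers_val.csv"})

# Assumed base checkpoint for WangchanBERTa
base = "airesearch/wangchanberta-base-att-spm-uncased"
tokenizer = AutoTokenizer.from_pretrained(base)

def tokenize(batch):
    # Truncate to the tokenizer's maximum sequence length
    return tokenizer(batch["text"], truncation=True)

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    base,
    num_labels=5,  # assumed number of classes; adjust to your label set
)

args = TrainingArguments(
    output_dir="ambatronberta-finetune",
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

# Passing the tokenizer lets Trainer pad batches dynamically
trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"],
                  eval_dataset=dataset["validation"],
                  tokenizer=tokenizer)
trainer.train()
```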
Guide: Running Locally
To use AmbatronBERTa with the `transformers` library, follow these steps:
- Install the `transformers` library if it is not already installed: `pip install transformers`
- Use the following code to load the tokenizer and model (a complete inference example follows this list):

  ```python
  from transformers import AutoTokenizer, AutoModelForSequenceClassification

  # Load the tokenizer and model from the Hugging Face Hub
  tokenizer = AutoTokenizer.from_pretrained("Peerawat2024/AmbatronBERTa")
  model = AutoModelForSequenceClassification.from_pretrained("Peerawat2024/AmbatronBERTa")
  ```
- Optionally, consider using cloud GPUs from providers such as AWS, GCP, or Azure to speed up fine-tuning and inference.
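Once the model and tokenizer are loaded, a single forward pass yields classification logits. The snippet below is a minimal inference sketch: the Thai example sentence is a placeholder, and the human-readable label names depend on the `id2label` mapping stored in the model's config.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Peerawat2024/AmbatronBERTa")
model = AutoModelForSequenceClassification.from_pretrained("Peerawat2024/AmbatronBERTa")
model.eval()

# Placeholder Thai input; replace with your own text
text = "งานวิจัยนี้ศึกษาการจำแนกข้อความภาษาไทย"

# Tokenize and run a forward pass without tracking gradients
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# The predicted class is the index of the highest logit; id2label
# maps it to a label name if the config provides one
predicted_id = logits.argmax(dim=-1).item()
print(predicted_id, model.config.id2label.get(predicted_id))
```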
License
The license for AmbatronBERTa is currently unknown.