facebook/ConvNeXT-Large-224 Model
Introduction
The ConvNeXT-Large-224 model is a convolutional neural network for image classification, trained on the ImageNet-1k dataset at a resolution of 224x224 pixels. The architecture was introduced in the paper "A ConvNet for the 2020s" by Liu et al. The model is available on the Hugging Face Hub as facebook/convnext-large-224. It can classify images directly into the 1,000 ImageNet-1k categories or serve as a starting point for fine-tuning on other image classification tasks.
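The 224x224 input size is produced by the image processor rather than by the network itself. The sketch below assumes the processor settings shortest_edge=224 and crop_pct=224/256 (values passed explicitly here, because the ConvNextImageProcessor constructor defaults to a larger size). Under those assumptions, an image is resized so its shortest edge is 224 / 0.875 = 256 pixels and then center-cropped to 224x224. The hosted checkpoint's own processor config may differ in details such as the resampling filter, so ConvNextImageProcessor.from_pretrained("facebook/convnext-large-224") remains the reliable route for real use.

```python
import numpy as np
from transformers import ConvNextImageProcessor

# Assumed settings for the 224 checkpoint (not loaded from the Hub here)
processor = ConvNextImageProcessor(size={"shortest_edge": 224}, crop_pct=224 / 256)

# Below 384 px, the shortest edge is first resized to size / crop_pct, then center-cropped to size
resize_to = int(224 / processor.crop_pct)

# Random stand-in for an RGB photo, in height x width x channels layout
image = np.random.randint(0, 256, size=(300, 400, 3), dtype=np.uint8)
pixel_values = processor(image, return_tensors="pt")["pixel_values"]

print(resize_to, tuple(pixel_values.shape))  # 256 (1, 3, 224, 224)
```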
Architecture
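ConvNeXT is a pure convolutional model (ConvNet) inspired by the design of Vision Transformers, and the authors report that it outperforms them. Starting from a standard ResNet, the authors "modernized" the design step by step, using the Swin Transformer as a guide. The changes include a "patchify" stem (a 4x4 convolution with stride 4), depthwise convolutions with large 7x7 kernels, an inverted-bottleneck block design, LayerNorm in place of BatchNorm, GELU activations with fewer activation and normalization layers per block, and separate downsampling layers between stages.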
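To make these design changes concrete, here is a minimal PyTorch sketch of a ConvNeXt-style block and feature extractor. It is written from the paper's description and is not the official or Hugging Face implementation. The class names are hypothetical, stochastic depth is omitted, and the classification head is left out. The defaults use the ConvNeXt-L widths (192, 384, 768, 1536) and depths (3, 3, 27, 3) from the paper, while the demo uses tiny widths so it runs quickly. The spatial sizes (56, 28, 14, 7 for a 224x224 input) are the same either way.

```python
import torch
from torch import nn


class ChannelsFirstLayerNorm(nn.Module):
    """LayerNorm over the channel axis of an (N, C, H, W) tensor."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.norm = nn.LayerNorm(dim, eps=eps)

    def forward(self, x):
        return self.norm(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)


class ConvNeXtBlock(nn.Module):
    """7x7 depthwise conv -> LayerNorm -> inverted-bottleneck MLP (GELU) -> layer scale -> residual."""

    def __init__(self, dim: int, expansion: int = 4, layer_scale_init: float = 1e-6):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)  # one 7x7 filter per channel
        self.norm = nn.LayerNorm(dim, eps=1e-6)          # single normalization layer, channels-last
        self.pwconv1 = nn.Linear(dim, expansion * dim)   # expand (inverted bottleneck)
        self.act = nn.GELU()                             # single activation per block
        self.pwconv2 = nn.Linear(expansion * dim, dim)   # project back to dim
        self.gamma = nn.Parameter(layer_scale_init * torch.ones(dim))  # per-channel layer scale

    def forward(self, x):
        residual = x
        x = self.dwconv(x).permute(0, 2, 3, 1)           # (N, C, H, W) -> (N, H, W, C)
        x = self.gamma * self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        return residual + x.permute(0, 3, 1, 2)          # back to (N, C, H, W)


class ConvNeXtFeatures(nn.Module):
    """Patchify stem, then four stages separated by 2x2 stride-2 downsampling layers."""

    def __init__(self, dims=(192, 384, 768, 1536), depths=(3, 3, 27, 3)):  # ConvNeXt-L sizes
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, dims[0], kernel_size=4, stride=4),  # non-overlapping 4x4 "patches"
            ChannelsFirstLayerNorm(dims[0]),
        )
        self.downsample = nn.ModuleList([
            nn.Sequential(
                ChannelsFirstLayerNorm(dims[i]),
                nn.Conv2d(dims[i], dims[i + 1], kernel_size=2, stride=2),
            )
            for i in range(3)
        ])
        self.stages = nn.ModuleList([
            nn.Sequential(*[ConvNeXtBlock(d) for _ in range(n)]) for d, n in zip(dims, depths)
        ])

    def forward(self, x):
        features = []
        x = self.stem(x)
        for i, stage in enumerate(self.stages):
            if i > 0:
                x = self.downsample[i - 1](x)  # halve height and width between stages
            x = stage(x)
            features.append(x)
        return features


# Tiny widths/depths for a fast demo; spatial sizes match ConvNeXt-L
model = ConvNeXtFeatures(dims=(8, 16, 32, 64), depths=(1, 1, 1, 1))
feats = model(torch.randn(1, 3, 224, 224))
print([tuple(f.shape[1:]) for f in feats])
# [(8, 56, 56), (16, 28, 28), (32, 14, 14), (64, 7, 7)]
```

The block keeps the input shape, so blocks can be stacked freely within a stage. Only the stem and the downsampling layers change resolution.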
Training
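The model was trained on the ImageNet-1k dataset, which contains about 1.28 million training images across 1,000 classes. Pretraining on this dataset produces features that can transfer to other image recognition tasks, typically after fine-tuning. The modernization was not limited to the architecture; it also covered the training recipe. The authors adopted a procedure close to those used for DeiT and Swin Transformer: training for 300 epochs with the AdamW optimizer, data augmentation such as Mixup, CutMix, RandAugment, and Random Erasing, and regularization such as stochastic depth and label smoothing.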
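As an illustration, the following minimal sketch shows one optimization step using three ingredients of the paper's training recipe: AdamW, stochastic depth (drop_path_rate in ConvNextConfig), and label smoothing. It assumes a tiny, randomly initialized ConvNextForImageClassification and random stand-in data so it runs offline. The learning rate, weight decay, smoothing value, and model sizes are illustrative placeholders, not the settings used to train the Large checkpoint. Mixup, CutMix, and the other augmentations are omitted.

```python
import torch
import torch.nn.functional as F
from transformers import ConvNextConfig, ConvNextForImageClassification

# Hypothetical tiny configuration for a fast demo; the Large checkpoint uses
# hidden_sizes=[192, 384, 768, 1536] and depths=[3, 3, 27, 3]
config = ConvNextConfig(
    hidden_sizes=[8, 16, 32, 64],
    depths=[1, 1, 1, 1],
    drop_path_rate=0.1,  # stochastic depth: randomly skip residual branches during training
    num_labels=10,
)
model = ConvNextForImageClassification(config)
model.train()

# AdamW applies decoupled weight decay; values here are placeholders
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)


def train_step(pixel_values, labels, smoothing=0.1):
    optimizer.zero_grad()
    logits = model(pixel_values=pixel_values).logits
    # Label smoothing moves a small share of the target probability onto the other classes
    loss = F.cross_entropy(logits, labels, label_smoothing=smoothing)
    loss.backward()
    optimizer.step()
    return loss.item()


# Random stand-in batch (not ImageNet data) just to exercise one step
pixel_values = torch.randn(4, 3, 64, 64)
labels = torch.randint(0, config.num_labels, (4,))
loss = train_step(pixel_values, labels)
print(f"loss: {loss:.4f}")
```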
Guide: Running Locally
To run the ConvNeXT-Large-224 model locally, follow these steps:
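- Install Dependencies: Ensure that you have the transformers, torch, and datasets libraries installed, plus Pillow, which the image processor and the datasets image feature use to handle images. You can install them with pip:
pip install transformers torch datasets pillow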
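- Load Dataset and Model: The following script loads a sample cat image, runs it through the model, and prints the predicted ImageNet class:

from transformers import ConvNextImageProcessor, ConvNextForImageClassification
import torch
from datasets import load_dataset

# Load a sample image from the Hugging Face Hub
dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]

# Load the preprocessing configuration and the pretrained weights
processor = ConvNextImageProcessor.from_pretrained("facebook/convnext-large-224")
model = ConvNextForImageClassification.from_pretrained("facebook/convnext-large-224")

# Resize, center-crop, and normalize the image into a batch of pixel values
inputs = processor(image, return_tensors="pt")

# Inference only, so gradient tracking is disabled
with torch.no_grad():
    logits = model(**inputs).logits

# Predict ImageNet class: the index of the highest logit maps to one of the 1,000 labels
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])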
- Hardware Recommendation: Single-image inference, as in the script above, runs on a CPU, although a model of this size is noticeably slower there. For batched inference or fine-tuning, a GPU is recommended, for example an NVIDIA V100 or A100 from a cloud provider such as AWS, Azure, or Google Cloud.
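The script above classifies one image and prints only the top class. The following minimal sketch extends that pattern in two ways: it runs batched inference on a GPU when PyTorch can see one (falling back to the CPU otherwise), and it reports the top-k classes with probabilities. The helper names pick_device, batched_logits, and top_k_predictions are hypothetical. To keep the sketch runnable offline, it uses a tiny randomly initialized model and random inputs; for real use, substitute the pretrained facebook/convnext-large-224 model and pixel values from its image processor.

```python
import torch
from transformers import ConvNextConfig, ConvNextForImageClassification


def pick_device() -> torch.device:
    # Prefer a CUDA GPU when one is visible, otherwise fall back to the CPU
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


def batched_logits(model, pixel_values: torch.Tensor, batch_size: int = 32) -> torch.Tensor:
    """Run the classifier over preprocessed images in chunks and return all logits on the CPU."""
    device = pick_device()
    model = model.to(device).eval()
    chunks = []
    with torch.inference_mode():
        for start in range(0, pixel_values.shape[0], batch_size):
            batch = pixel_values[start:start + batch_size].to(device)
            chunks.append(model(pixel_values=batch).logits.cpu())
    return torch.cat(chunks)


def top_k_predictions(logits_row: torch.Tensor, id2label: dict, k: int = 5):
    """Return (label, probability) pairs for the k most likely classes of one image."""
    probs = logits_row.softmax(dim=-1)  # raw scores -> probabilities that sum to 1
    values, indices = probs.topk(k)
    return [(id2label[i], p) for i, p in zip(indices.tolist(), values.tolist())]


# Hypothetical tiny, randomly initialized model so this runs offline. For real use, load
# ConvNextForImageClassification.from_pretrained("facebook/convnext-large-224")
# and pass pixel_values produced by ConvNextImageProcessor.
tiny = ConvNextForImageClassification(
    ConvNextConfig(hidden_sizes=[8, 16, 32, 64], depths=[1, 1, 1, 1], num_labels=10)
)
images = torch.randn(10, 3, 64, 64)  # random stand-in for preprocessed images
logits = batched_logits(tiny, images, batch_size=4)
print(logits.shape)  # torch.Size([10, 10])
for label, prob in top_k_predictions(logits[0], tiny.config.id2label, k=3):
    print(f"{label}: {prob:.3f}")
```

With the pretrained model, top_k_predictions(logits[0], model.config.id2label) would list the five most likely ImageNet classes for the first image.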
License
The ConvNeXT-Large-224 model is released under the Apache-2.0 license, which permits commercial and non-commercial use, modification, and redistribution, provided that the license text and attribution notices are retained.