timesformer hr finetuned k600
facebook
Introduction
The TimeSformer model, fine-tuned on Kinetics-600, is designed for video classification. It uses space-time attention to classify a video into one of the 600 labels of the Kinetics-600 dataset. The model was introduced in the paper "Is Space-Time Attention All You Need for Video Understanding?" by Bertasius et al.
Architecture
TimeSformer is a transformer-based model specifically tailored for video understanding. It applies space-time attention across video frames, effectively capturing temporal and spatial features for video classification.
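To give a feel for why the way space-time attention is organized matters at this resolution, the sketch below counts how many tokens each patch would attend to for one clip. It is a back-of-the-envelope illustration, not the model's implementation. The 16-frame, 448x448 clip matches the example input used in the guide; the 16x16 patch size, the single class token, and the "divided" variant (attend over time first, then over space) are assumptions not stated in this document.

```python
def attention_comparisons(num_frames, image_size, patch_size=16):
    """Count per-patch attention comparisons for one clip.

    patch_size=16 and the extra class token are assumptions made for
    illustration; they are not taken from this document.
    """
    patches_per_frame = (image_size // patch_size) ** 2
    # Joint space-time attention: every patch of every frame, plus the class token.
    joint = num_frames * patches_per_frame + 1
    # Divided attention: one pass over the same patch position in every frame
    # (time), then one pass over all patches of the same frame (space),
    # each pass also including the class token.
    divided = (num_frames + 1) + (patches_per_frame + 1)
    return patches_per_frame, joint, divided


patches, joint, divided = attention_comparisons(num_frames=16, image_size=448)
print("patches per frame:", patches)
print("joint comparisons per patch:", joint)
print("divided comparisons per patch:", divided)
```

Under these assumptions, the divided scheme needs far fewer comparisons per patch than attending jointly over all frames and positions, which is what makes high-resolution clips tractable.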
Training
This checkpoint was fine-tuned on Kinetics-600, a large-scale video dataset covering 600 action classes. Fine-tuning adjusts the model's parameters to improve its performance on this specific dataset.
Guide: Running Locally
To run TimeSformer locally, follow these steps:
- Install the required libraries: Install Transformers via pip, together with PyTorch and NumPy, which the code in the following steps imports:
pip install transformers torch numpy
- Import the necessary libraries: Use PyTorch and NumPy:
from transformers import AutoImageProcessor, TimesformerForVideoClassification
import numpy as np
import torch
- Prepare video input: Provide the clip as a list of frames. Here, 16 random frames of shape (3, 448, 448) stand in for real video data:
video = list(np.random.randn(16, 3, 448, 448))
- Load the model and processor:
processor = AutoImageProcessor.from_pretrained("facebook/timesformer-hr-finetuned-k600")
model = TimesformerForVideoClassification.from_pretrained("facebook/timesformer-hr-finetuned-k600")
- Process the input and perform inference:
inputs = processor(images=video, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
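The input-preparation step above uses random data, so a real video first has to be reduced to 16 frames. One simple way to do that, sketched here as an assumption rather than the sampling scheme used to train the model, is to pick evenly spaced frame indices. The helper name `sample_frame_indices` is hypothetical, and decoding the frames themselves (with a video library of your choice) is left out.

```python
def sample_frame_indices(total_frames, clip_len=16):
    """Pick clip_len evenly spaced frame indices from a video.

    Hypothetical helper for illustration; uniform sampling is an
    assumption, not necessarily how the model was trained.
    """
    if total_frames <= 0 or clip_len <= 0:
        raise ValueError("total_frames and clip_len must be positive")
    if clip_len == 1:
        return [0]
    last = total_frames - 1
    # Spread indices from the first to the last frame; videos shorter
    # than clip_len simply repeat some frames.
    return [round(i * last / (clip_len - 1)) for i in range(clip_len)]


indices = sample_frame_indices(total_frames=300)
print(indices)
```

The decoded frames at these indices, collected into a list, would then take the place of the random `video` list passed to the processor.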
For faster inference, consider running the model on a GPU, such as a cloud GPU from a provider like AWS, GCP, or Azure.
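The inference step above prints only the single best class. When a ranked list with probabilities is more useful, the logits can be passed through a softmax and sorted. The sketch below does this in plain Python on toy values; `top_k_labels` and the toy labels are hypothetical names introduced for illustration.

```python
import math


def top_k_labels(logits, id2label, k=5):
    """Return the k most likely (label, probability) pairs from raw logits."""
    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort class indices by descending probability and keep the first k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Toy values standing in for real model outputs.
toy_logits = [2.0, 0.5, 3.0, -1.0]
toy_labels = {0: "label_a", 1: "label_b", 2: "label_c", 3: "label_d"}
for label, prob in top_k_labels(toy_logits, toy_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

With the real model, one could pass `outputs.logits[0].tolist()` and `model.config.id2label` in place of the toy values.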
License
The TimeSformer model is released under the CC BY-NC 4.0 license, allowing for non-commercial use with appropriate credit.