timesformer hr finetuned k600

facebook

Introduction

The TimeSformer model, fine-tuned on Kinetics-600, is designed for video classification. It uses space-time attention to assign a video one of the 600 labels in the Kinetics-600 dataset. The model was introduced in the paper "Is Space-Time Attention All You Need for Video Understanding?" by Bertasius et al.

Architecture

TimeSformer is a transformer-based model for video understanding. It applies self-attention across both the spatial positions within each frame and the temporal sequence of frames, so a single model captures the appearance and motion cues needed for video classification.
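
To make the cost of attending over space and time concrete, the sketch below counts tokens for a single clip. The numbers are assumptions for illustration only: 16 frames at 448x448 (the input shape used in the guide below) and a 16x16 patch size, which this document does not state. The "divided" scheme, where temporal and spatial attention run as separate steps, is one of the variants described in the original paper; whether this checkpoint uses it is not stated here. The classification token is ignored for simplicity.

```python
# Token-count arithmetic for one clip.
# Assumed values (not read from the checkpoint): 16 frames, 448x448, 16x16 patches.
num_frames = 16
image_size = 448
patch_size = 16

patches_per_side = image_size // patch_size
patches_per_frame = patches_per_side ** 2
total_patches = num_frames * patches_per_frame

# Joint space-time attention: each patch attends to every patch in the clip.
joint_keys_per_query = total_patches

# Divided attention: a temporal step over the same position in every frame,
# followed by a spatial step over the patches of the same frame.
divided_keys_per_query = num_frames + patches_per_frame

print("patches per frame:", patches_per_frame)
print("patches per clip:", total_patches)
print("keys per query (joint):", joint_keys_per_query)
print("keys per query (divided):", divided_keys_per_query)
```

Under these assumptions, splitting attention into a temporal and a spatial step reduces the number of comparisons per patch by more than an order of magnitude, which is why high-resolution inputs remain tractable.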

Training

This TimeSformer checkpoint was fine-tuned on the Kinetics-600 dataset, a large-scale video dataset covering 600 action classes. Fine-tuning adjusted the model parameters to improve classification performance on this specific dataset.

Guide: Running Locally

To run TimeSformer locally, follow these steps:

  1. Install the dependencies: The code below needs Transformers, PyTorch, and NumPy, all installable via pip:

    pip install transformers torch numpy
    
  2. Import the required libraries:

    from transformers import AutoImageProcessor, TimesformerForVideoClassification
    import numpy as np
    import torch
    
  3. Prepare video input: The processor accepts a clip as a list of frames. Here random data stands in for 16 frames, each with 3 channels at 448x448 resolution:

    video = list(np.random.randn(16, 3, 448, 448))
    
  4. Load the model and processor:

    processor = AutoImageProcessor.from_pretrained("facebook/timesformer-hr-finetuned-k600")
    model = TimesformerForVideoClassification.from_pretrained("facebook/timesformer-hr-finetuned-k600")
    
  5. Process the input and perform inference:

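    # Preprocess the frames and return them as PyTorch tensors
    inputs = processor(images=video, return_tensors="pt")
    
    # Inference only, so skip gradient tracking
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
    
    # Index of the highest-scoring class among the 600 labels
    predicted_class_idx = logits.argmax(-1).item()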
    print("Predicted class:", model.config.id2label[predicted_class_idx])
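
Step 5 reports only the single best label. When the top scores are close, it can help to look at several candidates with their probabilities. The following is a minimal sketch of that idea using only the standard library; `top_k_labels` is a hypothetical helper, not part of Transformers, and the three labels in the usage lines are toy values.

```python
import math

def top_k_labels(logits, id2label, k=5):
    """Return the k most probable (label, probability) pairs from raw logits.

    Hypothetical helper: `logits` is a plain list of floats and `id2label`
    maps class indices to label strings.
    """
    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    # Rank class indices by logit, highest first, and keep the first k.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return [(id2label[i], exps[i] / total) for i in ranked]

# Toy usage with made-up logits and labels
demo = top_k_labels([2.0, 0.5, 1.0], {0: "archery", 1: "bowling", 2: "surfing"}, k=2)
for label, prob in demo:
    print(f"{label}: {prob:.3f}")
```

With the variables from the guide, the call would look like `top_k_labels(logits[0].tolist(), model.config.id2label)`, assuming a batch containing a single clip.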
    

Inference on 16 frames at 448x448 is compute-intensive. For faster runs, consider using a GPU, for example a cloud instance from a provider such as AWS, GCP, or Azure.
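
Step 3 of the guide uses random arrays as a stand-in for a real clip. With actual footage, the clip usually has far more than 16 frames, so a subset has to be chosen before calling the processor. The sketch below shows one simple option, evenly spaced sampling; `sample_frame_indices` is a hypothetical helper, and decoding the selected frames is left to whatever video reader is in use.

```python
def sample_frame_indices(total_frames, num_samples=16):
    """Pick num_samples frame indices spread evenly over a clip.

    Hypothetical helper: takes the midpoint of each of num_samples equal
    segments. Clips shorter than num_samples repeat some frames.
    """
    if total_frames <= 0 or num_samples <= 0:
        raise ValueError("total_frames and num_samples must be positive")
    step = total_frames / num_samples
    return [min(total_frames - 1, int(step * (i + 0.5))) for i in range(num_samples)]

# Example: a 160-frame clip reduced to 16 evenly spaced frames
indices = sample_frame_indices(total_frames=160)
print(indices)
```

The frames at these indices, decoded as arrays, can then take the place of the random `video` list from step 3.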

License

The TimeSformer model is released under the CC BY-NC 4.0 license, allowing for non-commercial use with appropriate credit.
