SAM 2.1 Hiera Large

Model: facebook/sam2.1-hiera-large

Introduction

SAM 2 (Segment Anything in Images and Videos) is a foundation model developed by FAIR for promptable visual segmentation. It extends promptable segmentation from static images to video through a unified, prompt-based interface: users supply points, boxes, or masks, and the model produces segmentation masks for the indicated objects. More information can be found in the SAM 2 paper.

Architecture

SAM 2 handles segmentation in both static images and video sequences. This checkpoint pairs the promptable SAM 2 mask decoder with a Hiera-Large image encoder; for video, a streaming memory mechanism carries information about the target object across frames. Inference runs through the sam2 library, and the model is listed under the mask-generation pipeline tag. It is available under the Apache 2.0 license.

Training

This model card does not detail the training procedure. According to the SAM 2 paper, the model was trained on the SA-V dataset, a large collection of videos with mask annotations collected through a model-in-the-loop data engine, together with image segmentation data.

Guide: Running Locally

To run SAM 2 locally, follow these steps:

  1. Setup Environment: Ensure you have Python and PyTorch installed. It's recommended to use a Python environment manager like venv or conda.

  2. Install Dependencies: Clone the official repository and install any required dependencies.

    git clone https://github.com/facebookresearch/segment-anything-2/
    cd segment-anything-2
    pip install -e .
    
  3. Image Prediction:

    import torch
    from sam2.sam2_image_predictor import SAM2ImagePredictor
    
    # Load the SAM 2.1 Hiera-Large checkpoint from the Hugging Face Hub.
    predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2.1-hiera-large")
    
    # Run without gradients, in bfloat16 autocast on CUDA.
    with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
        predictor.set_image(<your_image>)
        # Returns masks plus IoU scores and low-res logits (discarded here).
        masks, _, _ = predictor.predict(<input_prompts>)
    
  4. Video Prediction:

    import torch
    from sam2.sam2_video_predictor import SAM2VideoPredictor
    
    # Load the SAM 2.1 Hiera-Large checkpoint from the Hugging Face Hub.
    predictor = SAM2VideoPredictor.from_pretrained("facebook/sam2.1-hiera-large")
    
    with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
        # Initialize inference state from the video frames.
        state = predictor.init_state(<your_video>)
        # Add click or box prompts on a frame to define the object(s).
        frame_idx, object_ids, masks = predictor.add_new_points_or_box(state, <your_prompts>)
        # Propagate the masks through the rest of the video.
        for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
            ...
    
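The prediction snippets above leave the prompts as placeholders. As a hedged sketch of what typically goes into `predictor.predict` — the keyword names (`point_coords`, `point_labels`, `box`) and array shapes below follow the SAM/SAM 2 prompt convention, but verify them against your installed sam2 version:

```python
import numpy as np

# One foreground click (label 1) at pixel (x=500, y=375),
# plus one background click (label 0) to exclude a region.
point_coords = np.array([[500, 375], [200, 100]], dtype=np.float32)  # (N, 2) pixel coords
point_labels = np.array([1, 0], dtype=np.int32)                      # (N,) fg/bg labels

# An optional box prompt in XYXY pixel coordinates.
box = np.array([300, 200, 700, 550], dtype=np.float32)

# These would then be passed as, e.g.:
# masks, scores, logits = predictor.predict(
#     point_coords=point_coords, point_labels=point_labels, box=box
# )
print(point_coords.shape, point_labels.shape)  # (2, 2) (2,)
```

Point coordinates are in pixel space of the image passed to `set_image`, with one label per point.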

Cloud GPUs: For optimal performance, especially with video data, consider using cloud GPU services like AWS, Google Cloud, or Azure.
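The snippets above assume a CUDA device for `torch.autocast`. When running locally without an NVIDIA GPU, a small device-selection sketch may help; this is an assumption-laden fallback (bfloat16 autocast is used only on CUDA here, with MPS and CPU falling back to full precision):

```python
import torch

# Prefer CUDA with bfloat16 autocast; otherwise fall back to
# Apple-silicon MPS or CPU at full precision.
if torch.cuda.is_available():
    device, dtype = "cuda", torch.bfloat16
elif torch.backends.mps.is_available():
    device, dtype = "mps", torch.float32
else:
    device, dtype = "cpu", torch.float32

print(f"running on {device} with {dtype}")
```

When `device` is `"cuda"`, inference can then be wrapped in `torch.autocast(device, dtype=dtype)` as in the snippets above; on other devices the autocast context can simply be skipped.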

License

SAM 2 is released under the Apache License 2.0, allowing users to freely use, modify, and distribute the software with proper attribution.
