sam2.1 hiera large
facebook
Introduction
SAM 2: Segment Anything in Images and Videos is a foundational model developed by FAIR for promptable visual segmentation in images and videos. The model aims to advance the capabilities of visual segmentation through a prompt-based approach. More information can be found in the SAM 2 paper.
Architecture
SAM 2 is designed to handle segmentation tasks in both static images and dynamic video sequences. It leverages the sam2 library for its operations, utilizing a pipeline specifically tagged for mask generation. The model is available under the Apache 2.0 license.
Training
The model card does not give specific details about how SAM 2 was trained. It likely follows standard practice for large-scale segmentation models, training on extensive image and video datasets; see the SAM 2 paper for specifics.
Guide: Running Locally
To run SAM 2 locally, follow these steps:
- Setup Environment: Ensure you have Python and PyTorch installed. It's recommended to use a Python environment manager like venv or conda.
- Install Dependencies: Clone the official repository and install any required dependencies.
    git clone https://github.com/facebookresearch/segment-anything-2/
    cd segment-anything-2
    # Install the sam2 package and its dependencies from the repository root
    pip install -e .
- Image Prediction:
    import torch
    from sam2.sam2_image_predictor import SAM2ImagePredictor

    # Checkpoint ID chosen to match this model card (SAM 2.1 Hiera Large)
    predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2.1-hiera-large")

    with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
        # Compute the image embedding once, then query it with prompts
        predictor.set_image(<your_image>)
        masks, _, _ = predictor.predict(<input_prompts>)
- Video Prediction:
    import torch
    from sam2.sam2_video_predictor import SAM2VideoPredictor

    # Checkpoint ID chosen to match this model card (SAM 2.1 Hiera Large)
    predictor = SAM2VideoPredictor.from_pretrained("facebook/sam2.1-hiera-large")

    with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
        state = predictor.init_state(<your_video>)

        # Add prompts on one frame and get the masks for that frame
        frame_idx, object_ids, masks = predictor.add_new_points_or_box(state, <your_prompts>)

        # Propagate the prompts through the rest of the video
        for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
            ...
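The prediction steps above leave the prompts and the returned masks as placeholders. The following is a minimal sketch, not part of the sam2 library, of how click prompts might be assembled and how results might be organised. The helper names (build_point_prompt, best_mask_index, collect_video_masks) are hypothetical. It assumes the conventions used in the sam2 repository's example notebooks: label 1 marks a foreground click and 0 a background click, the second value returned by predict is a per-mask quality score, and propagated video masks are logits that are thresholded at 0. Check these assumptions against your installed version.

```python
from typing import Any, Callable, Dict, Iterable, List, Sequence, Tuple


def build_point_prompt(
    clicks: Sequence[Tuple[float, float, bool]],
) -> Dict[str, List[Any]]:
    """Turn (x, y, is_foreground) clicks into point-prompt lists.

    Assumed convention: label 1 = foreground click, label 0 = background click.
    """
    coords = [[float(x), float(y)] for x, y, _ in clicks]
    labels = [1 if is_foreground else 0 for _, _, is_foreground in clicks]
    return {"point_coords": coords, "point_labels": labels}


def best_mask_index(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring candidate mask."""
    return max(range(len(scores)), key=lambda i: scores[i])


def collect_video_masks(
    outputs: Iterable[Tuple[int, Sequence[int], Sequence[Any]]],
    binarize: Callable[[Any], Any] = lambda logits: logits > 0.0,
) -> Dict[int, Dict[int, Any]]:
    """Group propagated results as {frame_idx: {object_id: binary_mask}}.

    `outputs` is any iterable of (frame_idx, object_ids, masks) tuples, such as
    the values yielded by the propagation loop. `binarize` is applied to each
    per-object mask; the default thresholds logits at 0 (an assumption).
    """
    segments: Dict[int, Dict[int, Any]] = {}
    for frame_idx, object_ids, masks in outputs:
        segments[frame_idx] = {
            obj_id: binarize(masks[i]) for i, obj_id in enumerate(object_ids)
        }
    return segments
```

With these helpers, an image call might look like predictor.predict(point_coords=np.array(prompt["point_coords"]), point_labels=np.array(prompt["point_labels"])), and the video loop could be replaced by collect_video_masks(predictor.propagate_in_video(state)). Both usages assume the keyword names and return order described above.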
Cloud GPUs: For optimal performance, especially with video data, consider using cloud GPU services like AWS, Google Cloud, or Azure.
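Both snippets above hard-code "cuda" in torch.autocast. For machines where a GPU may or may not be present, a small helper can report which backend PyTorch sees before you decide where to run. This is a sketch only: pick_device is a hypothetical name, not part of sam2, and whether SAM 2 performs acceptably on CPU or Apple MPS is an assumption to test rather than something this guide confirms.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so no accelerator can be used
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"

    # The MPS backend only exists in PyTorch builds with Apple GPU support
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

If the result is not "cuda", a cloud GPU instance is likely the more practical option for video workloads.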
License
SAM 2 is released under the Apache License 2.0, allowing users to freely use, modify, and distribute the software with proper attribution.