# tactile_transformer
## Introduction
The Tactile Transformer is a model designed to predict contact depth from vision-based touch sensors. These sensors capture the geometry of contact as images, which are quite distinct from natural images. Building on advances in dense depth prediction with Vision Transformers (ViT), the model is trained entirely in simulation yet generalizes across different real-world DIGIT sensors, making it applicable to a variety of sensors.
## Architecture
The Tactile Transformer leverages a Vision Transformer (ViT) architecture, adapted from the FocusOnDepth implementation, which itself re-implements the DPT vision transformer. The modifications made to this architecture allow it to process tactile images for depth prediction. The model is trained using TACTO data from simulated interactions with YCB objects. There are two versions of the model, `dpt_real.p` and `dpt_sim.p`, which differ in the augmentations used during data generation.
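At a high level, a DPT-style pipeline splits the tactile image into patches, embeds each patch as a token, refines the tokens with transformer blocks, and reassembles them into a dense per-pixel depth map. The NumPy sketch below illustrates only this shape bookkeeping; all dimensions, weights, and the nearest-neighbour reassembly are hypothetical placeholders, not the actual model.

```python
import numpy as np

# Hypothetical dimensions for illustration only
H, W, P, D = 224, 224, 16, 64          # image size, patch size, embed dim
n_tokens = (H // P) * (W // P)         # 14 * 14 = 196 patch tokens

image = np.random.rand(H, W, 3)        # a tactile RGB image from the sensor

# Patch embedding: flatten each PxP patch and project it to D dims
patches = image.reshape(H // P, P, W // P, P, 3).transpose(0, 2, 1, 3, 4)
patches = patches.reshape(n_tokens, P * P * 3)
W_embed = np.random.rand(P * P * 3, D)
tokens = patches @ W_embed             # (196, 64) token sequence

# ... transformer blocks would refine the tokens here ...

# Dense head: project each token back to a PxP patch of depth values,
# then stitch the patches into a full-resolution depth map
W_head = np.random.rand(D, P * P)
depth_patches = (tokens @ W_head).reshape(H // P, W // P, P, P)
depth = depth_patches.transpose(0, 2, 1, 3).reshape(H, W)

print(depth.shape)                     # per-pixel depth map, same size as input
```

The real model replaces the random projections with learned weights and the stitch step with the DPT reassembly/fusion blocks, but the token-in, dense-map-out contract is the same.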
## Training
Training is conducted entirely in simulation on data generated from interactions with YCB objects. Augmented tactile images are used to improve the model's ability to generalize across different tactile scenarios, and the simulation environment allows for the collection of the diverse interaction data needed for robust training.
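The augmentation idea can be sketched as a simple per-image transform applied to simulated tactile frames before training. The specific augmentations below (brightness jitter, additive sensor noise, horizontal flip) are illustrative assumptions, not the exact set used to produce `dpt_real.p` and `dpt_sim.p`.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_tactile(img: np.ndarray) -> np.ndarray:
    """Apply simple augmentations to a simulated tactile image in [0, 1].

    Brightness jitter, Gaussian sensor noise, and a random horizontal
    flip are hypothetical examples of sim-to-real augmentation.
    """
    out = img.astype(np.float32)
    out *= rng.uniform(0.8, 1.2)                  # brightness jitter
    out += rng.normal(0.0, 0.02, size=out.shape)  # additive sensor noise
    if rng.random() < 0.5:
        out = out[:, ::-1]                        # horizontal flip (width axis)
    return np.clip(out, 0.0, 1.0)

img = rng.random((240, 320, 3))   # a simulated tactile frame
aug = augment_tactile(img)
print(aug.shape)                  # shape is preserved by the transform
```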
## Guide: Running Locally
1. **Download Data:**
   - Navigate to the `data` directory.
   - Use `gdown` to download the tactile data from YCB objects:

     ```bash
     cd data
     gdown https://drive.google.com/drive/folders/1a-8vfMCkW52BpWOPfqk5WM5zsSjBfhN1?usp=sharing --folder
     mv sim tacto_data
     cd tacto_data && unzip -q '*.zip' && rm *.zip
     cd ../..
     ```

2. **Run the Test Script:**
   - Execute the test script to generate depth outputs from the tactile data:

     ```bash
     python neuralfeels/contrib/tactile_transformer/touch_vit.py
     ```

3. **Cloud GPU Suggestion:**
   - For efficient processing, consider using cloud GPU services such as AWS EC2, Google Cloud, or Azure's GPU offerings.
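Before running the test script, it can help to sanity-check that the download step actually produced extracted object folders. The path below is assumed from the download commands above, and the helper is a hypothetical convenience, not part of the repo.

```python
from pathlib import Path

# Path assumed from the download commands above; adjust if your
# checkout layout differs.
data_dir = Path("data/tacto_data")

def check_dataset(root: Path) -> int:
    """Count extracted YCB object folders under the tacto data directory."""
    if not root.is_dir():
        raise FileNotFoundError(f"{root} not found -- run the download step first")
    return sum(1 for p in root.iterdir() if p.is_dir())

# Example (commented out so it does not fail on a fresh checkout):
# print(check_dataset(data_dir))
```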
## License
The Tactile Transformer is released under the MIT License, which allows for flexibility in usage, modification, and distribution.