Introduction

RDT-1B is a 1-billion-parameter imitation-learning Diffusion Transformer pre-trained on over 1 million multi-robot episodes. It predicts robot actions from language instructions and RGB images, and supports a range of robot embodiments, including single-arm, dual-arm, and mobile (wheeled) robots.

Architecture

  • Developed by: TSAIL group, Tsinghua University
  • Task Type: Vision-Language-Action
  • Model Type: Diffusion Policy with Transformers
  • Multi-Modal Encoders:
    • Vision Backbone: siglip-so400m-patch14-384
    • Language Model: t5-v1_1-xxl
  • Pre-Training Data: 46 datasets, including the RT-1 Dataset, RH20T, DROID, and others.

Training

RDT-1B is trained to take language instructions, RGB images, the control frequency, and proprioception as input and to predict the next 64 robot actions as a single chunk. A unified action space lets one model accommodate different robot platforms, although fine-tuning may be required for new, unseen platforms.
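The chunked prediction in a unified action space can be sketched with plain shapes. This is a minimal illustration, not the repository's implementation: the unified dimension of 128 and the identity index mapping are assumptions for clarity.

```python
import numpy as np

# Hypothetical dimensions for illustration only; the real unified action
# space and index mapping are defined in the RDT repository's configs.
UNIFIED_ACTION_DIM = 128   # assumption: padded cross-robot action vector
CHUNK_SIZE = 64            # RDT-1B predicts the next 64 actions per step
ROBOT_STATE_DIM = 14       # e.g. a dual-arm platform, 7 DoF per arm

def to_unified(action: np.ndarray) -> np.ndarray:
    """Embed a robot-specific action into the unified space by placing
    its values at fixed indices and zero-padding the remainder."""
    unified = np.zeros(UNIFIED_ACTION_DIM)
    unified[:action.shape[0]] = action  # assumption: identity index map
    return unified

# One predicted chunk is a (CHUNK_SIZE, UNIFIED_ACTION_DIM) array; each
# row is sliced back down to the robot's own action dimensions.
chunk = np.stack([to_unified(np.random.randn(ROBOT_STATE_DIM))
                  for _ in range(CHUNK_SIZE)])
robot_actions = chunk[:, :ROBOT_STATE_DIM]
print(chunk.shape, robot_actions.shape)  # (64, 128) (64, 14)
```

The padding scheme is what lets one set of weights serve robots with different degrees of freedom: each platform reads and writes only its own slice of the unified vector.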

Guide: Running Locally

  1. Clone the Repository:

    git clone https://github.com/thu-ml/RoboticsDiffusionTransformer
    cd RoboticsDiffusionTransformer
    
  2. Install Dependencies: Follow the instructions in the repository to set up the environment.

  3. Create and Configure the Model:

    import torch
    from scripts.agilex_model import create_model

    # Robot-specific configuration (values from the Agilex example)
    config = {
        'episode_len': 1000,   # maximum episode length
        'state_dim': 14,       # proprioception dimension
        'chunk_size': 64,      # actions predicted per inference step
        'camera_names': ['cam_high', 'cam_right_wrist', 'cam_left_wrist'],
    }
    model = create_model(
        args=config,
        dtype=torch.bfloat16,
        pretrained_vision_encoder_name_or_path="google/siglip-so400m-patch14-384",
        pretrained='robotics-diffusion-transformer/rdt-1b',
        control_frequency=25,
    )
    
  4. Perform Inference: Load pre-computed language embeddings (encoded with t5-v1_1-xxl) and call the model to predict the next chunk of actions.
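The inference step above can be sketched as follows. This is a shape-level mock, not the repository's API: `predict_actions` is a placeholder standing in for the model call, and the observation arrays are zeros rather than real camera frames or joint states.

```python
import numpy as np

# Hypothetical shapes for illustration; the real inference entry point
# lives in the repository (see scripts/agilex_model.py).
lang_embedding = np.zeros((1, 4096))   # t5-v1_1-xxl hidden size is 4096
images = [np.zeros((384, 384, 3), dtype=np.uint8)  # one frame per camera
          for _ in ("cam_high", "cam_right_wrist", "cam_left_wrist")]
proprio = np.zeros(14)                 # state_dim from the config above

def predict_actions(lang, imgs, state, chunk_size=64):
    """Placeholder for the model's inference call; RDT-1B returns the
    next `chunk_size` actions as one chunk."""
    return np.zeros((chunk_size, state.shape[0]))

actions = predict_actions(lang_embedding, images, proprio)
print(actions.shape)  # (64, 14)
```

In a real deployment the predicted chunk would be executed on the robot (or partially executed and re-planned) before the next inference call.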

Cloud GPUs: For optimal performance, consider using cloud services like AWS EC2, Google Cloud, or Azure for access to high-performance GPUs.

License

The RDT-1B model, code, pre-trained weights, and data are available under the MIT license.