ppo-LunarLander-v2
roberta-sgariglia
Introduction
This document provides information on a trained Proximal Policy Optimization (PPO) agent that plays LunarLander-v2, built with the stable-baselines3 library. The model is hosted on Hugging Face by the user roberta-sgariglia.
Architecture
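The agent uses Proximal Policy Optimization (PPO), an on-policy actor-critic algorithm that limits how far each update can move the policy by clipping the probability ratio between the new and old policies. It was trained for the LunarLander-v2 environment, a standard deep reinforcement learning benchmark in which the agent chooses among four discrete actions (do nothing, fire the left engine, fire the main engine, fire the right engine) to bring a lander down onto a landing pad.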
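The clipping idea can be made concrete with a small sketch of PPO's clipped surrogate objective for a batch of samples. It assumes the probability ratios and advantage estimates are already computed; the function name is hypothetical, and clip_range=0.2 is stable-baselines3's default, not a value this card confirms was used for the model.

```python
def clipped_surrogate(ratios: list[float], advantages: list[float], clip_range: float = 0.2) -> float:
    """Batch mean of PPO's clipped surrogate objective (a quantity to maximize).

    ratios[i] is pi_new(a_i | s_i) / pi_old(a_i | s_i); advantages[i] is the advantage estimate.
    clip_range=0.2 mirrors the stable-baselines3 default (assumption for this model).
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and the same length")
    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Clip the ratio into [1 - clip_range, 1 + clip_range] so one update cannot move the policy too far.
        clipped_ratio = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
        # Keep the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(ratios)
```

For a positive advantage, a ratio of 1.5 is capped at 1.2, so clipped_surrogate([1.5], [1.0]) gives 1.2 rather than 1.5. stable-baselines3 minimizes the negative of this mean, combined with value-function and entropy terms.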
Training
The PPO agent was trained with the stable-baselines3 library. Its performance is reported as the mean episode reward, 230.59, with a standard deviation of 38.03. The mean is above 200 points, the score at which Gymnasium's LunarLander documentation considers an episode solved.
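The card does not say how many evaluation episodes produced these figures. As a minimal sketch, assuming they are the mean and population standard deviation of per-episode returns (the reduction stable-baselines3's evaluate_policy applies with np.mean and np.std), the summary and a comparison with the 200-point solution score look like this; the function names and any inputs are hypothetical.

```python
from statistics import fmean, pstdev

# Gymnasium's LunarLander docs treat an episode scoring at least 200 points as a solution.
SOLVED_SCORE = 200.0


def summarize_returns(episode_returns: list[float]) -> tuple[float, float]:
    """Return (mean, population std) of per-episode returns, matching np.mean/np.std (ddof=0)."""
    if not episode_returns:
        raise ValueError("need at least one episode return")
    return fmean(episode_returns), pstdev(episode_returns)


def mean_reaches_solved_score(mean_return: float, solved_score: float = SOLVED_SCORE) -> bool:
    """Compare an average return with the per-episode solution score (a rough yardstick)."""
    return mean_return >= solved_score
```

With the card's reported mean, mean_reaches_solved_score(230.59) is True. The standard deviation describes how much individual episodes vary around that mean.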
Guide: Running Locally
To run this model locally, follow these steps:
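- Install Dependencies: Ensure Python and pip are installed, then install stable-baselines3, the huggingface_sb3 helper used in the next step, and Gymnasium with the Box2D extra that LunarLander-v2 requires:
    pip install stable-baselines3 huggingface_sb3 "gymnasium[box2d]"
- Load the Model: Use the huggingface_sb3 utility to download the checkpoint from the Hugging Face model hub, then load it with stable-baselines3. load_from_hub returns the path of the downloaded file rather than a model, and it needs the checkpoint's file name as well as the repository ID:
    from stable_baselines3 import PPO
    from huggingface_sb3 import load_from_hub

    # Download the checkpoint (the file name is assumed; check the repository's file list)
    checkpoint = load_from_hub(repo_id="roberta-sgariglia/ppo-LunarLander-v2", filename="ppo-LunarLander-v2.zip")
    # Rebuild the PPO agent from the downloaded .zip archive
    model = PPO.load(checkpoint)
- Run the Model: Step through the LunarLander-v2 environment, passing each observation to model.predict and applying the returned action with env.step.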
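The steps above can be sketched as a single script. This is a minimal sketch, assuming stable-baselines3 2.x with the Gymnasium API (reset returns (obs, info), step returns five values), a Gymnasium release that still registers LunarLander-v2 (newer releases may only provide a later version), and a checkpoint named ppo-LunarLander-v2.zip in the repository. run_episode and main are hypothetical helper names.

```python
from typing import Any


def run_episode(model: Any, env: Any, deterministic: bool = True) -> float:
    """Play one episode using the Gymnasium reset/step API and return the summed reward."""
    obs, _info = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        # stable-baselines3's predict() returns (action, state); state is None for non-recurrent policies.
        action, _state = model.predict(obs, deterministic=deterministic)
        obs, reward, terminated, truncated, _info = env.step(action)
        total_reward += float(reward)
        # The episode ends on termination (for example, landing or crashing) or at the time limit.
        done = terminated or truncated
    return total_reward


def main() -> None:
    """Download the checkpoint, play one episode, and print its return."""
    # Imported here so run_episode() can be reused without these packages installed.
    import gymnasium as gym
    from huggingface_sb3 import load_from_hub
    from stable_baselines3 import PPO

    # ASSUMPTION: the checkpoint is stored as "ppo-LunarLander-v2.zip" in the repository.
    checkpoint = load_from_hub(
        repo_id="roberta-sgariglia/ppo-LunarLander-v2",
        filename="ppo-LunarLander-v2.zip",
    )
    model = PPO.load(checkpoint, device="cpu")
    env = gym.make("LunarLander-v2")
    try:
        print(f"episode return: {run_episode(model, env):.2f}")
    finally:
        env.close()
```

Calling main() once the packages are installed downloads the checkpoint and plays a single episode. deterministic=True picks the policy's most likely action at each step, the usual setting when evaluating a trained agent.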
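A GPU is not required. LunarLander-v2 provides low-dimensional vector observations, and stable-baselines3 notes that PPO with a non-CNN policy is intended to run primarily on the CPU, so an ordinary machine is sufficient for running this model.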
License
The licensing information for this model is not provided. Users should check the Hugging Face repository for any specific licensing terms or requirements.