Quickstart

Get started with RL-CP Fusion in minutes

This guide will help you train your first agent using RL-CP Fusion with a D4RL dataset.

Prerequisites

Before you begin, make sure you have:

  • Python 3.7 or higher installed
  • pip package manager
  • Basic understanding of reinforcement learning concepts

Installation

First, install the required packages:

pip install torch gym d4rl numpy tensorboardX

Training Your First Agent

Here's a complete example to train an agent on the HalfCheetah environment:

from conformal_sac.agent_wrapper import SACAgent

# Initialize the agent
agent = SACAgent(
    env_name="halfcheetah-medium-v2",
    offline=True,
    iteration=100000,
    seed=42,
    learning_rate=3e-4,
    gamma=0.99,
    tau=0.005,
    batch_size=256,
    log_interval=2000,
    alpha_q=100,
    q_alpha_update_freq=50
)

# Train the agent
agent.train()

# Evaluate the trained agent
score = agent.evaluate(eval_episodes=10)
print(f"Final evaluation score: {score}")

Understanding the Code

Let's break down what's happening in the code above:

1. Agent Initialization

We create a new SACAgent instance with specific hyperparameters:

  • env_name: The D4RL environment/dataset to use
  • offline: Set to True for offline learning from the fixed dataset
  • iteration: Number of training iterations
  • seed: Random seed for reproducibility
  • learning_rate, gamma, tau, batch_size, log_interval: standard SAC training hyperparameters (optimizer step size, discount factor, soft target-update rate, minibatch size, and logging frequency)
  • alpha_q, q_alpha_update_freq: RL-CP Fusion–specific settings for the conformal Q-value adjustment
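To make two of these hyperparameters concrete, here is a small self-contained sketch of how gamma and tau are typically used in SAC. This is illustrative only; the variable names are made up for the example and are not taken from the library:

```python
gamma = 0.99  # discount factor: down-weights future rewards
tau = 0.005   # soft-update rate for the target Q-network

# Discounted return of a short reward sequence
rewards = [1.0, 1.0, 1.0]
discounted = sum(gamma ** t * r for t, r in enumerate(rewards))

# Polyak (soft) update: the target network slowly tracks the online network
online_weight, target_weight = 2.0, 0.0
target_weight = tau * online_weight + (1 - tau) * target_weight

print(round(discounted, 4))  # 2.9701
print(target_weight)         # 0.01
```

A small tau (like 0.005) keeps target Q-values stable, which is especially important in the offline setting where overestimation is hard to correct.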

2. Training

The train() method:

  • Loads the offline dataset
  • Performs training iterations
  • Updates the agent's policy
  • Periodically evaluates performance
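The steps above can be sketched as a simplified offline training loop. This is a stand-in, not the library's actual implementation: the dataset and the gradient step are stubbed out, and the names (dataset, update) are hypothetical:

```python
import random

# Stub offline dataset: (state, action, reward, next_state, done) tuples
dataset = [(s, 0, 1.0, s + 1, False) for s in range(1000)]

def update(batch):
    # Placeholder for the SAC gradient step on the critics and actor;
    # here we just return the mean batch reward as a stand-in metric.
    return sum(r for _, _, r, _, _ in batch) / len(batch)

batch_size, iterations, log_interval = 256, 1000, 500
metrics = []
for it in range(1, iterations + 1):
    batch = random.sample(dataset, batch_size)  # sample from the fixed dataset
    metrics.append(update(batch))
    if it % log_interval == 0:
        print(f"iter {it}: mean batch reward {metrics[-1]:.2f}")
```

The key difference from online RL is that no new transitions are ever collected: every batch is drawn from the same fixed dataset.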

3. Evaluation

The evaluate() method runs the trained policy for multiple episodes and returns the average reward.
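A minimal version of that evaluation loop looks like the sketch below. The environment and policy here are stubs standing in for the real Gym environment and the learned policy; only the averaging logic mirrors what evaluate() computes:

```python
class StubEnv:
    """Tiny deterministic environment standing in for a Gym env."""
    def reset(self):
        self.t = 0
        return 0.0
    def step(self, action):
        self.t += 1
        reward = 1.0
        done = self.t >= 5  # fixed-length episodes of 5 steps
        return float(self.t), reward, done, {}

def evaluate(env, policy, eval_episodes=10):
    total = 0.0
    for _ in range(eval_episodes):
        state, done = env.reset(), False
        while not done:
            state, reward, done, _ = env.step(policy(state))
            total += reward
    return total / eval_episodes  # average episode return

avg = evaluate(StubEnv(), policy=lambda s: 0, eval_episodes=10)
print(avg)  # 5.0
```

Averaging over multiple episodes (eval_episodes=10 in the quickstart) smooths out per-episode variance and gives a more reliable estimate of policy quality.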

Monitoring Training

You can monitor the training progress using TensorBoard:

tensorboard --logdir ./exp-SAC_dual_Q_network

Next Steps

Now that you've trained your first agent, you can: