Quickstart

Get started with RL-CP Fusion in minutes

This guide will help you train your first agent using RL-CP Fusion with a D4RL dataset.

Prerequisites

Before you begin, make sure you have:

  • Python 3.7 or higher installed
  • pip package manager
  • Basic understanding of reinforcement learning concepts

Installation

First, install the required packages:

pip install torch gym d4rl numpy tensorboardX

Training Your First Agent

Here's a complete example to train an agent on the HalfCheetah environment:

from conformal_sac.agent_wrapper import SACAgent

# Initialize the agent
agent = SACAgent(
    env_name="halfcheetah-medium-v2",
    offline=True,
    iteration=100000,
    seed=42,
    learning_rate=3e-4,
    gamma=0.99,
    tau=0.005,
    batch_size=256,
    log_interval=2000,
    alpha_q=100,
    q_alpha_update_freq=50
)

# Train the agent
agent.train()

# Evaluate the trained agent
score = agent.evaluate(eval_episodes=10)
print(f"Final evaluation score: {score}")

Understanding the Code

Let's break down what's happening in the code above:

1. Agent Initialization

We create a new SACAgent instance with specific hyperparameters:

  • env_name: The D4RL environment/dataset to use
  • offline: Set to True for offline learning from the fixed dataset
  • iteration: Number of training iterations
  • seed: Random seed for reproducibility
  • learning_rate, gamma, tau, batch_size, log_interval: standard SAC training hyperparameters (optimizer step size, discount factor, soft target-update rate, minibatch size, and logging frequency)
  • alpha_q, q_alpha_update_freq: RL-CP Fusion–specific settings for the conformal Q-value adjustment
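To make two of these hyperparameters concrete, here is a small self-contained sketch of how gamma and tau are typically used in SAC. This is illustrative only; the variable names are made up for the example and are not taken from the library:

```python
gamma = 0.99  # discount factor: down-weights future rewards
tau = 0.005   # soft-update rate for the target Q-network

# Discounted return of a short reward sequence
rewards = [1.0, 1.0, 1.0]
discounted = sum(gamma ** t * r for t, r in enumerate(rewards))

# Polyak (soft) update: the target network slowly tracks the online network
online_weight, target_weight = 2.0, 0.0
target_weight = tau * online_weight + (1 - tau) * target_weight

print(round(discounted, 4))  # 2.9701
print(target_weight)         # 0.01
```

A small tau (like 0.005) keeps target Q-values stable, which is especially important in the offline setting where overestimation is hard to correct.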

2. Training

The train() method:

  • Loads the offline dataset
  • Performs training iterations
  • Updates the agent's policy
  • Periodically evaluates performance
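The steps above can be sketched as a simplified offline training loop. This is a stand-in, not the library's actual implementation: the dataset and the gradient step are stubbed out, and the names (dataset, update) are hypothetical:

```python
import random

# Stub offline dataset: (state, action, reward, next_state, done) tuples
dataset = [(s, 0, 1.0, s + 1, False) for s in range(1000)]

def update(batch):
    # Placeholder for the SAC gradient step on the critics and actor;
    # here we just return the mean batch reward as a stand-in metric.
    return sum(r for _, _, r, _, _ in batch) / len(batch)

batch_size, iterations, log_interval = 256, 1000, 500
metrics = []
for it in range(1, iterations + 1):
    batch = random.sample(dataset, batch_size)  # sample from the fixed dataset
    metrics.append(update(batch))
    if it % log_interval == 0:
        print(f"iter {it}: mean batch reward {metrics[-1]:.2f}")
```

The key difference from online RL is that no new transitions are ever collected: every batch is drawn from the same fixed dataset.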

3. Evaluation

The evaluate() method runs the trained policy for multiple episodes and returns the average reward.
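A minimal version of that evaluation loop looks like the sketch below. The environment and policy here are stubs standing in for the real Gym environment and the learned policy; only the averaging logic mirrors what evaluate() computes:

```python
class StubEnv:
    """Tiny deterministic environment standing in for a Gym env."""
    def reset(self):
        self.t = 0
        return 0.0
    def step(self, action):
        self.t += 1
        reward = 1.0
        done = self.t >= 5  # fixed-length episodes of 5 steps
        return float(self.t), reward, done, {}

def evaluate(env, policy, eval_episodes=10):
    total = 0.0
    for _ in range(eval_episodes):
        state, done = env.reset(), False
        while not done:
            state, reward, done, _ = env.step(policy(state))
            total += reward
    return total / eval_episodes  # average episode return

avg = evaluate(StubEnv(), policy=lambda s: 0, eval_episodes=10)
print(avg)  # 5.0
```

Averaging over multiple episodes (eval_episodes=10 in the quickstart) smooths out per-episode variance and gives a more reliable estimate of policy quality.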

Monitoring Training

You can monitor the training progress using TensorBoard:

tensorboard --logdir ./exp-SAC_dual_Q_network

Next Steps

Now that you've trained your first agent, you can: