Configuration

API reference for configuring Conformal Q-Learning

This page provides details on the configuration options available for the Conformal Q-Learning algorithm.

Configuration Options

SACAgent Parameters

  • env_name (string)

    Name of the Gym (or D4RL) environment. Example: "halfcheetah-medium-expert"

  • offline (boolean, default: True)

    If True, use an offline dataset from D4RL. If False, interact with the environment during training.

  • iteration (integer, default: 100000)

    Number of training iterations.

  • seed (integer, default: 1)

    Random seed for reproducibility.
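
With the defaults listed above, only env_name has to be supplied explicitly. The following minimal example (using the same SACAgent import as the examples later on this page) relies on the documented defaults for offline, iteration, and seed:

from conformal_sac.agent_wrapper import SACAgent

# Only the environment name is required; offline (True), iteration (100000),
# and seed (1) fall back to their documented defaults.
agent = SACAgent(env_name="halfcheetah-medium-expert")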

SAC Hyperparameters

  • learning_rate (float, default: 3e-4)

    Learning rate for the optimizer.

  • gamma (float, default: 0.99)

    Discount factor for future rewards.

  • tau (float, default: 0.005)

    Soft update coefficient for target networks (see the sketch after this list).

  • batch_size (integer, default: 256)

    Batch size for training.

  • log_interval (integer, default: 2000)

    Interval for logging and evaluation.
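
To illustrate what tau controls: in SAC, the target network's weights are nudged toward the online network's weights by a small fraction at each update (Polyak averaging). The snippet below is a conceptual sketch of that soft update, assuming PyTorch-style modules named critic and critic_target; it is not the library's internal code:

import torch

def soft_update(critic: torch.nn.Module, critic_target: torch.nn.Module, tau: float = 0.005) -> None:
    # Polyak averaging: theta_target <- tau * theta + (1 - tau) * theta_target
    for param, target_param in zip(critic.parameters(), critic_target.parameters()):
        target_param.data.copy_(tau * param.data + (1 - tau) * target_param.data)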

Conformal Prediction Parameters

  • alpha_q (float, default: 100)

    Coefficient for the conformal regularization term (see the illustrative sketch after this list).

  • q_alpha_update_freq (integer, default: 50)

    Frequency of updating the conformal threshold.
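
A rough sketch of how these two settings could fit together is shown below. It is illustrative only and does not reproduce the library's internals: the function names, the hinge-style penalty, and the quantile-based recalibration (with an assumed 90% coverage level) are all assumptions, not the documented API.

import numpy as np

def critic_loss_with_conformal_penalty(bellman_errors, nonconformity_scores,
                                        conformal_threshold, alpha_q=100.0):
    # TD loss plus a penalty on samples whose nonconformity score exceeds
    # the current conformal threshold, weighted by alpha_q (assumed form).
    td_loss = np.mean(bellman_errors ** 2)
    penalty = np.mean(np.maximum(nonconformity_scores - conformal_threshold, 0.0))
    return td_loss + alpha_q * penalty

def maybe_update_threshold(step, recent_scores, threshold,
                           q_alpha_update_freq=50, coverage=0.9):
    # Every q_alpha_update_freq steps, recalibrate the threshold as a
    # quantile of recent nonconformity scores (assumed calibration scheme).
    if step > 0 and step % q_alpha_update_freq == 0:
        return float(np.quantile(recent_scores, coverage))
    return threshold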

Example Configuration

from conformal_sac.agent_wrapper import SACAgent

agent = SACAgent(
    env_name="halfcheetah-medium-expert",
    offline=True,
    iteration=100000,
    seed=42,
    learning_rate=3e-4,
    gamma=0.99,
    tau=0.005,
    batch_size=256,
    log_interval=2000,
    alpha_q=100,
    q_alpha_update_freq=50
)

Advanced Configuration

For more advanced use cases, you can create a configuration dictionary and pass it to the SACAgent constructor:

config = {
    "env_name": "halfcheetah-medium-expert",
    "offline": True,
    "iteration": 100000,
    "seed": 42,
    "learning_rate": 3e-4,
    "gamma": 0.99,
    "tau": 0.005,
    "batch_size": 256,
    "log_interval": 2000,
    "alpha_q": 100,
    "q_alpha_update_freq": 50,
    "hidden_sizes": [256, 256],  # Custom neural network architecture
    "activation": "relu",        # Activation function for hidden layers
    "optimizer": "adam"          # Optimizer type
}

agent = SACAgent(**config)

For more information on the core algorithm, see the Conformal Q-Learning page.