Configuration

API reference for configuring Conformal Q-Learning

This page provides details on the configuration options available for the Conformal Q-Learning algorithm.

Configuration Options

SACAgent Parameters

  • env_name (string)

    Name of the Gym (or D4RL) environment. Example: "halfcheetah-medium-expert"

  • offline (boolean, default: True)

    If True, use an offline dataset from D4RL. If False, interact with the environment during training.

  • iteration (integer, default: 100000)

    Number of training iterations.

  • seed (integer, default: 1)

    Random seed for reproducibility.
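
With the defaults listed above, only env_name has to be supplied explicitly. The following minimal example (using the same SACAgent import as the examples later on this page) relies on the documented defaults for offline, iteration, and seed:

from conformal_sac.agent_wrapper import SACAgent

# Only the environment name is required; offline (True), iteration (100000),
# and seed (1) fall back to their documented defaults.
agent = SACAgent(env_name="halfcheetah-medium-expert")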

SAC Hyperparameters

  • learning_rate (float, default: 3e-4)

    Learning rate for the optimizer.

  • gamma (float, default: 0.99)

    Discount factor for future rewards.

  • tau (float, default: 0.005)

    Soft update coefficient for target networks (see the sketch after this list).

  • batch_size (integer, default: 256)

    Batch size for training.

  • log_interval (integer, default: 2000)

    Interval for logging and evaluation.
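
To illustrate what tau controls: in SAC, the target network's weights are nudged toward the online network's weights by a small fraction at each update (Polyak averaging). The snippet below is a conceptual sketch of that soft update, assuming PyTorch-style modules named critic and critic_target; it is not the library's internal code:

import torch

def soft_update(critic: torch.nn.Module, critic_target: torch.nn.Module, tau: float = 0.005) -> None:
    # Polyak averaging: theta_target <- tau * theta + (1 - tau) * theta_target
    for param, target_param in zip(critic.parameters(), critic_target.parameters()):
        target_param.data.copy_(tau * param.data + (1 - tau) * target_param.data)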

Conformal Prediction Parameters

  • alpha_q (float, default: 100)

    Coefficient for the conformal regularization term (see the illustrative sketch after this list).

  • q_alpha_update_freq (integer, default: 50)

    Frequency of updating the conformal threshold.
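
A rough sketch of how these two settings could fit together is shown below. It is illustrative only and does not reproduce the library's internals: the function names, the hinge-style penalty, and the quantile-based recalibration (with an assumed 90% coverage level) are all assumptions, not the documented API.

import numpy as np

def critic_loss_with_conformal_penalty(bellman_errors, nonconformity_scores,
                                        conformal_threshold, alpha_q=100.0):
    # TD loss plus a penalty on samples whose nonconformity score exceeds
    # the current conformal threshold, weighted by alpha_q (assumed form).
    td_loss = np.mean(bellman_errors ** 2)
    penalty = np.mean(np.maximum(nonconformity_scores - conformal_threshold, 0.0))
    return td_loss + alpha_q * penalty

def maybe_update_threshold(step, recent_scores, threshold,
                           q_alpha_update_freq=50, coverage=0.9):
    # Every q_alpha_update_freq steps, recalibrate the threshold as a
    # quantile of recent nonconformity scores (assumed calibration scheme).
    if step > 0 and step % q_alpha_update_freq == 0:
        return float(np.quantile(recent_scores, coverage))
    return threshold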

Example Configuration

from conformal_sac.agent_wrapper import SACAgent

agent = SACAgent(
    env_name="halfcheetah-medium-expert",
    offline=True,
    iteration=100000,
    seed=42,
    learning_rate=3e-4,
    gamma=0.99,
    tau=0.005,
    batch_size=256,
    log_interval=2000,
    alpha_q=100,
    q_alpha_update_freq=50
)

Advanced Configuration

For more advanced use cases, you can create a configuration dictionary and pass it to the SACAgent constructor:

config = {
    "env_name": "halfcheetah-medium-expert",
    "offline": True,
    "iteration": 100000,
    "seed": 42,
    "learning_rate": 3e-4,
    "gamma": 0.99,
    "tau": 0.005,
    "batch_size": 256,
    "log_interval": 2000,
    "alpha_q": 100,
    "q_alpha_update_freq": 50,
    "hidden_sizes": [256, 256],  # Custom neural network architecture
    "activation": "relu",        # Activation function for hidden layers
    "optimizer": "adam"          # Optimizer type
}

agent = SACAgent(**config)

For more information on the core algorithm, see the Conformal Q-Learning page.