API reference for configuring Conformal Q-Learning
This page provides details on the configuration options available for the Conformal Q-Learning algorithm.
env_name (string): Name of the Gym (or D4RL) environment. Example: "halfcheetah-medium-expert".
offline (boolean, default: True): If True, train on an offline dataset from D4RL. If False, interact with the environment during training.
iteration (integer, default: 100000): Number of training iterations.
seed (integer, default: 1): Random seed for reproducibility.
learning_rate (float, default: 3e-4): Learning rate for the optimizer.
gamma (float, default: 0.99): Discount factor for future rewards.
tau (float, default: 0.005): Soft update coefficient for the target networks.
batch_size (integer, default: 256): Batch size for training.
log_interval (integer, default: 2000): Interval for logging and evaluation.
alpha_q (float, default: 100): Coefficient for the conformal regularization term.
q_alpha_update_freq (integer, default: 50): Frequency of updating the conformal threshold.
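For context on the offline option: when offline=True, training data presumably comes from the D4RL dataset matching env_name, which the wrapper handles internally. Loading such a dataset directly looks roughly like the sketch below; the exact environment id, including the version suffix, depends on your D4RL installation and is an assumption here.

import gym
import d4rl  # registers the D4RL environments with gym

# Version suffix (-v2) is an assumption; D4RL ids typically end in -v0/-v2
env = gym.make("halfcheetah-medium-expert-v2")
dataset = d4rl.qlearning_dataset(env)
# dataset is a dict with observations, actions, rewards, next_observations, terminals
print(dataset["observations"].shape)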
The options can be passed directly to the SACAgent wrapper as keyword arguments:

from conformal_sac.agent_wrapper import SACAgent

agent = SACAgent(
    env_name="halfcheetah-medium-expert",
    offline=True,
    iteration=100000,
    seed=42,
    learning_rate=3e-4,
    gamma=0.99,
    tau=0.005,
    batch_size=256,
    log_interval=2000,
    alpha_q=100,
    q_alpha_update_freq=50
)
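To make the roles of gamma, tau, and alpha_q concrete, here is a minimal sketch of a SAC-style critic update with a conformal penalty term. This is an illustrative assumption about how these options are typically used, not the wrapper's actual implementation; in particular, the form of the conformal penalty and the recalibration of the threshold every q_alpha_update_freq steps are hypothetical.

import torch
import torch.nn.functional as F

def critic_update_sketch(q_net, q_target, optimizer, batch, conformal_threshold,
                         gamma=0.99, tau=0.005, alpha_q=100.0):
    # batch is assumed to hold tensors for one sampled minibatch
    states, actions, rewards, next_states, not_dones, next_actions = batch

    with torch.no_grad():
        # gamma discounts the bootstrapped target value
        target_q = rewards + gamma * not_dones * q_target(next_states, next_actions)

    current_q = q_net(states, actions)
    td_loss = F.mse_loss(current_q, target_q)

    # alpha_q weights the conformal regularization term; penalizing Q-values
    # above the calibrated threshold is a hypothetical form of that penalty.
    # Every q_alpha_update_freq steps the threshold would be recalibrated
    # from held-out residuals (not shown here).
    conformal_penalty = torch.relu(current_q - conformal_threshold).mean()
    loss = td_loss + alpha_q * conformal_penalty

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # tau controls the Polyak (soft) update of the target network
    with torch.no_grad():
        for p, p_targ in zip(q_net.parameters(), q_target.parameters()):
            p_targ.data.mul_(1.0 - tau)
            p_targ.data.add_(tau * p.data)

    return loss.item()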
For more advanced use cases, you can create a configuration dictionary and pass it to the SACAgent constructor:
config = {
    "env_name": "halfcheetah-medium-expert",
    "offline": True,
    "iteration": 100000,
    "seed": 42,
    "learning_rate": 3e-4,
    "gamma": 0.99,
    "tau": 0.005,
    "batch_size": 256,
    "log_interval": 2000,
    "alpha_q": 100,
    "q_alpha_update_freq": 50,
    "hidden_sizes": [256, 256],  # Custom neural network architecture
    "activation": "relu",        # Activation function for hidden layers
    "optimizer": "adam"          # Optimizer type
}

agent = SACAgent(**config)
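One convenient pattern with the dictionary form (a general Python idiom, not a feature of this library) is to override individual entries per run, for example when sweeping random seeds:

# Reuse the configuration above, varying only the seed per run
for run_seed in (1, 2, 3):
    run_config = {**config, "seed": run_seed}
    agent = SACAgent(**run_config)
    # ... train and evaluate each run separately ...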
For more information on the core algorithm, see the Conformal Q-Learning page.