The main interface for working with the Conformal SAC agent.
The `SACAgent` class is a high-level wrapper for training and evaluating agents with the Conformal Soft Actor-Critic algorithm on offline datasets.
Create a new `SACAgent` instance with the following parameters:
```python
from conformal_sac.agent_wrapper import SACAgent

agent = SACAgent(
    env_name="halfcheetah-medium-expert",
    offline=True,
    iteration=100000,
    seed=42,
    learning_rate=3e-4,
    gamma=0.99,
    tau=0.005,
    batch_size=256,
    log_interval=2000,
    alpha_q=100,
    q_alpha_update_freq=50,
)
```

- `env_name` (string): Name of the Gym (or D4RL) environment.
- `offline` (boolean, default: `True`): If `True`, use an offline dataset from D4RL.
- `iteration` (integer, default: `100000`): Number of training iterations.
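When `offline=True`, the agent trains from a D4RL dataset rather than collecting transitions online. A minimal sketch of what that loading step typically looks like with the standard `d4rl` API is below; the versioned environment name and the loading call are assumptions about the wrapper's internals, not its actual code:

```python
import gym
import d4rl  # noqa: F401 -- importing d4rl registers its environments with gym

# Sketch: typical D4RL dataset loading (assumed; the wrapper may do this differently)
env = gym.make("halfcheetah-medium-expert-v2")
dataset = d4rl.qlearning_dataset(env)  # dict of observations, actions, rewards, terminals, ...

print(dataset["observations"].shape, dataset["actions"].shape)
```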
The `train` method runs the training loop: the agent's `update` method is called repeatedly, and the policy is evaluated every `log_interval` steps.
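Conceptually, the loop inside `train()` can be pictured as the following sketch; the exact `update` signature and the logging are assumptions based on the description above, not the wrapper's actual code:

```python
# Sketch: assumed shape of the training loop (not the wrapper's actual implementation)
iteration, log_interval = 100000, 2000       # match the constructor arguments above

for step in range(iteration):
    agent.update()                           # one Conformal SAC gradient step on a sampled batch
    if step % log_interval == 0:
        score = agent.evaluate(eval_episodes=5)
        print(f"step {step}: evaluation score {score:.2f}")
```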
```python
# Train the agent
agent.train()
```

The `evaluate` method evaluates the current policy on the environment and returns a score.
```python
# Evaluate the agent
score = agent.evaluate(eval_episodes=5)
print(f"Final evaluation score: {score}")
```