The main interface for working with the Conformal SAC agent.
The `SACAgent` class is a high-level wrapper for training and evaluating agents with the Conformal Soft Actor-Critic algorithm on offline datasets.
Create a new `SACAgent` instance with the following parameters:
```python
from conformal_sac.agent_wrapper import SACAgent

agent = SACAgent(
    env_name="halfcheetah-medium-expert",
    offline=True,
    iteration=100000,
    seed=42,
    learning_rate=3e-4,
    gamma=0.99,
    tau=0.005,
    batch_size=256,
    log_interval=2000,
    alpha_q=100,
    q_alpha_update_freq=50,
)
```

- `env_name` (string): Name of the Gym (or D4RL) environment.
- `offline` (boolean, default: `True`): If `True`, use an offline dataset from D4RL.
- `iteration` (integer, default: `100000`): Number of training iterations.
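When `offline=True`, the agent trains from a D4RL dataset rather than collecting transitions online. A minimal sketch of what that loading step typically looks like with the standard `d4rl` API is below; the versioned environment name and the loading call are assumptions about the wrapper's internals, not its actual code:

```python
import gym
import d4rl  # noqa: F401 -- importing d4rl registers its environments with gym

# Sketch: typical D4RL dataset loading (assumed; the wrapper may do this differently)
env = gym.make("halfcheetah-medium-expert-v2")
dataset = d4rl.qlearning_dataset(env)  # dict of observations, actions, rewards, terminals, ...

print(dataset["observations"].shape, dataset["actions"].shape)
```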
The `train` method runs the training loop: the agent's `update` method is called repeatedly, and the policy is evaluated every `log_interval` steps.
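Conceptually, the loop inside `train()` can be pictured as the following sketch; the exact `update` signature and the logging are assumptions based on the description above, not the wrapper's actual code:

```python
# Sketch: assumed shape of the training loop (not the wrapper's actual implementation)
iteration, log_interval = 100000, 2000       # match the constructor arguments above

for step in range(iteration):
    agent.update()                           # one Conformal SAC gradient step on a sampled batch
    if step % log_interval == 0:
        score = agent.evaluate(eval_episodes=5)
        print(f"step {step}: evaluation score {score:.2f}")
```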
```python
# Train the agent
agent.train()
```

The `evaluate` method evaluates the current policy on the environment and returns a score.
```python
# Evaluate the agent
score = agent.evaluate(eval_episodes=5)
print(f"Final evaluation score: {score}")
```