Figure 12. Operation of CNN-MTD3 (A), TD3base (B), and MPC (C) over 120 h in the heating scenario. CNN-MTD3: the deep reinforcement learning algorithm proposed in this paper; CNN: convolutional neural network; MTD3: multi-agent twin delayed deep deterministic policy gradient method; TD3base: baseline scenario that employs neither load clustering nor expert knowledge in the reward function; MPC: operational optimization using model predictive control; HP: heat pump; BESS: battery electricity energy storage; PV: photovoltaic; GE: gas engine; TESS: thermal energy storage system.



