Figure 8. Comparison of how RL-based PID auto-tuning (our approach), gain-scheduling PID, and the fixed baseline PID controller adapt to the process gain change. Left: episodic rewards (the negative mean squared tracking error) for each method. Right: the final CV profiles after the PID controller has adapted to the new process gain under each method. Green denotes the results obtained when the fixed baseline PID controller from Case 2 is used throughout the entire experiment.
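The episodic reward in the left panel is the negative mean squared tracking error. Written out, this is r = -(1/T) Σ_t (y_t - y_t^sp)^2, where y_t is the CV and y_t^sp is the setpoint at step t. The Python below is a minimal sketch that assumes the CV and the setpoint are sampled at the same T discrete time steps within one episode. The helper name `episodic_reward` is hypothetical and is not taken from the authors' implementation.

```python
from typing import Sequence


def episodic_reward(cv: Sequence[float], setpoint: Sequence[float]) -> float:
    """Hypothetical helper: negative mean squared tracking error for one episode.

    Assumes cv[t] and setpoint[t] are sampled at the same discrete time step t.
    """
    if len(cv) == 0 or len(cv) != len(setpoint):
        raise ValueError("cv and setpoint must be non-empty and of equal length")
    # Squared tracking error at each time step.
    squared_errors = [(y - y_sp) ** 2 for y, y_sp in zip(cv, setpoint)]
    # Negate the mean so that better tracking yields a higher (less negative) reward.
    return -sum(squared_errors) / len(squared_errors)


# Usage: a CV rising toward a unit setpoint over four steps.
reward = episodic_reward([0.0, 0.5, 0.9, 1.0], [1.0] * 4)
print(reward)
```

Because the reward is a negated error, a controller that tracks the setpoint more closely after the gain change yields a reward closer to zero. This is the sense in which a higher curve in the left panel indicates better adaptation.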