Reinforcement learning-based optimal adaptive fuzzy control for nonlinear multi-agent systems with prescribed performance
Abstract
In this paper, the problem of optimal adaptive consensus tracking control for nonlinear multi-agent systems with prescribed performance is investigated. To remove the initial value conditions required by existing results, an improved performance function is employed as the prescribed performance boundary. Then, by employing an error transformation function, the constrained system is converted into an unconstrained one. Furthermore, fuzzy logic systems are employed to approximate the unknown system dynamics. By applying the dynamic surface technique, the problem of "differential explosion", which often occurs in backstepping, is avoided. Moreover, a distributed optimal adaptive fuzzy control protocol based on the reinforcement learning actor-critic algorithm is proposed. Under the proposed control scheme, it is proved that all the signals within the closed-loop system are bounded and that the consensus tracking errors remain within the predefined bounds. Finally, numerical simulation results demonstrate the effectiveness of the proposed scheme.
Keywords
1. INTRODUCTION
Optimal control is achieved by designing a control protocol that not only achieves the system control objectives but also minimizes the system cost. The field of optimal control has garnered significant scholarly interest in recent years. The optimal controller can be deduced from the solution of the Hamilton-Jacobi-Bellman (HJB) equation [1]. For linear systems, this reduces to solving the Riccati equation. However, for nonlinear systems, the HJB equation is a partial differential equation containing multiple nonlinear terms, and it is very challenging to solve directly. One applicable approach is dynamic programming (DP) [2-5]. However, DP becomes less feasible for high-dimensional systems since it is a backward-in-time, offline computational process whose complexity grows rapidly with the system dimension. As a form of machine learning, reinforcement learning (RL) opens up another avenue for solving the problem [6-9]. The most commonly used RL algorithms adopt the actor-critic structure, in which the actor interacts with the environment while the critic evaluates the actor's actions and provides feedback that guides the actor's subsequent actions. RL has been employed for adaptive control of various nonlinear systems, leading to remarkable outcomes [10-14]. For example, in [10], an optimal adaptive controller for nonlinear systems with control gain functions was proposed to achieve not only tracking control but also optimal system performance. In [12], the problem of tracking control of nonlinear systems with input constraints was investigated. In [14], an optimal observer-based adaptive control scheme was proposed for nonlinear stochastic systems with input and state constraints.
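The actor-critic loop just described can be made concrete with a minimal sketch. The code below is a generic illustration of the structure only, not the algorithm of any cited reference; the linear feature map, learning rates, discount factor, and `env.step` interface are assumptions introduced for illustration.

```python
import numpy as np

def actor_critic_step(w_a, w_c, phi, x, env, lr_a=0.01, lr_c=0.05, gamma=0.99):
    """One generic actor-critic iteration over a shared feature map phi (sketch)."""
    u = float(w_a @ phi(x))                  # actor: action from current policy weights
    x_next, cost = env.step(x, u)            # actor interacts with the environment
    # critic: temporal-difference (TD) error measures how wrong the cost-to-go estimate was
    td = cost + gamma * float(w_c @ phi(x_next)) - float(w_c @ phi(x))
    w_c = w_c + lr_c * td * phi(x)           # critic: descend the squared TD error
    w_a = w_a - lr_a * td * phi(x)           # actor: adjust policy using the critic's feedback
    return w_a, w_c, x_next
```

The division of labor is the point: the critic never selects actions, and the actor never sees the cost directly, only the critic's evaluation of it.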
Meanwhile, since most physical models in practical applications can be represented by nonlinear systems, the study of nonlinear systems is of great importance and has yielded rich results [15-18]. In recent years, multi-agent systems (MASs) have garnered significant attention from scholars due to their capability to perform tasks that surpass the capabilities of a single agent. The consensus problem in MASs refers to achieving a state of agreement or convergence among multiple agents through the design of control protocols, which is a fundamental problem in the design and control of MASs. Over the past decades, the problem of consensus control for MASs has been extensively studied, leading to significant advancements [19-30]. In [23], a consensus control scheme incorporating a modified disturbance observer was designed to achieve fixed-time tracking control of nonlinear MASs with unknown disturbances. In [27], a distributed event-triggered control scheme was proposed to address the problem of asymptotic tracking for nonlinear MASs with uncertain leaders. Recently, there has been a surge of interest in incorporating RL into MASs; this is an interesting and challenging problem that has produced some excellent results [31-38]. For instance, in [33], an optimal backstepping consensus control protocol based on RL was introduced for nonlinear strict-feedback MASs, which not only exhibits algorithmic simplicity but also relaxes two common requirements: known dynamics and persistent excitation. In [34], an optimal RL-based event-triggered controller was proposed for nonlinear stochastic systems.
On the other hand, the concept of prescribed performance control (PPC), initially proposed by Bechlioulis and Rovithakis [39], has emerged as a prominent research topic in the control community. Transient and steady-state performance, which is often neglected by conventional control schemes that solely ensure closed-loop stability, is the primary concern of prescribed performance. The PPC strategy aims to align the actual system performance achieved after execution with the desired or prescribed performance criteria and has yielded remarkable outcomes [40-43]. By utilizing exponential performance functions, both the nonlinear switched systems in [40] and the non-triangularly structured systems in [42] were able to achieve the desired rate of convergence. To overcome the issue of "differential explosion" caused by repeated differentiation, dynamic surface control schemes were proposed to implement the tracking control of systems in [41] and [43], respectively. Recently, there has been significant research focused on integrating RL into PPC [44, 45]. However, it should be noted that all of the above results depend on initial value conditions; i.e., the initial error must lie within the prescribed boundary, which is ensured by properly setting the initial values.
Motivated by the discussions above, this article focuses on the optimal adaptive consensus tracking control problem for leader-follower nonlinear MASs subject to prescribed performance constraints. The main contributions of the article are as follows.
(1) Based on the actor-critic structure of RL, the proposed consensus tracking control scheme can achieve optimal control of MASs while an excellent tracking effect is guaranteed. Compared with [10-12], the proposed algorithm is simpler to implement since it requires neither knowledge of the system dynamics nor persistent excitation conditions.
(2) In contrast to existing performance functions [40-43], most of which rely on initial value conditions, an improved performance function is introduced that enables the proposed consensus tracking control scheme to force the consensus tracking error to converge to a prescribed region without any requirement on the initial values.
(3) Compared with the traditional backstepping control schemes [24, 25], the dynamic surface technique is adopted, which effectively avoids the problem of "differential explosion" caused by repeated differentiation of the virtual controllers and makes the control structure simpler (see the filter sketch following this list).
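As a concrete illustration of contribution (3), the dynamic surface technique passes each virtual controller through a first-order low-pass filter so that its derivative is obtained algebraically instead of analytically. The sketch below assumes Euler integration and an illustrative time constant; it is not the paper's exact filter.

```python
import numpy as np

def dynamic_surface_filter(alpha_history, tau=0.05, dt=0.001):
    """First-order filter omega_dot = (alpha - omega) / tau (dynamic surface technique).

    alpha_history : sampled values of a virtual controller alpha(t)
    Returns the filtered signal omega; its derivative (alpha - omega)/tau is
    available algebraically, so alpha never has to be differentiated analytically.
    """
    alpha_history = np.asarray(alpha_history, dtype=float)
    omega = np.zeros_like(alpha_history)
    omega[0] = alpha_history[0]            # common initialization: omega(0) = alpha(0)
    for k in range(1, len(alpha_history)):
        omega[k] = omega[k - 1] + dt * (alpha_history[k - 1] - omega[k - 1]) / tau
    return omega
```

Each backstepping step then differentiates the filter output rather than the virtual controller itself, which is what prevents the repeated-differentiation blow-up.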
2. PRELIMINARIES AND PROBLEM FORMULATION
2.1. Topology theory
In this paper, information interactions between agents are described by a directed graph
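For reference, a standard set of conventions for such a directed graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with node set $\mathcal{V} = \{1, \dots, N\}$ is sketched below; this notation is an assumption and may differ in detail from the paper's exact definitions.

```latex
\mathcal{A} = [a_{ij}] \in \mathbb{R}^{N \times N}, \quad a_{ij} > 0 \iff (j, i) \in \mathcal{E}, \qquad
\mathcal{L} = \mathcal{D} - \mathcal{A}, \quad
\mathcal{D} = \operatorname{diag}\!\Big(\sum_{j=1}^{N} a_{1j}, \dots, \sum_{j=1}^{N} a_{Nj}\Big)
```

Here $a_{ij} > 0$ means agent $i$ receives information from agent $j$, and $\mathcal{L}$ is the graph Laplacian.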
2.2. System formulation
Consider the following nonlinear MAS composed of N agents, where the
where
where
Assumption 1 [24]: The directed graph contains a spanning tree with the leader as its root node.
Assumption 2: The reference trajectory
Lemma 1 [25]: If a function
where
2.3. Error transformation
Define the following monotonically increasing function over the interval (-1, 1)
where
Define the monotonically increasing function
(1)
(2)
To achieve a desired level of system performance, we define the following performance function
where
Define the following normalized function
From the above equation, we know that there exists a constant
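A typical realization of such an improved performance function, consistent with the idea of removing the initial value condition via a time-varying scalar function, is sketched below. The exponential envelope, the cubic switch-on factor, and all parameter values are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def performance_bound(t, rho0=2.0, rho_inf=0.1, ell=1.0, T=1.0):
    """Illustrative prescribed-performance envelope (assumed form).

    rho(t) decays exponentially from rho0 to rho_inf, while the scalar factor
    kappa(t) rises smoothly from 0 at t = 0 to 1 at t = T, so the effective
    bound is enormous at t = 0 and imposes no condition on the initial error.
    """
    rho = (rho0 - rho_inf) * np.exp(-ell * t) + rho_inf      # classical envelope [39]
    kappa = np.where(t < T, 1.0 - (1.0 - t / T) ** 3, 1.0)   # smooth switch-on factor
    return rho / np.maximum(kappa, 1e-9)                     # near-infinite bound at t = 0
```

Because the bound is effectively unbounded at $t = 0$, any initial error lies inside it, so no initial value condition is imposed; for $t \ge T$ the bound reduces to the classical envelope $\rho(t)$.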
The following error transformation function is introduced
where
From the above definition, one has
Since
Noting that
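As a concrete instance of such an error transformation, one may take the monotonically increasing map $S(z) = \tanh(z)$, whose range is $(-1, 1)$; this specific choice is an assumption for illustration. Writing $e = \rho\, S(z)$ then yields the unconstrained variable $z$ computed below.

```python
import numpy as np

def transformed_error(e, rho):
    """Map a constrained error |e| < rho to an unconstrained variable (illustrative).

    With S(z) = tanh(z), monotonically increasing with range (-1, 1),
    e = rho * S(z) gives the unconstrained error z = artanh(e / rho).
    """
    ratio = np.clip(e / rho, -1 + 1e-9, 1 - 1e-9)  # guard against numerical overflow
    return np.arctanh(ratio)
```

Keeping $z(t)$ bounded for all $t$ then forces $|e(t)| < \rho(t)$, which is exactly the prescribed performance requirement.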
3. DESIGN PROCEDURE AND MAIN RESULTS
The controller design is conducted based on the following coordinate transformations
where
where
For brevity, the following definitions are provided before the design steps. For
where
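In leader-follower consensus designs of this type, the first coordinate is commonly the local neighborhood synchronization error; the following standard form is a sketch under that assumption, not a quotation of the paper's equations.

```latex
z_{i,1} = \sum_{j=1}^{N} a_{ij}\,\big(y_i - y_j\big) + b_i\,\big(y_i - y_r\big),
```

where $a_{ij}$ are the adjacency weights, $y_r$ is the leader's reference trajectory, and $b_i > 0$ if and only if agent $i$ directly receives the leader's signal.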
Step 1: Differentiating with respect to
Define the performance index function for the first subsystem of the agent
where
The optimal performance index function is expressed as
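In the standard optimized-backstepping framework, the performance index and its optimal value take roughly the following forms; this is a generic sketch in which $r_{i,1}$, $\alpha_{i,1}$, and $J_{i,1}$ are placeholder symbols rather than the paper's exact notation.

```latex
J_{i,1}(z_{i,1}) = \int_{t}^{\infty} r_{i,1}\big(z_{i,1}(\tau), \alpha_{i,1}(\tau)\big)\,\mathrm{d}\tau, \qquad
J_{i,1}^{*}(z_{i,1}) = \min_{\alpha_{i,1}} J_{i,1}(z_{i,1}),
```

with the associated HJB equation

```latex
0 = \min_{\alpha_{i,1}} \left[ r_{i,1}(z_{i,1}, \alpha_{i,1}) + \frac{\partial J_{i,1}^{*}}{\partial z_{i,1}}\,\dot{z}_{i,1} \right].
```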
Considering
where
Define the Bellman residual as
The approximate optimal virtual controller is expected to guarantee that
Since equation
Thus, the designed adaptive laws
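In actor-critic optimized backstepping schemes of this type (e.g., [33]), the critic and actor adaptive laws are often negative-gradient updates over a shared fuzzy basis vector $\varphi$; the following is a hedged sketch of that structure, with $\gamma_c, \gamma_a > 0$ as assumed design gains, not the paper's exact laws.

```latex
\dot{\hat{W}}_{i,1}^{c} = -\gamma_{c}\, \varphi\varphi^{\top} \hat{W}_{i,1}^{c}, \qquad
\dot{\hat{W}}_{i,1}^{a} = -\varphi\varphi^{\top} \Big[ \gamma_{a}\big(\hat{W}_{i,1}^{a} - \hat{W}_{i,1}^{c}\big) + \gamma_{c}\, \hat{W}_{i,1}^{c} \Big].
```

The critic weight estimate drives the Bellman residual toward zero, while the actor weight is pulled toward the critic's estimate, a structure credited in [33] with relaxing the persistent excitation requirement.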
By calculating
To obtain the optimal virtual controller, we decompose
where
Substituting equation (24) into equation (23), it follows that
Since
where
Substituting equation (27) into equations (24) and (26) results in
The
Construct the following Lyapunov candidate function
The time derivative of
Define
By Lemma 2, there exists an FLS approximation to
where
With the help of Young's inequality, one has
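For reference, the form of Young's inequality used throughout such derivations is

```latex
ab \;\le\; \frac{\epsilon}{2}\,a^{2} + \frac{1}{2\epsilon}\,b^{2}, \qquad \forall\, a, b \in \mathbb{R},\; \epsilon > 0,
```

which splits each cross term into a damping part and a bounded part; $\epsilon = 1$ gives the common special case $ab \le a^{2}/2 + b^{2}/2$.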
Substituting equations (30), (36), (37), and (38) into equation (33) yields
According to Young's inequality, it can be derived that
By means of equations (40)-(44),
Then, it can be further bounded as
where
Step j
The performance index function is chosen to be
where
Considering
The HJB equation can be written as
By calculating
Then, decompose
where
From equation (52), it is easy to introduce
Since
where
According to the above equation, we can obtain
The
The candidate Lyapunov function is chosen as
The time derivative of
From equations (10) and (11), it holds that
Define
By Lemma 2, there exists an FLS satisfying
where
With the help of Young's inequality, the following inequality holds
Similar to the first step, one has
From equations (58), (62), and (64)-(72),
where
where
The optimal performance index function is
The HJB equation is introduced as
By calculating
where
Then, the following facts are readily obtained
The
The candidate Lyapunov function is chosen to be the
The time derivative of
By Lemma 2, there exists an FLS satisfying
where
Similar to the previous steps,
where
Define
Theorem 1: Consider a nonlinear MAS under Assumptions 1 and 2, with the optimal controllers of equations (30), (58), and (83), the adaptive laws of equations (14) and (15), the actor FLS of equation (12), and the critic FLS of equation (13). Then, we select the parameter
(1) All signals in the closed-loop system are bounded.
(2) The consensus tracking errors remain within the predefined bounds.
Proof: The total Lyapunov function for all agents is selected to be
The derivative of V with respect to time is
where
Obviously, it can be inferred that
From equation (91), we know that
This means that the consensus tracking error remains within the prescribed bounds, i.e.,
On the other hand, it is known from equation (92) that
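The standard argument behind these conclusions is that the total Lyapunov derivative satisfies a differential inequality of the form $\dot{V} \le -CV + D$ for some constants $C, D > 0$ (generic placeholders here), which integrates to

```latex
V(t) \;\le\; \Big( V(0) - \frac{D}{C} \Big) e^{-Ct} + \frac{D}{C}, \qquad t \ge 0,
```

so $V$ is uniformly ultimately bounded, all closed-loop signals are bounded, and the transformed errors remain in the region that maps back inside the prescribed performance bounds.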
4. SIMULATION EXAMPLES
Consider a nonlinear MAS with four follower agents and one leader, whose dynamics are represented as
where
Select the fuzzy membership function as
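Gaussian membership functions are a common choice in such simulations; the following sketch shows how an FLS output $\theta^{\top}\varphi(x)$ is formed from them, with the rule centers, width, and rule count as illustrative assumptions rather than the paper's simulation settings.

```python
import numpy as np

def fls_output(x, theta, centers, width=1.0):
    """Fuzzy logic system y = theta^T phi(x) with Gaussian membership functions.

    x       : input vector
    theta   : adjustable weight vector (one weight per fuzzy rule)
    centers : array of rule centers, shape (num_rules, len(x))
    """
    # Gaussian membership of x to each rule, combined across input dimensions
    mu = np.exp(-np.sum((x - centers) ** 2, axis=1) / width ** 2)
    phi = mu / np.sum(mu)          # normalized fuzzy basis functions
    return float(theta @ phi)      # weighted output of the FLS
```

Normalizing the memberships makes $\varphi$ a fuzzy basis function vector, the form assumed by the universal approximation property of Lemma 2; in the controller, $\theta$ plays the role of the actor or critic weight vector adapted online.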
The time-varying function
Figure 2. The trajectories of the system output state
Figure 3. System consensus tracking error
5. CONCLUSIONS
In this paper, the problem of optimal adaptive consensus tracking control for nonlinear MASs with prescribed performance has been addressed. Firstly, a time-varying scalar function was introduced such that the designed performance function bypasses the initial value conditions. Based on the error transformation function, an unconstrained system was obtained. Subsequently, an RL-based consensus control scheme built on optimal control theory and the dynamic surface technique was proposed. Finally, it was shown that the closed-loop system is stable and the error constraints are never violated. In practice, systems are always subject to various uncertain constraints, such as actuator faults and input dead zones, which can have a large impact on system performance. Therefore, designing a performance-constrained optimal control scheme that accounts for these situations is a topic for future research.
DECLARATIONS
Authors' contributions
Made substantial contributions to the conception and design of the study and performed data analysis and interpretation: Yue H
Performed data acquisition and provided administrative, technical, and material support: Xia J
Availability of data and materials
Not applicable.
Financial support and sponsorship
This work was supported by the National Natural Science Foundation of China under Grant 61973148 and by the Discipline with Strong Characteristics of Liaocheng University: Intelligent Science and Technology under Grant 319462208.
Conflicts of interest
All authors declared that there are no conflicts of interest.
Ethical approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Copyright
© The Author(s) 2023.
REFERENCES
1. Modares H, Lewis FL, Naghibi-Sistani MB. Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems. Automatica 2014;50:193-202.
2. Bertsekas DP. Value and policy iterations in optimal control and adaptive dynamic programming. IEEE Trans Neural Netw Learn Syst 2015;28:500-9.
3. Tsai JSH, Li JS, Shieh LS. Discretized quadratic optimal control for continuous-time two-dimensional systems. IEEE Trans Circuits Syst I Fund Theory Appl 2002;49:116-25.
4. Luo B, Liu DR, Wu HN, Wang D, Lewis FL. Policy gradient adaptive dynamic programming for data-based optimal control. IEEE Trans Cybern 2016;47:3341-54.
5. Jiang Y, Jiang ZP. Global adaptive dynamic programming for continuous-time nonlinear systems. IEEE Trans Automat Contr 2015;60:2917-29.
6. Wu X, Chen HL, Wang JJ, Troiano L, Loia V, Fujita H. Adaptive stock trading strategies with deep reinforcement learning methods. Inf Sci 2020;538:142-58.
7. Modares H, Ranatunga I, Lewis FL, Popa DO. Optimized assistive human-robot interaction using reinforcement learning. IEEE Trans Cybern 2015;46:655-67.
8. Wen GX, Chen CLP, Li WN. Simplified optimized control using reinforcement learning algorithm for a class of stochastic nonlinear systems. Inf Sci 2020;517:230-43.
9. Zhao B, Liu DR, Luo CM. Reinforcement learning-based optimal stabilization for unknown nonlinear systems subject to inputs with uncertain constraints. IEEE Trans Neural Netw Learn Syst 2019;31:4330-40.
10. Wen G, Chen CLP, Ge SS, Yang H, Liu X. Optimized adaptive nonlinear tracking control using actor-critic reinforcement learning strategy. IEEE Trans Ind Inf 2019;15:4969-77.
11. Bai W, Zhou Q, Li T, Li H. Adaptive reinforcement learning neural network control for uncertain nonlinear system with input saturation. IEEE Trans Cybern 2019;50:3433-43.
12. Yang X, Liu D, Wang D. Reinforcement learning for adaptive optimal control of unknown continuous-time nonlinear systems with input constraints. Int J Control 2014;87:553-66.
13. Bai W, Li T, Tong S. NN reinforcement learning adaptive control for a class of nonstrict-feedback discrete-time systems. IEEE Trans Cybern 2020;50:4573-84.
14. Li Y, Zhang J, Liu W, Tong S. Observer-based adaptive optimized control for stochastic nonlinear systems with input and state constraints. IEEE Trans Neural Netw Learn Syst 2021;33:7791-805.
15. Wang J, Gong Q, Huang K, Liu Z, Chen CLP, Liu J. Event-triggered prescribed settling time consensus compensation control for a class of uncertain nonlinear systems with actuator failures. IEEE Trans Neural Netw Learn Syst 2023;34:5590-600.
16. Wang J, Wang C, Liu Z, Chen CLP, Zhang C. Practical fixed-time adaptive ERBFNNs event-triggered control for uncertain nonlinear systems with dead-zone constraint. IEEE Trans Syst Man Cybern Syst 2023:1-10.
17. Cai X, de Marcio M. Adaptive rigidity-based formation control for multirobotic vehicles with dynamics. IEEE Trans Contr Syst Technol 2014;23:389-96.
18. Ren H, Cheng Z, Qin J, Lu R. Deception attacks on event-triggered distributed consensus estimation for nonlinear systems. Automatica 2023;154:111100.
19. Wang J, Yan Y, Liu Z, Chen CLP, Zhang C, Chen K. Finite-time consensus control for multi-agent systems with full-state constraints and actuator failures. Neural Netw 2023;157:350-63.
20. Shang Y. Matrix-scaled consensus on weighted networks with state constraints. IEEE Syst J 2023;17:6472-9.
21. Cheng L, Hou ZG, Tan M, Lin Y, Zhang W. Neural-network-based adaptive leader-following control for multiagent systems with uncertainties. IEEE Trans Neural Netw 2010;21:1351-8.
22. Shen Q, Shi P, Zhu J, Wang S, Shi Y. Neural networks-based distributed adaptive control of nonlinear multiagent systems. IEEE Trans Neural Netw Learn Syst 2019;31:1010-21.
23. Zhang N, Xia J, Park JH, Zhang J, Shen H. Improved disturbance observer-based fixed-time adaptive neural network consensus tracking for nonlinear multi-agent systems. Neural Netw 2023;162:490-501.
24. Zhang Y, Sun J, Liang H, Li H. Event-triggered adaptive tracking control for multiagent systems with unknown disturbances. IEEE Trans Cybern 2018;50:890-901.
25. Chen J, Li J, Yuan X. Global fuzzy adaptive consensus control of unknown nonlinear multiagent systems. IEEE Trans Fuzzy Syst 2020;32:2239-50.
26. Zhang J, Liu S, Zhang X, Xia J. Event-triggered-based distributed consensus tracking for nonlinear multiagent systems with quantization. IEEE Trans Neural Netw Learn Syst 2022:1-11.
27. Deng C, Wen C, Wang W, Li X, Yue D. Distributed adaptive tracking control for high-order nonlinear multiagent systems over event-triggered communication. IEEE Trans Automat Contr 2022;68:1176-83.
28. Shao J, Shi L, Cheng Y, Li T. Asynchronous tracking control of leader-follower multiagent systems with input uncertainties over switching signed digraphs. IEEE Trans Cybern 2021;52:6379-90.
29. Yang Y, Xiao Y, Li T. Attacks on formation control for multiagent systems. IEEE Trans Cybern 2021;52:12805-17.
30. Ren H, Wang Y, Liu M, Li H. An optimal estimation framework of multi-agent systems with random transport protocol. IEEE Trans Signal Process 2022;70:2548-59.
31. Gao W, Jiang ZP, Lewis FL, Wang Y. Leader-to-formation stability of multiagent systems: an adaptive optimal control approach. IEEE Trans Automat Contr 2018;63:3581-87.
32. Tan M, Liu Z, Chen CLP, Zhang Y, Wu Z. Optimized adaptive consensus tracking control for uncertain nonlinear multiagent systems using a new event-triggered communication mechanism. Inf Sci 2022;605:301-16.
33. Wen G, Chen CLP. Optimized backstepping consensus control using reinforcement learning for a class of nonlinear strict-feedback-dynamic multi-agent systems. IEEE Trans Neural Netw Learn Syst 2023;34:1524-36.
34. Zhu HY, Li YX, Tong S. Dynamic event-triggered reinforcement learning control of stochastic nonlinear systems. IEEE Trans Fuzzy Syst 2023;31:2917-28.
35. Bai W, Li T, Long Y, Chen CLP. Event-triggered multigradient recursive reinforcement learning tracking control for multiagent systems. IEEE Trans Neural Netw Learn Syst 2023;34:366-79.
36. Li T, Bai W, Liu Q, Long Y, Chen CLP. Distributed fault-tolerant containment control protocols for the discrete-time multi-agent systems via reinforcement learning method. IEEE Trans Neural Netw Learn Syst 2023;34:3979-91.
37. Zhao Y, Niu B, Zong G, Zhao X, Alharbi KH. Neural network-based adaptive optimal containment control for non-affine nonlinear multi-agent systems within an identifier-actor-critic framework. J Franklin Inst 2023;360:8118-43.
38. Li H, Wu Y, Chen M, Lu R. Adaptive multigradient recursive reinforcement learning event-triggered tracking control for multiagent systems. IEEE Trans Neural Netw Learn Syst 2023;34:144-56.
39. Bechlioulis CP, Rovithakis GA. Robust adaptive control of feedback linearizable MIMO nonlinear systems with prescribed performance. IEEE Trans Automat Contr 2008;53:2090-9.
40. Wang X, Xia J, Park JH, Xie X, Chen G. Intelligent control of performance constrained switched nonlinear systems with random noises and its application: an event-driven approach. IEEE Trans Circuits Syst I Regul Pap 2022;69:3736-47.
41. Li Y, Shao X, Tong S. Adaptive fuzzy prescribed performance control of nontriangular structure nonlinear systems. IEEE Trans Fuzzy Syst 2019;28:2416-26.
42. Wang W, Liang H, Pan Y, Li T. Prescribed performance adaptive fuzzy containment control for nonlinear multiagent systems using disturbance observer. IEEE Trans Cybern 2020;50:3879-91.
43. Sun K, Qiu J, Karimi HR, Fu Y. Event-triggered robust fuzzy adaptive finite-time control of nonlinear systems with prescribed performance. IEEE Trans Fuzzy Syst 2020;29:1460-71.
44. Chen H, Yan H, Wang Y, Xie S, Zhang D. Reinforcement learning-based close formation control for underactuated surface vehicle with prescribed performance and time-varying state constraints. Ocean Eng 2022;256:111361.