# Reinforcement learning-based optimal adaptive fuzzy control for nonlinear multi-agent systems with prescribed performance

*Complex Eng Syst* 2023;3:19.

## Abstract

In this paper, the problem of optimal adaptive consensus tracking control for nonlinear multi-agent systems with prescribed performance is investigated. Existing results require the initial error to satisfy initial value conditions; to remove this requirement, an improved performance function is employed as the prescribed performance boundary. Then, by employing an error transformation function, the constrained system is converted into an unconstrained one. Furthermore, fuzzy logic systems are employed to identify the unknown parts of the system. By applying the dynamic surface technique, the problem of "differential explosion", which often occurs in backstepping, is avoided. Moreover, a distributed optimal adaptive fuzzy control protocol based on the reinforcement learning actor-critic algorithm is proposed. Under the proposed control scheme, it is proved that all the signals within the closed-loop system are bounded and that the consensus tracking errors remain within the predefined bounds. Finally, numerical simulation results demonstrate the effectiveness of the proposed scheme.

## Keywords

Reinforcement learning, prescribed performance, optimal consensus control, actor-critic structure

## 1. INTRODUCTION

Optimal control is achieved by designing a control protocol that not only achieves the system control objectives but also minimizes the system cost, and it has attracted significant scholarly interest in recent years. The optimal controller can be deduced from the solution of the Hamilton-Jacobi-Bellman (HJB) equation^{[1]}. For linear systems, this reduces to solving the Riccati equation. For nonlinear systems, however, the HJB equation is a partial differential equation containing multiple nonlinear terms, and it is challenging to solve directly. One approach is dynamic programming (DP)^{[2–5]}. However, DP is a backward, offline computational process whose computational complexity grows significantly with the system dimension, which makes it less feasible for high-dimensional systems. As a form of machine learning, reinforcement learning (RL) opens up another avenue to solve the problem^{[6–9]}. The most commonly used RL algorithms adopt the actor-critic structure, in which the actor interacts with the environment, and the critic evaluates the actions of the actor and provides feedback that the actor uses to improve its next action. RL has since been employed in various nonlinear systems for adaptive control, leading to remarkable outcomes^{[10–14]}. For example, an optimal adaptive controller for nonlinear systems with control gain functions was proposed to achieve not only tracking control but also the optimal performance of systems^{[10]}. In^{[12]}, the problem of tracking control of nonlinear systems with input constraints was investigated. In^{[14]}, an optimal observer-based adaptive control scheme for nonlinear stochastic systems with input and state constraints was proposed.
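To make the linear case concrete, the following minimal sketch solves the scalar algebraic Riccati equation in closed form and checks that the resulting quadratic value function zeroes the HJB equation. The scalar plant $\dot{x} = ax + bu$, the cost $\int (qx^2 + ru^2)\,dt$, and all function names are illustrative assumptions that do not appear in this article.

```python
import math


def scalar_lqr_gain(a, b, q, r):
    """Solve 2*a*p - (b**2 / r) * p**2 + q = 0 for the stabilizing root p.

    Assumed (hypothetical) setting: plant x' = a*x + b*u with running cost
    q*x**2 + r*u**2. Returns (p, k) with V*(x) = p*x**2 and u* = -k*x.
    """
    if b == 0 or q <= 0 or r <= 0:
        raise ValueError("need b != 0, q > 0, r > 0")
    # Positive root of the quadratic in p.
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    k = b * p / r
    return p, k


def hjb_residual(a, b, q, r, p, x):
    """Evaluate q*x^2 + r*u^2 + V_x*(a*x + b*u) at V = p*x^2, u = -(b*p/r)*x."""
    u = -(b * p / r) * x
    dV = 2.0 * p * x
    return q * x * x + r * u * u + dV * (a * x + b * u)


# Unstable open loop (a = 1); the optimal feedback stabilizes it.
p, k = scalar_lqr_gain(1.0, 1.0, 3.0, 1.0)  # p = 3.0, k = 3.0
closed_loop_pole = 1.0 - 1.0 * k            # -2.0
```

For nonlinear dynamics no such closed form is available in general, which is the gap that DP and RL-based approximations are meant to fill.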

Meanwhile, since most physical models in practical applications can be represented by nonlinear systems, the study of nonlinear systems is very important and has yielded rich results^{[15–18]}. In recent years, multi-agent systems (MASs) have garnered significant attention from scholars due to their ability to perform tasks beyond the capabilities of a single agent. The consensus problem in MASs refers to achieving a state of agreement or convergence among multiple agents through the design of control protocols, and it is a fundamental problem in the design and control of MASs. Over the past decades, the problem of consensus control for MASs has been extensively studied, leading to significant advancements^{[19–30]}. In^{[23]}, a consensus control scheme incorporating a modified disturbance observer was designed to achieve fixed-time tracking control of nonlinear MASs with unknown disturbances. In^{[27]}, a distributed event-triggered control scheme was proposed to address the problem of asymptotic tracking for nonlinear MASs with uncertain leaders. Recently, there has been a surge of interest in incorporating RL into MASs, an interesting and challenging problem that has produced some excellent results^{[31–38]}. For instance, in^{[33]}, an optimal backstepping consensus control protocol based on RL was introduced for nonlinear strict-feedback MASs, which not only exhibits algorithmic simplicity but also relaxes the need for two common conditions: known dynamics and persistent excitation. In^{[34]}, an optimal RL-based event-triggered controller was proposed for nonlinear stochastic systems.

On the other hand, prescribed performance control (PPC), initially proposed by Bechlioulis and Rovithakis^{[39]}, has emerged as a prominent research topic in the control community. Prescribed performance is primarily concerned with transient and steady-state performance, which is often neglected by conventional control schemes that solely ensure closed-loop stability. The PPC strategy aims to make the actual system performance meet the prescribed performance criteria and has yielded remarkable outcomes^{[40–43]}. By utilizing exponential performance functions, both the nonlinear switched systems in^{[40]} and the non-triangularly structured systems in^{[42]} were able to achieve the desired rate of convergence. To overcome the issue of "differential explosion" caused by repeated differentiation, dynamic surface control schemes were proposed to implement the tracking control of systems in^{[41]} and^{[43]}, respectively. Recently, there has been significant research focused on integrating RL into PPC^{[44, 45]}. However, all of the above results depend on the initial conditions; i.e., the initial values must be set properly so that the initial error lies within the prescribed boundary.
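The initial-condition dependence can be illustrated with the conventional exponentially decaying performance function $\rho(t) = (\rho_0 - \rho_\infty)e^{-lt} + \rho_\infty$. The sketch below shows that conventional form only, not the improved function of this article; the symmetric bound $|e(t)| < \rho(t)$, the function names, and all parameter values are assumptions chosen for illustration.

```python
import math


def exp_performance(t, rho0, rho_inf, decay):
    """Conventional exponential performance bound rho(t) (assumed form)."""
    return (rho0 - rho_inf) * math.exp(-decay * t) + rho_inf


def satisfies_initial_condition(e0, rho0):
    """Conventional PPC needs the initial error strictly inside rho(0) = rho0."""
    return abs(e0) < rho0


def within_bounds(errors, times, rho0, rho_inf, decay):
    """Check |e(t)| < rho(t) at each sampled time instant."""
    return all(
        abs(e) < exp_performance(t, rho0, rho_inf, decay)
        for e, t in zip(errors, times)
    )


# Hypothetical numbers: the bound starts at 2.0 and settles at 0.1.
ok_start = satisfies_initial_condition(1.5, 2.0)   # error starts inside
bad_start = satisfies_initial_condition(3.0, 2.0)  # error starts outside
```

When the initial error starts outside $\rho(0)$, as in the second call, the conventional transformation is undefined at $t = 0$, so $\rho_0$ has to be retuned for every new initial state.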

Motivated by the discussions above, this article focuses on the optimal adaptive consensus tracking control problem for leader-follower nonlinear MASs subject to prescribed performance constraints. The main contributions of the article are as follows.

(1) Based on the actor-critic structure of RL, the proposed consensus tracking control scheme achieves optimal control of MASs while guaranteeing an excellent tracking effect. Compared with ^{[10–12]}, the proposed algorithm is simpler to implement since it requires neither known system dynamics nor persistent excitation conditions.

(2) In contrast to existing performance functions ^{[40–43]}, most of which rely on initial value conditions, an improved performance function is introduced such that the proposed consensus tracking control scheme is able to force the convergence of the consensus tracking error to a prescribed region without the requirement of initial value conditions.

(3) Compared with the traditional backstepping control scheme ^{[24, 25]}, the dynamic surface technique is adopted, which effectively avoids the problem of "differential explosion" caused by repeated differentiation of the virtual controller and makes the control structure simpler.
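The filtering idea behind the dynamic surface technique in contribution (3) can be sketched as follows, assuming the commonly used first-order filter $\tau\dot{z} + z = \alpha$, $z(0) = \alpha(0)$, discretized by forward Euler. The filter constant `tau`, the step size `dt`, and the sinusoidal test signal are hypothetical; the filter parameters used in this article are not reproduced here.

```python
import math


def dsc_filter(alpha, tau, dt, steps):
    """Pass a virtual control alpha(t) through tau*z' + z = alpha.

    Returns (zs, dzs): the filter state and the derivative estimate
    z' = (alpha - z) / tau at each step. The estimate replaces the
    analytic derivative of alpha in the next design step.
    """
    z = alpha(0.0)  # z(0) = alpha(0)
    zs, dzs = [], []
    for k in range(steps):
        t = k * dt
        dz = (alpha(t) - z) / tau
        zs.append(z)
        dzs.append(dz)
        z += dt * dz  # forward-Euler update (requires dt < 2*tau for stability)
    return zs, dzs


# Hypothetical test signal: alpha(t) = sin(t), whose true derivative is cos(t).
zs, dzs = dsc_filter(math.sin, tau=0.01, dt=0.001, steps=5000)
```

Because only the algebraic expression `(alpha - z) / tau` is needed, the controller never differentiates the virtual control symbolically, at the cost of a filtering error of roughly order `tau`.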

## 2. PRELIMINARIES AND PROBLEM FORMULATION

### 2.1. Topology theory

In this paper, information interactions between agents are described by a directed graph

### 2.2. System formulation

Consider the following nonlinear MAS composed of N agents, where the

where

where

**Assumption 1**^{[24]} The directed graph has a spanning tree with the leader as the root node.
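Assumption 1 can be checked numerically: a directed graph contains a spanning tree rooted at the leader exactly when every node is reachable from the leader along directed edges. The sketch below is one way to test this with a breadth-first search; the edge convention `(source, target)`, the node labels, and the example topology are hypothetical and are not the topology used in this article.

```python
from collections import deque


def has_spanning_tree_from(root, edges, nodes):
    """Return True if every node is reachable from root.

    edges: iterable of (source, target) pairs, meaning that information
    flows from source to target (assumed convention).
    """
    children = {v: [] for v in nodes}
    for src, dst in edges:
        children[src].append(dst)
    seen = {root}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in children[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == set(nodes)


# Hypothetical chain topology: leader 0 followed by agents 1 to 4.
nodes = [0, 1, 2, 3, 4]
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
assumption_holds = has_spanning_tree_from(0, chain, nodes)
```

Removing the edge `(0, 1)` from `chain` disconnects the followers from the leader, in which case the check fails and the assumption is violated.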

**Assumption 2** The reference trajectory

**Lemma 1**^{[25]} If a function

where

### 2.3. Error transformation

Define the following monotonically increasing function over the interval (-1, 1)

where

Define the monotonically increasing function

(1)

(2)

To achieve a desired level of system performance, we define the following performance function

where

Define the following normalized function

From the above equation, we know that there exists a constant

The following error transformation function is introduced

where

From the above definition, it follows that

Since

Noting that

## 3. DESIGN PROCEDURE AND MAIN RESULTS

The controller design is conducted based on the following coordinate transformations

where

where

To facilitate brevity, the following definitions are provided before the commencement of the design steps. For

where

**Step 1:** Derivation with respect to

Define the performance index function for the first subsystem of the agent

where

The optimal performance index function is expressed as

Considering

where

Define the Bellman residual as

The approximate optimal virtual controller is expected to guarantee that

Since equation

Thus, the designed adaptive laws

By calculating

To obtain the optimal virtual controller, we decompose

where

Substituting equation (24) into equation (23), it follows that

Since

where

Substituting equation (27) into equations (24) and (26) results in

The

Construct the following Lyapunov candidate function

The derivation of

Define

By Lemma 2, there exists an FLS approximation to

where

With the help of Young's inequality, one has

Substituting equations (30), (36), (37), and (38) into equation (33) yields

According to Young's inequality, it can be derived that

By means of equations (40)-(44),

Then, it can be further bounded as

where

**Step j:** From equations (9) and (10), it holds that

The performance index function is chosen to be

where

Considering

The HJB equation can be written as

By calculating

Then, decompose

where

From equation (52), it is easy to deduce

Since

where

According to the above equation, we can get

The

The candidate Lyapunov function is chosen as

Derivation of

From equations (10) and (11), it holds that

Define

By Lemma 2, there exists an FLS satisfying

where

With the help of Young's inequality, the following inequality holds

Similar to the first step, one has

From equations (58), (62), and (64)-(72),

where

**Step n:** The performance index function for the last subsystem is defined as

where

The optimal performance index function is

The HJB equation is introduced as

By calculating

where

Then, the facts below are easily available

The

The candidate Lyapunov function is chosen to be

The time derivative of

By Lemma 2, there exists an FLS satisfying

where

Similar to the previous steps,

where

Define

*Theorem 1:* Consider a nonlinear MAS under Assumptions 1-2, with the optimal virtual controller choice of equations (30), (58), and (83), the adaptive laws of equations (14) and (15), the actor FLS choice of equation (12), and the critic FLS choice of (13). Then, we select the parameter

(1) All signals in the closed-loop system are bounded.

(2) The consensus tracking errors remain within the predefined bounds.

*Proof:* The total Lyapunov function for all agents is selected to be

The derivative of V with respect to time is

where

Obviously, it can be inferred that

From (91), we know that

This means that the consensus tracking error can be bounded within prescribed bounds, i.e.,

On the other hand, it is known from equation (92) that

## 4. SIMULATION EXAMPLES

Consider a nonlinear MAS with four follower agents and one leader, whose dynamics are modeled as

where

Select the fuzzy membership function as

The time-varying function

Figure 2. The trajectories of the system output state

Figure 3. System consensus tracking error
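The simulation relies on fuzzy logic systems built from fuzzy membership functions. As a rough illustration of how such an approximator is typically evaluated, the sketch below computes a normalized fuzzy basis vector and the output $\theta^{T}\varphi(x)$. The Gaussian membership shape, the centers, the width, and the function names are assumptions; the membership functions actually selected in this simulation are not reproduced here.

```python
import math


def fuzzy_basis(x, centers, width=1.0):
    """Normalized fuzzy basis vector for an input x (list of floats).

    Each rule uses a Gaussian membership centered at c in every input
    dimension (assumed shape); rule firing strengths are normalized to sum to 1.
    """
    weights = []
    for c in centers:
        dist2 = sum((xi - c) ** 2 for xi in x)
        weights.append(math.exp(-dist2 / (2.0 * width ** 2)))
    total = sum(weights)
    return [w / total for w in weights]


def fls_output(x, theta, centers, width=1.0):
    """FLS output as the inner product of weights theta and the basis vector."""
    phi = fuzzy_basis(x, centers, width)
    return sum(t * p for t, p in zip(theta, phi))


# Hypothetical rule centers spread over [-2, 2].
centers = [-2.0, -1.0, 0.0, 1.0, 2.0]
phi = fuzzy_basis([0.0], centers)
```

In an adaptive scheme the weight vector `theta` is the quantity updated online, while the basis vector is fixed by the chosen memberships.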

## 5. CONCLUSIONS

In this paper, the problem of optimal adaptive consensus tracking control for nonlinear MASs with prescribed performance has been addressed. Firstly, a time-varying scalar function is introduced such that the designed performance function bypasses the initial value conditions. Based on the error transformation function, an unconstrained system is obtained. Subsequently, an RL-based consensus control scheme built on optimal control theory and the dynamic surface technique has been proposed. Finally, it is shown that the closed-loop system is stable and that the error constraints are not violated. In practice, systems are always subject to various uncertain constraints, such as actuator faults and input dead zones, which can have a large impact on system performance. Therefore, designing a performance-constrained optimal control scheme that accounts for these situations is a topic for future research.

## DECLARATIONS

### Authors' contributions

Made substantial contributions to the conception and design of the study and performed data analysis and interpretation: Yue H

Performed data acquisition and provided administrative, technical, and material support: Xia J

### Availability of data and materials

Not applicable.

### Financial support and sponsorship

This work was supported by the National Natural Science Foundation of China under Grant 61973148 and by the Discipline with Strong Characteristics of Liaocheng University: Intelligent Science and Technology under Grant 319462208.

### Conflicts of interest

All authors declared that there are no conflicts of interest.

### Ethical approval and consent to participate

Not applicable.

### Consent for publication

Not applicable.

### Copyright

© The Author(s) 2023.

## REFERENCES

1. Modares H, Lewis FL, Naghibi-Sistani MB. Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems. *Automatica* 2014;50:193-202.

2. Bertsekas DP. Value and policy iterations in optimal control and adaptive dynamic programming. *IEEE Trans Neural Netw Learn Syst* 2015;28:500-9.

3. Tsai JSH, Li JS, Leang-San S. Discretized quadratic optimal control for continuous-time two-dimensional systems. *IEEE Trans Circuits Syst I Fund Theory Appl* 2002;49:116-25.

4. Luo B, Liu DR, Wu HN, Wang D, Lewis FL. Policy gradient adaptive dynamic programming for data-based optimal control. *IEEE Trans Cybern* 2016;47:3341-54.

5. Jiang Y, Jiang ZP. Global adaptive dynamic programming for continuous-time nonlinear systems. *IEEE Trans Automat Contr* 2015;60:2917-29.

6. Wu X, Chen HL, Wang JJ, Troiano L, Loia V, Fujita H. Adaptive stock trading strategies with deep reinforcement learning methods. *Inf Sci* 2020;538:142-58.

7. Modares H, Ranatunga I, Lewis FL, Popa DO. Optimized assistive human-robot interaction using reinforcement learning. *IEEE Trans Cybern* 2015;46:655-67.

8. Wen GX, Chen CLP, Li WN. Simplified optimized control using reinforcement learning algorithm for a class of stochastic nonlinear systems. *Inf Sci* 2020;517:230-43.

9. Zhao B, Liu DR, Luo CM. Reinforcement learning-based optimal stabilization for unknown nonlinear systems subject to inputs with uncertain constraints. *IEEE Trans Neural Netw Learn Syst* 2019;31:4330-40.

10. Wen G, Chen CLP, Ge SS, Yang H, Liu X. Optimized adaptive nonlinear tracking control using actor-critic reinforcement learning strategy. *IEEE Trans Ind Inf* 2019;15:4969-77.

11. Bai W, Zhou Q, Li T, Li H. Adaptive reinforcement learning neural network control for uncertain nonlinear system with input saturation. *IEEE Trans Cybern* 2019;50:3433-43.

12. Yang X, Liu D, Wang D. Reinforcement learning for adaptive optimal control of unknown continuous-time nonlinear systems with input constraints. *Int J Control* 2014;87:553-66.

13. Bai W, Li T, Tong S. NN reinforcement learning adaptive control for a class of nonstrict-feedback discrete-time systems. *IEEE Trans Cybern* 2020;50:4573-84.

14. Li Y, Zhang J, Liu W, Tong S. Observer-based adaptive optimized control for stochastic nonlinear systems with input and state constraints. *IEEE Trans Neural Netw Learn Syst* 2021;33:7791-805.

15. Wang J, Gong Q, Huang K, Liu Z, Chen CLP, Liu J. Event-triggered prescribed settling time consensus compensation control for a class of uncertain nonlinear systems with actuator failures. *IEEE Trans Neural Netw Learn Syst* 2023;34:5590-600.

16. Wang J, Wang C, Liu Z, Chen CLP, Zhang C. Practical fixed-time adaptive ERBFNNs event-triggered control for uncertain nonlinear systems with dead-zone constraint. *IEEE Trans Syst Man Cybern Syst* 2023:1-10.

17. Cai X, de Marcio M. Adaptive rigidity-based formation control for multirobotic vehicles with dynamics. *IEEE Trans Contr Syst Technol* 2014;23:389-96.

18. Ren H, Cheng Z, Qin J, Lu R. Deception attacks on event-triggered distributed consensus estimation for nonlinear systems. *Automatica* 2023;154:111100.

19. Wang J, Yan Y, Liu Z, Chen CLP, Zhang C, Chen K. Finite-time consensus control for multi-agent systems with full-state constraints and actuator failures. *Neural Netw* 2023;157:350-63.

20. Shang Y. Matrix-scaled consensus on weighted networks with state constraints. *IEEE Syst J* 2023;17:6472-9.

21. Cheng L, Hou ZG, Tan M, Lin Y, Zhang W. Neural-network-based adaptive leader-following control for multiagent systems with uncertainties. *IEEE Trans Neural Netw* 2010;21:1351-8.

22. Shen Q, Shi P, Zhu J, Wang S, Shi Y. Neural networks-based distributed adaptive control of nonlinear multiagent systems. *IEEE Trans Neural Netw Learn Syst* 2019;31:1010-21.

23. Zhang N, Xia J, Park JH, Zhang J, Shen H. Improved disturbance observer-based fixed-time adaptive neural network consensus tracking for nonlinear multi-agent systems. *Neural Netw* 2023;162:490-501.

24. Zhang Y, Sun J, Liang H, Li H. Event-triggered adaptive tracking control for multiagent systems with unknown disturbances. *IEEE Trans Cybern* 2018;50:890-901.

25. Chen J, Li J, Yuan X. Global fuzzy adaptive consensus control of unknown nonlinear multiagent systems. *IEEE Trans Fuzzy Syst* 2020;32:2239-50.

26. Zhang J, Liu S, Zhang X, Xia J. Event-triggered-based distributed consensus tracking for nonlinear multiagent systems with quantization. *IEEE Trans Neural Netw Learn Syst* 2022:1-11.

27. Deng C, Wen C, Wang W, Li X, Yue D. Distributed adaptive tracking control for high-order nonlinear multiagent systems over event-triggered communication. *IEEE Trans Automat Contr* 2022;68:1176-83.

28. Shao J, Shi L, Cheng Y, Li T. Asynchronous tracking control of leader-follower multiagent systems with input uncertainties over switching signed digraphs. *IEEE Trans Cybern* 2021;52:6379-90.

29. Yang Y, Xiao Y, Li T. Attacks on formation control for multiagent systems. *IEEE Trans Cybern* 2021;52:12805-17.

30. Ren H, Wang Y, Liu M, Li H. An optimal estimation framework of multi-agent systems with random transport protocol. *IEEE Trans Signal Process* 2022;70:2548-59.

31. Gao W, Jiang ZP, Lewis FL, Wang Y. Leader-to-formation stability of multiagent systems: an adaptive optimal control approach. *IEEE Trans Automat Contr* 2018;63:3581-7.

32. Tan M, Liu Z, Chen CLP, Zhang Y, Wu Z. Optimized adaptive consensus tracking control for uncertain nonlinear multiagent systems using a new event-triggered communication mechanism. *Inf Sci* 2022;605:301-16.

33. Wen G, Chen CLP. Optimized backstepping consensus control using reinforcement learning for a class of nonlinear strict-feedback-dynamic multi-agent systems. *IEEE Trans Neural Netw Learn Syst* 2023;34:1524-36.

34. Zhu HY, Li YX, Tong S. Dynamic event-triggered reinforcement learning control of stochastic nonlinear systems. *IEEE Trans Fuzzy Syst* 2023;31:2917-28.

35. Bai W, Li T, Long Y, Chen CLP. Event-triggered multigradient recursive reinforcement learning tracking control for multiagent systems. *IEEE Trans Neural Netw Learn Syst* 2023;34:366-79.

36. Li T, Bai W, Liu Q, Long Y, Chen CLP. Distributed fault-tolerant containment control protocols for the discrete-time multi-agent systems via reinforcement learning method. *IEEE Trans Neural Netw Learn Syst* 2023;34:3979-91.

37. Zhao Y, Niu B, Zong G, Zhao X, Alharbi KH. Neural network-based adaptive optimal containment control for non-affine nonlinear multi-agent systems within an identifier-actor-critic framework. *J Franklin Inst* 2023;360:8118-43.

38. Li H, Wu Y, Chen M, Lu R. Adaptive multigradient recursive reinforcement learning event-triggered tracking control for multiagent systems. *IEEE Trans Neural Netw Learn Syst* 2023;34:144-56.

39. Bechlioulis CP, Rovithakis GA. Robust adaptive control of feedback linearizable MIMO nonlinear systems with prescribed performance. *IEEE Trans Automat Contr* 2008;53:2090-9.

40. Wang X, Xia J, Park JH, Xie X, Chen G. Intelligent control of performance constrained switched nonlinear systems with random noises and its application: an event-driven approach. *IEEE Trans Circuits Syst I Regul Pap* 2022;69:3736-47.

41. Li Y, Shao X, Tong S. Adaptive fuzzy prescribed performance control of nontriangular structure nonlinear systems. *IEEE Trans Fuzzy Syst* 2019;28:2416-26.

42. Wang W, Liang H, Pan Y, Li T. Prescribed performance adaptive fuzzy containment control for nonlinear multiagent systems using disturbance observer. *IEEE Trans Cybern* 2020;50:3879-91.

43. Sun K, Qiu J, Karimi HR, Fu Y. Event-triggered robust fuzzy adaptive finite-time control of nonlinear systems with prescribed performance. *IEEE Trans Fuzzy Syst* 2020;29:1460-71.

44. Chen H, Yan H, Wang Y, Xie S, Zhang D. Reinforcement learning-based close formation control for underactuated surface vehicle with prescribed performance and time-varying state constraints. *Ocean Eng* 2022;256:111361.

## How to Cite

Yue, H.; Xia J. Reinforcement learning-based optimal adaptive fuzzy control for nonlinear multi-agent systems with prescribed performance. *Complex Eng. Syst.* **2023**, *3*, 19. http://dx.doi.org/10.20517/ces.2023.27

## About This Article

### Copyright

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing, adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
