Reinforcement learning-based optimal adaptive fuzzy control for nonlinear multi-agent systems with prescribed performance

Huarong Yue; Jianwei Xia

doi:10.20517/ces.2023.27


Research Article | Open Access | 23 Nov 2023


Complex Eng Syst 2023;3:19.

10.20517/ces.2023.27 | © The Author(s) 2023.


Abstract

In this paper, the problem of optimal adaptive consensus tracking control for nonlinear multi-agent systems with prescribed performance is investigated. To address the issue of satisfying the initial value conditions in existing results, an improved performance function is employed as the prescribed performance boundary, effectively resolving this problem. Then, by employing the error transformation function, the constrained system is converted into an unconstrained one. Furthermore, fuzzy logic systems are employed to identify unknown system parts. By applying the dynamic surface technique, the problem of "differential explosion", which often occurs in backstepping, is solved. Moreover, a distributed optimal adaptive fuzzy control protocol based on the reinforcement learning actor-critic algorithm is proposed. Under the proposed control scheme, it is proved that all the signals within the closed-loop system are bounded, and the consensus tracking errors have remained within the predefined bounds. Finally, the numerical simulation results demonstrate the effectiveness of the proposed scheme.

Keywords

Multi-agent system, reinforcement learning, prescribed performance, optimal consensus control, actor-critic structure


1. INTRODUCTION

Optimal control aims to design a control protocol that not only achieves the system control objectives but also minimizes the system cost, and it has garnered significant scholarly interest in recent years. The optimal controller can be deduced from the solution of the Hamilton-Jacobi-Bellman (HJB) equation ^[1]. For linear systems, this reduces to solving the Riccati equation. For nonlinear systems, however, the HJB equation is a partial differential equation containing multiple nonlinear terms, and solving it directly is generally difficult. One approach that can be implemented is dynamic programming (DP) ^[2–5]. However, DP is a backward, offline computational process whose computational complexity grows significantly with the system dimension, which makes it less feasible for high-dimensional systems. As a form of machine learning, reinforcement learning (RL) opens up another avenue to solve the problem ^[6–9]. The most commonly used RL algorithms make use of the actor-critic structure, in which the actor interacts with the environment, and the critic evaluates the actions of the actor and provides feedback that the actor uses to improve its subsequent actions. RL has since been employed in various nonlinear systems for adaptive control, leading to remarkable outcomes^[10–14]. For example, an optimal adaptive controller for nonlinear systems with control gain functions was proposed to achieve not only tracking control but also the optimal performance of systems^[10]. In^[12], the problem of tracking control of nonlinear systems with input constraints was investigated. In^[14], an optimal observer-based adaptive control scheme for nonlinear stochastic systems with input and state constraints was proposed.

Meanwhile, since most physical models in practical applications can be represented by nonlinear systems, the study of nonlinear systems is very important and has yielded rich results^[15–18]. In recent years, multi-agent systems (MASs) have garnered significant attention from scholars due to their capability to perform tasks that surpass the capabilities of a single agent. The consensus problem in MASs refers to achieving a state of agreement or convergence among multiple agents through the design of control protocols, which is a fundamental problem in the design and control of MASs. Over the past decades, the problem of consensus control for MASs has been extensively studied, leading to significant advancements^[19–30]. In^[23], a consensus control scheme incorporating a modified disturbance observer was designed to achieve fixed-time tracking control of nonlinear MASs with unknown disturbances. In^[27], a distributed event-triggered control scheme was proposed to address the problem of asymptotic tracking for nonlinear MASs with uncertain leaders. Recently, there has been a surge of interest in incorporating RL into MASs, an interesting and challenging problem that has produced some excellent results^[31–38]. For instance, in^[33], an optimal backstepping consensus control protocol based on RL was introduced for nonlinear strict-feedback MASs, which not only exhibits algorithmic simplicity but also relaxes the need for two general conditions: known dynamics and persistent excitation. In^[34], an optimal RL-based event-triggered controller was proposed for nonlinear stochastic systems.

On the other hand, the concept of prescribed performance control (PPC), initially proposed by Bechlioulis and Rovithakis^[39], has emerged as a prominent research topic in the control community. Transient and steady-state performance, which is often neglected by conventional control schemes that solely ensure closed-loop stability, is the primary concern of prescribed performance. The PPC strategy aims to align the actual system performance achieved after execution with the desired or prescribed performance criteria and has yielded remarkable outcomes^[40–43]. By utilizing exponential performance functions, both the nonlinear switched systems in^[40] and the non-triangularly structured systems in ^[42] were able to achieve the desired rate of convergence. To overcome the issue of "differential explosion" caused by repeated differentiation, dynamic surface control schemes were proposed to implement the tracking control of systems in ^[41] and ^[43], respectively. Recently, there has been significant research focused on integrating RL into PPC^{[44, 45]}. However, all of the above results depend on the initial conditions; i.e., the initial error must be placed within the prescribed boundary by properly setting the initial values.

Motivated by the discussions above, this article focuses on the optimal adaptive consensus tracking control problem for leader-follower nonlinear MASs subject to prescribed performance constraints. The main contributions of the article are as follows.

(1) Based on the actor-critic structure of RL, the proposed consensus tracking control scheme can achieve optimal control of MASs while an excellent tracking effect is guaranteed. Compared with ^[10–12], the proposed algorithm is simpler to implement since it does not require known system dynamics or persistent excitation conditions.

(2) In contrast to existing performance functions ^[40–43], most of which rely on initial value conditions, an improved performance function is introduced such that the proposed consensus tracking control scheme is able to force the convergence of the consensus tracking error to a prescribed region without the requirement of initial value conditions.

(3) Compared with the traditional backstepping control scheme ^{[24, 25]}, the dynamic surface technique is adopted, which effectively avoids the problem of "differential explosion" caused by repeated differentiation of the virtual controller and makes the control structure simpler.

2. PRELIMINARIES AND PROBLEM FORMULATION

2.1. Topology theory

In this paper, information interactions between agents are described by a directed graph $$ \mathscr{G}=(\mathscr{H}, \mathscr{T}, \mathscr{A}) $$ in which $$ \mathscr{H}=\{1, \cdots, N\} $$ and $$ \mathscr{T}\subseteq \mathscr{H}\times \mathscr{H} $$ denote the set of nodes and the set of edges, respectively. Furthermore, $$ \mathscr{A}=[a_{ij}]\in \mathcal{R}^{N\times N} $$ denotes the adjacency matrix, all elements of which are non-negative; specifically, $$ a_{ij}>0 $$ if $$ (j, i) \in \mathscr{T} $$ and $$ a_{ij}=0 $$ otherwise. If agent $$ i $$ has access to the information of agent $$ j $$, then $$ (j, i) \in \mathscr{T} $$, and hence the neighbors of agent $$ i $$ can be described as $$ \mathscr{N}_i=\{j|(j, i)\in \mathscr{T}\} $$. The in-degree matrix and the Laplacian matrix are defined as $$ \mathscr{D}=diag \{d_1, \cdots, d_N\} $$ with $$ d_i=\sum_{j=1}^Na_{ij} $$ and $$ \mathscr{L}=\mathscr{D}-\mathscr{A} $$, respectively. Similarly, if agent $$ i $$ has access to the leader, then $$ b_{i}=1 $$; otherwise, $$ b_{i}=0 $$, which forms the matrix $$ \mathscr{B}=diag\{b_1, \cdots, b_N\} $$.
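As a small illustration of these definitions, the following sketch builds the in-degree, Laplacian, and leader-access matrices for a hypothetical three-follower chain. The function name `graph_matrices`, the edge weights, and the leader link are assumptions chosen for illustration and are not taken from the paper.

```python
import numpy as np

def graph_matrices(adjacency, leader_links):
    """Return (D, L, B) for a weighted directed graph.

    adjacency[i][j] = a_ij > 0 means agent i receives information from agent j.
    leader_links[i] = b_i is 1 if agent i can access the leader, else 0.
    """
    A = np.asarray(adjacency, dtype=float)
    D = np.diag(A.sum(axis=1))      # in-degree matrix, d_i = sum_j a_ij
    L = D - A                       # Laplacian matrix
    B = np.diag(np.asarray(leader_links, dtype=float))
    return D, L, B

# Hypothetical chain: agent 1 hears the leader, agent 2 hears agent 1,
# agent 3 hears agent 2.
A_example = [[0, 0, 0],
             [1, 0, 0],
             [0, 1, 0]]
b_example = [1, 0, 0]
D_example, L_example, B_example = graph_matrices(A_example, b_example)
```

Each row of the Laplacian sums to zero by construction, and for this particular chain the matrix $$ \mathscr{L}+\mathscr{B} $$ has full rank.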

2.2. System formulation

Consider the following nonlinear MAS composed of N agents, where the $$ i $$th agent can be modeled as:

(1)

$$ \begin{align} \left\{ \begin{aligned} \dot x_{i, j} &= x_{i, j+1}(t)+f_{i, j}(\bar{x}_{i, j}(t)), \\ \dot x_{i, n} &=u_{i}(t)+f_{i, n}(\bar{x}_{i, n}(t)), \\ \quad y_{i} &= x_{i, 1}, (j=1, 2, \ldots, n-1) \end{aligned} \right. \end{align} $$

where $$ i= 1, \ldots, N $$, $$ {{\bar x_{i, j}}}=[{x_{i, 1}}, {x_{i, 2}}, \cdots, {x_{i, j}}]^T\in{R^j}, j=1, \ldots, n $$ and $$ u_i $$ denote the state vector and control input, respectively. $$ f_{i, j} $$ represents an unknown smooth nonlinear function, and $$ y_{i} = x_{i, 1} $$ denotes the output variable. The consensus tracking error for agent $$ i $$ is defined as:

(2)

$$ e_{i}=\sum\limits_{j\in \mathscr{N}_i}a_{ij}(y_i-y_j)+b_i(y_i-y_r), $$

where $$ y_r $$ is generated by the output of the leader to represent the reference signal.
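The error in (2) can be evaluated for all followers at once. The sketch below is a minimal illustration; the function name `consensus_errors` and the numerical values are hypothetical.

```python
import numpy as np

def consensus_errors(A, b, y, y_r):
    """e_i = sum_j a_ij (y_i - y_j) + b_i (y_i - y_r) for every follower i."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    y = np.asarray(y, dtype=float)
    neighbor_term = A.sum(axis=1) * y - A @ y   # sum_j a_ij (y_i - y_j)
    leader_term = b * (y - y_r)                 # b_i (y_i - y_r)
    return neighbor_term + leader_term

# Hypothetical chain graph: leader -> agent 1 -> agent 2 -> agent 3.
A_example = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]
b_example = [1, 0, 0]
e_example = consensus_errors(A_example, b_example, y=[1.0, 2.0, 4.0], y_r=0.5)
```

When every output equals the reference signal, all entries of the returned vector are zero, which is the consensus tracking condition.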

Assumption 1^[24] Let the leader be the root node and the directed graph have a spanning tree.

Assumption 2 The reference trajectory $$ y_r $$ and its derivative $$ \dot y_r $$ are continuous and bounded.

Lemma 1^[25] If a function $$ F(x) $$ is continuous on a compact set $$ \Phi $$, then for any given accuracy $$ \varepsilon>0 $$, there exists a fuzzy logic system (FLS) such that

$$ \sup\limits_{x\in \Phi }|F(x)-{\vartheta}^T\phi(x)|\leq\varepsilon, $$

where $$ {\vartheta}=[\vartheta_1, \cdots, \vartheta_L]^T $$ denotes the ideal weight vector, $$ L>1 $$ indicates the number of fuzzy rules, and $$ \phi(x)=[\phi_1(x), \cdots, \phi_L(x)]^T/\sum_{r=1}^{L}\phi_r(x) $$ denotes the basis function vector with $$ \phi_r(x) $$ being a Gaussian function, i.e., for $$ r=1, \cdots, L, \phi_r(x)=\exp[-(x-\jmath_r)^T(x-\jmath_r)/\hbar^2_r] $$, where $$ \jmath_r=[\jmath_{r1}, \jmath_{r2}, \cdots, \jmath_{rn}]^T $$ is defined as the centre vector, and $$ \hbar_r $$ is defined as the width of the Gaussian function.
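The normalized Gaussian basis of Lemma 1 can be sketched as follows. The function names and the three rule centres are hypothetical choices for illustration; the paper's simulation settings are not reproduced here.

```python
import numpy as np

def fuzzy_basis(x, centers, widths):
    """Normalized Gaussian basis vector phi(x) with L = len(centers) rules."""
    x = np.asarray(x, dtype=float)
    centers = np.asarray(centers, dtype=float)     # shape (L, n)
    widths = np.asarray(widths, dtype=float)       # shape (L,)
    sq_dist = np.sum((x - centers) ** 2, axis=1)   # (x - c_r)^T (x - c_r)
    raw = np.exp(-sq_dist / widths ** 2)
    return raw / raw.sum()

def fls_output(weights, x, centers, widths):
    """FLS approximation vartheta^T phi(x)."""
    return float(np.dot(weights, fuzzy_basis(x, centers, widths)))

# Hypothetical one-dimensional example with three rules.
centers_example = np.array([[-1.0], [0.0], [1.0]])
widths_example = np.array([1.0, 1.0, 1.0])
phi_example = fuzzy_basis([0.0], centers_example, widths_example)
```

Because the basis entries are normalized to sum to one, the FLS output is a convex combination of the weights, so equal weights reproduce a constant function exactly.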

2.3. Error transformation

Define the following monotonically increasing function over the interval (-1, 1)

(3)

$$ \Gamma(x)=\frac{\sqrt{\chi}x}{\sqrt{1-x^{2}}}, $$

where $$ \chi $$ is a positive constant.

Define the monotonically increasing function $$ \omega_i(t) $$, which has the following properties:

(1) $$ \omega_i(0)=1 $$ and $$ \lim_{t \to \infty}\omega_i(t)=\dfrac{1}{l_i} $$, with $$ 0<l_i\leq1 $$ being a constant.

(2) $$ \omega_i $$ is differentiable up to order $$ n $$, and its derivatives $$ \omega_i^{(r)}, r=1, \cdots, n $$, are bounded.

To achieve a desired level of system performance, we define the following performance function

(4)

$$ \Gamma(\varphi_i)=\frac{\sqrt{\chi}\varphi_i}{\sqrt{1-\varphi_i^{2}}}, $$

where $$ \varphi_i=\dfrac{1}{\omega_i(t)} $$. Then, our control objective is to make the consensus tracking error of the system satisfy $$ \Gamma(-\varphi_i(t))\leq e_i\leq \Gamma(\varphi_i(t)) $$.
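The shape of the resulting performance envelope can be sketched with one possible choice of $$ \omega_i(t) $$. The exponential form below, with hypothetical parameters `l` and `rate`, is an assumption that satisfies properties (1) and (2) for $$ l<1 $$; it is not necessarily the function used by the authors.

```python
import numpy as np

def omega(t, l=0.2, rate=1.0):
    """Hypothetical omega_i(t): rises monotonically from 1 at t = 0 towards 1/l."""
    t = np.asarray(t, dtype=float)
    return 1.0 / l - (1.0 / l - 1.0) * np.exp(-rate * t)

def performance_bound(t, chi=1.0, l=0.2, rate=1.0):
    """Upper bound Gamma(phi_i(t)) with phi_i = 1/omega_i; the lower bound is its negative."""
    phi = 1.0 / omega(t, l, rate)
    with np.errstate(divide="ignore"):
        # At t = 0, phi = 1 and the bound is infinite.
        return np.sqrt(chi) * phi / np.sqrt(1.0 - phi ** 2)

t_grid = np.array([0.0, 0.5, 1.0, 5.0, 50.0])
bounds = performance_bound(t_grid)
```

With this choice the bound is infinite at the initial instant, so any initial error is admissible, and it then shrinks towards $$ \Gamma(l) $$.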

Define the following normalized function

(5)

$$ \xi_i(e_i)=\dfrac{e_i}{\sqrt{\chi+e_i^2}}, $$

From the above equation, $$ |\xi_i(e_i)|<1 $$ holds for every $$ e_i $$. Moreover, whenever there exists a constant $$ \bar \xi $$ satisfying $$ |\xi_i(e_i)|\leq \bar \xi <1 $$, the error $$ e_i=\frac{\xi_i\sqrt{\chi}}{\sqrt{1-\xi_i^2}} $$ is bounded.

The following error transformation function is introduced

(6)

$$ \zeta_i(t)=\dfrac{\varpi_i(t)}{1-\varpi_i^2(t)}, $$

where $$ \varpi_i(t)=\omega_i(t)\xi_i(e_i) $$ with $$ \varpi_i(0)=\omega_i(0)\xi_i(e_i(0))=\xi_i(e_i(0))\in (-1, 1) $$. It follows from the above equation that, for any initial value $$ \varpi_i(0)\in (-1, 1) $$, $$ \zeta_i(t) \to \pm \infty $$ if and only if $$ \varpi_i\to \pm 1 $$. This also implies that as long as $$ \zeta_i $$ is bounded, there exists a constant $$ \bar \varpi $$ satisfying $$ |\varpi_i|\leq \bar \varpi<1 $$.
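The claim that a bounded $$ \zeta_i $$ keeps $$ \varpi_i $$ strictly inside $$ (-1, 1) $$ can be illustrated by inverting (6). The sketch below uses hypothetical function names; the inverse is the root of $$ \zeta\varpi^2+\varpi-\zeta=0 $$ that lies in $$ (-1, 1) $$.

```python
import math

def zeta_transform(varpi):
    """zeta = varpi / (1 - varpi^2), defined for |varpi| < 1."""
    if not -1.0 < varpi < 1.0:
        raise ValueError("varpi must lie in (-1, 1)")
    return varpi / (1.0 - varpi * varpi)

def varpi_from_zeta(zeta):
    """Inverse map: the root of zeta*varpi^2 + varpi - zeta = 0 inside (-1, 1)."""
    if zeta == 0.0:
        return 0.0
    return (-1.0 + math.sqrt(1.0 + 4.0 * zeta * zeta)) / (2.0 * zeta)

# The transformed error grows without bound as varpi approaches the boundary.
near_boundary = zeta_transform(0.999)
```

Any finite value of the transformed error therefore corresponds to a value of $$ \varpi_i $$ whose magnitude is strictly less than one.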

From the above definitions, we have

(7)

$$ -\varphi_i=-\dfrac{1}{\omega_i}<-\dfrac{\bar \varpi}{\omega_i}\leq \xi_i\leq \dfrac{\bar \varpi}{\omega_i}<\dfrac{1}{\omega_i}=\varphi_i. $$

Since $$ \Gamma(x) $$ is a monotonically increasing function, it follows that

(8)

$$ \Gamma(-\varphi_i) < \Gamma(\xi_i) < \Gamma(\varphi_i). $$

Noting that $$ \Gamma(\xi_i)=\dfrac{\sqrt{\chi}\xi_i}{\sqrt{1-\xi_i^2}} $$, we can get $$ \Gamma(-\varphi_i(t))\leq e_i\leq \Gamma(\varphi_i(t)) $$.

3. DESIGN PROCEDURE AND MAIN RESULTS

The controller design is conducted based on the following coordinate transformations

(9)

$$ z_{i, j}=x_{i, j}-\bar \alpha_{i, j}, $$

(10)

$$ \rho_{i, j}={\bar\alpha}_{i, j}-\hat{\alpha}_{i, j-1}, j=2, \dots, n, $$

where $$ \hat{\alpha}_{i, j-1} $$ denotes the approximate optimal virtual controller, $$ {\bar\alpha}_{i, j} $$ denotes the output of the first-order filter, and $$ \rho_{i, j} $$ is defined as the filtering error. The filtering dynamic is introduced as

(11)

$$ \kappa_{i, j-1} \dot{\bar\alpha}_{i, j}+\bar \alpha_{i, j}=\hat{\alpha}_{i, j-1}, \quad \bar \alpha_{i, j}(0)=\hat{\alpha}_{i, j-1}(0), $$

where $$ \kappa_{i, j-1} $$ is a positive constant.
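The first-order filter (11) is what replaces analytic differentiation of the virtual controller. A minimal discrete-time sketch is given below; the explicit Euler scheme, the step size `dt`, and the value of `kappa` are assumptions for illustration (the scheme is only stable for `dt` sufficiently small relative to `kappa`).

```python
def filter_step(alpha_bar, alpha_hat, kappa, dt):
    """One explicit-Euler step of kappa * d(alpha_bar)/dt + alpha_bar = alpha_hat."""
    return alpha_bar + dt * (alpha_hat - alpha_bar) / kappa

def run_filter(alpha_hat_samples, kappa=0.05, dt=0.001):
    """Filter a sampled signal, starting from alpha_bar(0) = alpha_hat(0)."""
    alpha_bar = alpha_hat_samples[0]
    outputs = [alpha_bar]
    for alpha_hat in alpha_hat_samples[1:]:
        alpha_bar = filter_step(alpha_bar, alpha_hat, kappa, dt)
        outputs.append(alpha_bar)
    return outputs

# Step input: the filter output approaches the new value smoothly.
step_input = [0.0] + [1.0] * 1000
filtered = run_filter(step_input)
```

The derivative of the filter output is available algebraically as `(alpha_hat - alpha_bar) / kappa`, so no numerical differentiation of the virtual controller is needed.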

For brevity, the following definitions are provided before the design steps. For $$ i=1, \cdots, N, j=1, \cdots, n $$, $$ \tilde\theta_{a, ij}=\theta_{a, ij}-\hat \theta_{a, ij}$$, $$\tilde\theta_{c, ij}=\theta_{c, ij}-\hat \theta_{c, ij}$$, $$\tilde\vartheta_{i, j}=\vartheta_{i, j}-\hat \vartheta_{i, j}$$ where $$ \hat \theta_{a, ij} $$, $$ \hat \theta_{c, ij} $$, and $$ \hat \vartheta_{i, j} $$ are the estimates of $$ \theta_{a, ij}, \theta_{c, ij} $$, and $$ \vartheta_{i, j} $$, respectively. $$ \lambda^{\min}_{i, j} $$ denotes the minimum eigenvalue of $$ S_{i, j}S_{i, j}^T $$. The set $$ \varOmega $$ is a compact set containing the origin, and $$ \psi(\varOmega) $$ represents the set of admissible controls. The adaptive laws of the actor FLSs and critic FLSs and the adaptive law of $$ {\hat\vartheta}_{i, j} $$ are designed as

(12)

$$ \dot {\hat\theta}_{a, ij}=-S_{i, j}S_{i, j}^T(\sigma_{a, ij}(\hat \theta_{a, ij}-\hat \theta_{c, ij})+\sigma_{c, ij}\hat \theta_{c, ij}) , $$

(13)

$$ \dot {\hat\theta}_{c, ij}=-\sigma_{c, ij}S_{i, j}S_{i, j}^T\hat \theta_{c, ij} , $$

(14)

$$ \dot {\hat\vartheta}_{i, 1}=\zeta_i\phi_{i, 1}-\sigma_{i, 1} {\hat\vartheta}_{i, 1}, $$

(15)

$$ \dot {\hat\vartheta}_{i, j}=z_{i, j}\phi_{i, j}-\sigma_{i, j} {\hat\vartheta}_{i, j}, j=2, \cdots, n $$

where $$ \sigma_{a, ij} $$, $$ \sigma_{c, ij} $$ and $$ \sigma_{i, j} $$ are design parameters.
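The right-hand sides of (12)-(15) translate directly into code. The sketch below is a minimal illustration with hypothetical function names; integrating these rates in time (for example with an Euler step) is left to the caller.

```python
import numpy as np

def actor_rate(theta_a, theta_c, S, sigma_a, sigma_c):
    """Right-hand side of the actor law (12)."""
    SSt = np.outer(S, S)
    return -SSt @ (sigma_a * (theta_a - theta_c) + sigma_c * theta_c)

def critic_rate(theta_c, S, sigma_c):
    """Right-hand side of the critic law (13)."""
    return -sigma_c * np.outer(S, S) @ theta_c

def approximator_rate(vartheta, phi, error_signal, sigma):
    """Right-hand side of (14) with error_signal = zeta_i, or of (15) with error_signal = z_ij."""
    return error_signal * phi - sigma * vartheta
```

When the actor and critic weights coincide, the actor rate reduces to the critic rate, so the two estimates then evolve together.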

Step 1: Differentiating $$ e_i $$ with respect to time gives

(16)

$$ \dot e_i=(d_i+b_i)(x_{i, 2}+f_{i, 1})-\sum\limits_{j\in \mathscr N_i}a_{i, j}\dot y_j-b_i\dot y_r. $$

Define the performance index function for the first subsystem of the agent $$ i $$ as

(17)

$$ V_{i, 1}(\zeta_i)=\int_{t}^{\infty}c_{i, 1}(\zeta_i(\tau), \alpha_{i, 1}(\zeta))d\tau, $$

where $$ c_{i, 1}(\zeta_i, \alpha_{i, 1})=\zeta_i^2+\alpha_{i, 1}^2 $$ represents the value function.

The optimal performance index function is expressed as

(18)

$$ \begin{align} V_{i, 1}^*(\zeta_i)&=\int_{t}^{\infty}c_{i, 1}(\zeta_i(\tau), \alpha^*_{i, 1}(\zeta))d\tau \\ &=\min\limits_{\alpha_{i, 1}\in \psi(\varOmega)}(\int_{t}^{\infty}c_{i, 1}(\zeta_i(\tau), \alpha_{i, 1} (\zeta))d\tau). \end{align} $$

Considering $$ x_{i, 2} $$ as the optimal virtual controller $$ \alpha^*_{i, 1} $$, then the HJB equation is given as

(19)

$$ \begin{align} H_{i, 1}(\zeta_i, \alpha^*_{i, 1}, V_{i, 1}^*)&=c_{i, 1}(\zeta_i, \alpha^*_{i, 1})+\frac{\partial V_{i, 1}^*}{\partial \zeta_i} \dot \zeta_i\\ &=\zeta^2_{i}+\alpha_{i, 1}^{*2}+\frac{\partial V_{i, 1}^*}{\partial \zeta_i}[\eta_i((d_i+b_i)(\alpha^*_{i, 1}+f_{i, 1})-\sum\limits_{j\in \mathscr N_i}a_{i, j}\dot y_j-b_i \dot y_r)+v_i]=0, \end{align} $$

where $$ \eta_i=\dfrac{\chi}{\sqrt{(e_i^2+\chi)}(e_i^2+\chi)} $$, $$ v_i=\dfrac{1+\varpi_i^2}{(1-\varpi_i^2)^2} $$.

Define the Bellman residual as

(20)

$$ \begin{align} \Theta&=H_{i, 1}(\zeta_i, \hat\alpha^*_{i, 1}, \hat V^*_{i, 1})-H_{i, 1}(\zeta_i, \alpha^*_{i, 1}, V_{i, 1}^*)\\ &=H_{i, 1}(\zeta_i, \hat\alpha^*_{i, 1}, \hat V^*_{i, 1}). \end{align} $$

The approximate optimal virtual controller is expected to guarantee that $$ \Theta $$ tends to zero. The following positive function is given by

(21)

$$ I_{i, 1}=({\hat\theta}_{a, i1}- {\hat\theta}_{c, i1})^T({\hat\theta}_{a, i1}- {\hat\theta}_{c, i1}). $$

Since equation $$ H_{i, 1}(\zeta_i, \alpha^*_{i, 1}, V_{i, 1}^*)=0 $$ has a unique solution, it is easy to deduce that $$ I_{i, 1}=0 $$ is equivalent to $$ \frac{\partial H_{i, 1}(\zeta_i, \alpha^*_{i, 1}, V_{i, 1}^*)}{\partial \hat\theta_{a, i1}}=0 $$. Then, from equations (12) and (13), we get

(22)

$$ \begin{align} \dot I_{i, 1}&=\dfrac{\partial I_{i, 1}}{\partial {\hat\theta}_{a, i1}}\dot{\hat\theta}_{a, i1}+\dfrac{\partial I_{i, 1}}{\partial {\hat\theta}_{c, i1}}\dot{\hat\theta}_{c, i1}\\ &=-\frac{\sigma_{a, i1}}{2}\dfrac{\partial I_{i, 1}}{\partial {\hat\theta}_{a, i1}} S_{i, 1}S_{i, 1}^T \dfrac{\partial I_{i, 1}}{\partial {\hat\theta}_{a, i1}}\leq0. \end{align} $$

Thus, the designed adaptive laws $$ \dot {\hat\theta}_{a, i1} $$ and $$ \dot {\hat\theta}_{c, i1} $$ enable $$ H_{i, 1}(\zeta_i, \alpha^*_{i, 1}, V_{i, 1}^*)=0 $$ to be satisfied.
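The sign of $$ \dot I_{i, 1} $$ can be checked numerically. Substituting (12) and (13) into the chain rule gives $$ \dot I_{i, 1}=-2\sigma_{a, i1}((\hat\theta_{a, i1}-\hat\theta_{c, i1})^TS_{i, 1})^2 $$, and the sketch below compares both expressions for randomly drawn vectors; the dimensions, seed, and gains are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_a, sigma_c = 3.0, 2.0
S = rng.normal(size=4)
theta_a = rng.normal(size=4)
theta_c = rng.normal(size=4)
SSt = np.outer(S, S)

# Adaptive laws (12) and (13).
theta_a_dot = -SSt @ (sigma_a * (theta_a - theta_c) + sigma_c * theta_c)
theta_c_dot = -sigma_c * SSt @ theta_c

# Chain rule for I = (theta_a - theta_c)^T (theta_a - theta_c).
diff = theta_a - theta_c
I_dot = 2.0 * diff @ theta_a_dot - 2.0 * diff @ theta_c_dot

# Closed form obtained by substituting the two laws.
I_dot_closed = -2.0 * sigma_a * (S @ diff) ** 2
```

The two values agree and are non-positive for any positive `sigma_a`, independently of `sigma_c`.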

By calculating $$ \frac{\partial H_{i, 1}}{\partial\alpha^*_{i, 1}}=0 $$, it yields

(23)

$$ \alpha^*_{i, 1}=-\dfrac{\eta_i(d_i+b_i)}{2} \frac{\partial V_{i, 1}^*}{\partial \zeta_i} . $$
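As a worked step, only two terms of (19) depend on the virtual control, so the stationarity condition reads

$$ \frac{\partial H_{i, 1}}{\partial\alpha^*_{i, 1}}=2\alpha^*_{i, 1}+\eta_i(d_i+b_i)\frac{\partial V_{i, 1}^*}{\partial \zeta_i}=0, $$

and solving this linear equation for $$ \alpha^*_{i, 1} $$ gives (23). Since $$ \frac{\partial^2 H_{i, 1}}{\partial\alpha^{*2}_{i, 1}}=2>0 $$, the stationary point is a minimum of $$ H_{i, 1} $$ with respect to the virtual control.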

To obtain the optimal virtual controller, we decompose $$ \frac{\partial V_{i, 1}^*}{\partial \zeta_i} $$ to derive the following equation

(24)

$$ \frac{\partial V_{i, 1}^*}{\partial \zeta_i} = \dfrac{2c_{i, 1}\zeta_i}{(d_i+b_i)^2\eta_i^2}+\dfrac{1}{(d_i+b_i)^2\eta_i^2}V_{i, 1}^o+\dfrac{2}{(d_i+b_i)^2\eta_i^2}\vartheta^{*T}_{i, 1}\phi_{i, 1}, $$

(25)

$$ V_{i, 1}^o=(d_i+b_i)^2\eta_i^2\frac{\partial V_{i, 1}^*}{\partial \zeta_i}-2c_{i, 1}\zeta_i-2\vartheta^{*T}_{i, 1}\phi_{i, 1}. $$

where $$ c_{i, 1} $$ is a design parameter.

Substituting equation (24) into equation (23), it follows that

(26)

$$ \alpha^*_{i, 1}=-\dfrac{c_{i, 1}}{(d_i+b_i)\eta_i}\zeta_i-\dfrac{1}{2(d_i+b_i)\eta_i}V_{i, 1}^o-\dfrac{1}{(d_i+b_i)\eta_i}\vartheta^{*T}_{i, 1}\phi_{i, 1}. $$

Since $$ V_{i, 1}^o $$ is an unknown term, by applying Lemma 1, there exists an FLS such that

(27)

$$ V_{i, 1}^o=\theta^{*T}_{i, 1}S_{i, 1}+\varepsilon_{i, 1}, $$

where $$ \theta^*_{i, 1} $$ refers to the optimal weights, $$ S_{i, 1} $$ refers to the basis function, and $$ \varepsilon_{i, 1} $$ is the approximation error.

Substituting equation (27) into equations (24) and (26) results in

(28)

$$ \alpha^*_{i, 1}=-\dfrac{c_{i, 1}}{(d_i+b_i)\eta_i}\zeta_i-\dfrac{1}{2(d_i+b_i)\eta_i}(\theta^{*T}_{i, 1}S_{i, 1}+\varepsilon_{i, 1})-\dfrac{1}{(d_i+b_i)\eta_i}\vartheta^{*T}_{i, 1}\phi_{i, 1}, $$

(29)

$$ \frac{\partial V_{i, 1}^*}{\partial \zeta_i} = \dfrac{2c_{i, 1}\zeta_i}{(d_i+b_i)^2\eta_i^2}+\dfrac{1}{(d_i+b_i)^2\eta_i^2}(\theta^{*T}_{i, 1}S_{i, 1}+\varepsilon_{i, 1})+\dfrac{2}{(d_i+b_i)^2\eta_i^2}\vartheta^{*T}_{i, 1}\phi_{i, 1}. $$

The $$ \hat \theta_{a, i1} $$ and $$ \hat \theta_{c, i1} $$ of the actor FLS and the critic FLS are used to estimate the unknown weights $$ \theta^*_{i, 1} $$, respectively, to obtain

(30)

$$ \hat \alpha_{i, 1}=-\dfrac{c_{i, 1}}{(d_i+b_i)\eta_i}\zeta_i-\dfrac{1}{2(d_i+b_i)\eta_i}\hat \theta^T_{a, i1}S_{i, 1}-\dfrac{1}{(d_i+b_i)\eta_i}\hat \vartheta^T_{i, 1}\phi_{i, 1}, $$

(31)

$$ \frac{\partial \hat V_{i, 1}}{\partial \zeta_i} = \dfrac{2c_{i, 1}\zeta_i}{(d_i+b_i)^2\eta_i^2}+\dfrac{1}{(d_i+b_i)^2\eta_i^2}\hat \theta^T_{c, i1}S_{i, 1}+\dfrac{2}{(d_i+b_i)^2\eta_i^2}\hat \vartheta^T_{i, 1}\phi_{i, 1}. $$
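The actor output (30) is a plain algebraic expression once the weights and basis vectors are available. The sketch below is a minimal illustration; the function name and argument names are hypothetical, and the caller is assumed to supply $$ \eta_i $$, the basis vectors, and the current weight estimates.

```python
import numpy as np

def virtual_control_step1(zeta, eta, d, b, c, theta_a, S, vartheta, phi):
    """Approximate optimal virtual controller (30) for the first subsystem."""
    gain = (d + b) * eta   # common factor (d_i + b_i) * eta_i
    feedback = -c * zeta                        # proportional term in zeta_i
    actor_term = -0.5 * np.dot(theta_a, S)      # actor FLS contribution
    approx_term = -np.dot(vartheta, phi)        # FLS estimate of the unknown dynamics
    return (feedback + actor_term + approx_term) / gain
```

With all weights at zero the expression reduces to a pure feedback on the transformed error, scaled by $$ 1/((d_i+b_i)\eta_i) $$.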

Construct the following Lyapunov candidate function

(32)

$$ V_{i, 1}=\dfrac{1}{2}\zeta^2_i+\dfrac{1}{2}\tilde\theta_{a, i1}^T\tilde\theta_{a, i1}+\dfrac{1}{2}\tilde\theta_{c, i1}^T\tilde\theta_{c, i1}+ \dfrac{1}{2}\tilde\vartheta^T_{i, 1}\tilde\vartheta_{i, 1}. $$

Differentiating $$ V_{i, 1} $$ with respect to time yields

(33)

$$ \begin{align} \dot V_{i, 1}&= \zeta_i[\eta_i((d_i+b_i)(\rho_{i, 2}+\hat \alpha_{i, 1}+z_{i, 2}+f_{i, 1})-\sum\limits_{j\in\mathscr N_i}a_{i, j} \dot y_j-b_i \dot y_r)+v_i]\\ &+\tilde\theta_{a, i1}^TS_{i, 1}S_{i, 1}^T(\sigma_{a, i1}(\hat \theta_{a, i1}-\hat \theta_{c, i1})+\sigma_{c, i1}\hat \theta_{c, i1}) +\tilde\theta_{c, i1}^T\sigma_{c, i1}S_{i, 1}S_{i, 1}^T\hat \theta_{c, i1}-\tilde\vartheta^T_{i, 1}\dot{\hat\vartheta}_{i, 1}. \end{align} $$

Define $$ F_{i, 1} $$ as

(34)

$$ F_{i, 1}=\eta_i(d_i+b_i)f_{i, 1}-\eta_i\sum\limits_{j\in \mathscr N_i}a_{i, j}\dot y_j-\eta_ib_i \dot y_r+v_i. $$

By Lemma 1, there exists an FLS approximation to $$ F_{i, 1} $$ that results in

(35)

$$ \begin{align} F_{i, 1}=\vartheta_{i, 1}^{*T}\phi_{i, 1}+\epsilon_{i, 1}, \end{align} $$

where $$ \vartheta_{i, 1}^* $$ is the ideal weight, $$ \phi_{i, 1} $$ is the basis function, and $$ \epsilon_{i, 1} $$ is the approximation error satisfying $$ |\epsilon_{i, 1}|\leq \bar\epsilon_{i, 1} $$ with $$ \bar\epsilon_{i, 1}>0 $$.

With the help of Young's inequality, one has

(36)

$$ \zeta_{i}\epsilon_{i, 1} \leq \frac{1}{2}\zeta_{i}^2+\frac{1}{2}\bar\epsilon_{i, 1}^2 , $$

(37)

$$ \zeta_i\eta_{i}(d_i+b_i)z_{i, 2} \leq \frac{1}{2}\zeta^2_{i}+\frac{1}{2}\eta_{i}^2(d_i+b_i)^2z_{i, 2}^2, $$

(38)

$$ \zeta_i\eta_{i}(d_i+b_i)\rho_{i, 2} \leq \frac{1}{2}\zeta^2_{i}+\frac{1}{2}\eta_{i}^2(d_i+b_i)^2\rho_{i, 2}^2 . $$

Substituting equations (30), (36), (37), and (38) into equation (33) yields

(39)

$$ \begin{align} \dot V_{i, 1}&\leq -(c_{i, 1}-\frac{3}{2})\zeta_{i}^2+\frac{1}{2}\eta_{i}^2(d_i+b_i)^2z_{i, 2}^2+\frac{1}{2}\eta_{i}^2(d_i+b_i)^2\rho_{i, 2}^2+\frac{1}{2}\bar\epsilon_{i, 1}^2-\frac{1}{2}\zeta_i\hat \theta_{a, i1}^TS_{i, 1}\\ &+\sigma_{i, 1}\tilde\vartheta_{i, 1}{\hat\vartheta}_{i, 1}+\tilde\theta_{a, i1}^TS_{i, 1}S_{i, 1}^T(\sigma_{a, i1}(\hat \theta_{a, i1}-\hat \theta_{c, i1})+\sigma_{c, i1}\hat \theta_{c, i1}) +\tilde\theta_{c, i1}^T\sigma_{c, i1}S_{i, 1}S_{i, 1}^T\hat \theta_{c, i1}. \end{align} $$

According to Young's inequality, it can be derived that

(40)

$$-\frac{1}{2}\zeta_i\hat \theta^T_{a, i1}S_{i, 1}\leq \frac{1}{4}\zeta_{i}^2+\frac{1}{4}\hat\theta_{a, i1}^TS_{i, 1}S_{i, 1}^T\hat \theta_{a, i1}, $$

(41)

$$ \begin{align} &\sigma_{a, i1}\tilde\theta_{a, i1}^TS_{i, 1}S_{i, 1}^T\hat \theta_{a, i1}\\ =&-\frac{\sigma_{a, i1}}{2}\tilde\theta_{a, i1}^TS_{i, 1}S_{i, 1}^T\tilde \theta_{a, i1} -\frac{\sigma_{a, i1}}{2}\hat\theta_{a, i1}^TS_{i, 1}S_{i, 1}^T\hat \theta_{a, i1} +\frac{\sigma_{a, i1}}{2}\theta_{a, i1}^TS_{i, 1}S_{i, 1}^T \theta_{a, i1}, \end{align} $$

(42)

$$ \begin{align} &\sigma_{c, i1}\tilde\theta_{c, i1}^TS_{i, 1}S_{i, 1}^T\hat \theta_{c, i1}\\ =&-\frac{\sigma_{c, i1}}{2}\tilde\theta_{c, i1}^TS_{i, 1}S_{i, 1}^T\tilde \theta_{c, i1} -\frac{\sigma_{c, i1}}{2}\hat\theta_{c, i1}^TS_{i, 1}S_{i, 1}^T\hat \theta_{c, i1} +\frac{\sigma_{c, i1}}{2}\theta_{c, i1}^TS_{i, 1}S_{i, 1}^T \theta_{c, i1}, \end{align} $$

(43)

$$ \begin{align} &(\sigma_{c, i1}-\sigma_{a, i1})\tilde\theta_{a, i1}^TS_{i, 1}S_{i, 1}^T \hat \theta_{c, i1}\\ \leq&\frac{(\sigma_{c, i1}-\sigma_{a, i1})}{2}\tilde\theta_{a, i1}^TS_{i, 1}S_{i, 1}^T \tilde \theta_{a, i1} +\frac{(\sigma_{c, i1}-\sigma_{a, i1})}{2}\hat\theta_{c, i1}^TS_{i, 1}S_{i, 1}^T\hat \theta_{c, i1}, \end{align} $$

(44)

$$ \sigma_{i, 1}\tilde\vartheta_{i, 1}{\hat\vartheta}_{i, 1} \leq \frac{\sigma_{i, 1}}{2}\vartheta_{i, 1}^2-\frac{\sigma_{i, 1}}{2}\tilde\vartheta_{i, 1}^2. $$
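The equalities (41) and (42) are instances of the identity $$ \tilde\theta^TP\hat\theta=\frac{1}{2}(\theta^TP\theta-\tilde\theta^TP\tilde\theta-\hat\theta^TP\hat\theta) $$ with $$ P=SS^T $$ and $$ \tilde\theta=\theta-\hat\theta $$, while (43) relies on a Young-type bound for the same quadratic form. The sketch below checks both numerically for random vectors; the dimension and seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.normal(size=5)
theta = rng.normal(size=5)         # ideal weight
theta_hat = rng.normal(size=5)     # estimate
theta_tilde = theta - theta_hat    # estimation error
P = np.outer(S, S)                 # S S^T, symmetric positive semidefinite

# Identity behind (41) and (42).
lhs = theta_tilde @ P @ theta_hat
rhs = 0.5 * (theta @ P @ theta - theta_tilde @ P @ theta_tilde
             - theta_hat @ P @ theta_hat)

# Young-type bound behind (43): the gap equals 0.5 * (S^T (theta_tilde - theta_hat))^2.
young_gap = 0.5 * theta_tilde @ P @ theta_tilde + 0.5 * theta_hat @ P @ theta_hat - lhs
```

Note that multiplying the Young-type bound by $$ (\sigma_{c, i1}-\sigma_{a, i1}) $$ preserves the direction of the inequality only when this factor is non-negative.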

By means of equations (40)-(44), $$ \dot V_{i, 1} $$ becomes

(45)

$$ \begin{align} \dot V_{i, 1}&\leq -(c_{i, 1}-\frac{7}{4})\zeta_{i}^2+\frac{1}{2}\eta_{i}^2(d_i+b_i)^2z_{i, 2}^2+\frac{1}{2}\eta_{i}^2(d_i+b_i)^2\rho_{i, 2}^2\\ &+\frac{1}{2}\bar\epsilon_{i, 1}^2+\frac{\sigma_{i, 1}}{2}\vartheta_{i, 1}^2-\frac{\sigma_{i, 1}}{2}\tilde\vartheta_{i, 1}^2+\frac{\sigma_{a, i1}}{2}\theta_{a, i1}^TS_{i, 1}S_{i, 1}^T \theta_{a, i1}+\frac{\sigma_{c, i1}}{2}\theta_{c, i1}^TS_{i, 1}S_{i, 1}^T \theta_{c, i1}\\ &-(\frac{\sigma_{a, i1}}{2}-\frac{1}{4})\hat\theta_{a, i1}^TS_{i, 1}S_{i, 1}^T\hat \theta_{a, i1} -\frac{\sigma_{c, i1}}{2}\hat\theta_{c, i1}^TS_{i, 1}S_{i, 1}^T\hat \theta_{c, i1} -(\sigma_{a, i1}-\frac{\sigma_{c, i1}}{2})\tilde\theta_{a, i1}^TS_{i, 1}S_{i, 1}^T\tilde \theta_{a, i1}\\ &-\frac{\sigma_{c, i1}}{2}\tilde\theta_{c, i1}^TS_{i, 1}S_{i, 1}^T\tilde \theta_{c, i1}. \end{align} $$

Then, it can be further bounded as

(46)

$$ \begin{align} \dot V_{i, 1}&\leq -(c_{i, 1}-\frac{7}{4})\zeta_{i}^2+\frac{1}{2}\eta_{i}^2(d_i+b_i)^2z_{i, 2}^2+\frac{1}{2}\eta_{i}^2(d_i+b_i)^2\rho_{i, 2}^2+\Upsilon_{i, 1}\\ &-\frac{\sigma_{i, 1}}{2}\tilde\vartheta_{i, 1}^2 -(\sigma_{a, i1}-\frac{\sigma_{c, i1}}{2})\lambda^{\min}_{i, 1}\tilde\theta_{a, i1}^T\tilde \theta_{a, i1} -\frac{\sigma_{c, i1}}{2}\lambda^{\min}_{i, 1}\tilde\theta_{c, i1}^T \tilde \theta_{c, i1}, \end{align} $$

where $$ \Upsilon_{i, 1}=\frac{1}{2}\bar\epsilon_{i, 1}^2+\frac{\sigma_{i, 1}}{2}\vartheta_{i, 1}^2 +\frac{\sigma_{a, i1}}{2}\theta_{a, i1}^TS_{i, 1}S_{i, 1}^T \theta_{a, i1}+\frac{\sigma_{c, i1}}{2}\theta_{c, i1}^TS_{i, 1}S_{i, 1}^T \theta_{c, i1}-(\frac{\sigma_{a, i1}}{2}-\frac{1}{4})\hat\theta_{a, i1}^TS_{i, 1}S_{i, 1}^T\hat \theta_{a, i1} -\frac{\sigma_{c, i1}}{2}\hat\theta_{c, i1}^TS_{i, 1}S_{i, 1}^T\hat \theta_{c, i1} $$.

Step j $$ (2\leq j \leq n-1) $$: From equations (9) and (10), it holds that

(47)

$$ \dot z_{i, j}=z_{i, j+1}+\rho_{i, j+1}+\hat\alpha_{i, j}+f_{i, j}-\dot{\bar\alpha}_{i, j}. $$

The performance index function is chosen to be

(48)

$$ V_{i, j}(z_{i, j})=\int_{t}^{\infty}c_{i, j}(z_{i, j}(\tau), \alpha_{i, j}(z))d\tau, $$

where $$ c_{i, j}(z_{i, j}, \alpha_{i, j})=z_{i, j}^2+\alpha_{i, j}^2 $$ represents the value function.

Considering $$ x_{i, j+1} $$ as the optimal virtual controller $$ \alpha^*_{i, j} $$, then the optimal performance index function has the form

(49)

$$ \begin{align} V_{i, j}^*(z_{i, j})&=\int_{t}^{\infty}c_{i, j}(z_{i, j}(\tau), \alpha^*_{i, j}(z))d\tau \\ &=\min\limits_{\alpha_{i, j}\in \psi(\varOmega)}(\int_{t}^{\infty}c_{i, j}(z_{i, j}(\tau), \alpha_{i, j} (z))d\tau). \end{align} $$

The HJB equation can be written as

(50)

$$ \begin{align} H_{i, j}(z_{i, j}, \alpha^*_{i, j}, V_{i, j}^*)&=c_{i, j}(z_{i, j}, \alpha^*_{i, j})+\frac{\partial V_{i, j}^*}{\partial z_{i, j}} \dot z_{i, j}\\ &=z^2_{i, j}+\alpha_{i, j}^{*2}+\frac{\partial V_{i, j}^*}{\partial z_{i, j}}(\alpha_{i, j}^*+f_{i, j}-\dot{\bar\alpha}_{i, j})=0. \end{align} $$

By calculating $$ \frac{\partial H_{i, j}}{\partial\alpha^*_{i, j}}=0 $$, it yields

(51)

$$ \alpha^*_{i, j}=-\dfrac{1}{2} \frac{\partial V_{i, j}^*}{\partial z_{i, j}}. $$

Then, decompose $$ \frac{\partial V_{i, j}^*}{\partial z_{i, j}} $$ into the following two parts

(52)

$$ \frac{\partial V_{i, j}^*}{\partial z_{i, j}}={2c_{i, j}}z_{i, j}+V_{i, j}^o+2\vartheta^{*T}_{i, j}\phi_{i, j}, $$

(53)

$$ V_{i, j}^o=\frac{\partial V_{i, j}^*}{\partial z_{i, j}}-2c_{i, j}z_{i, j}-2\vartheta^{*T}_{i, j}\phi_{i, j}, $$

where $$ c_{i, j} $$ is a design parameter.

Substituting equation (52) into equation (51), it follows that

(54)

$$ \alpha^*_{i, j}=-{c_{i, j}}z_{i, j}-\dfrac{1}{2}V_{i, j}^o-\vartheta^{*T}_{i, j}\phi_{i, j}. $$

Since $$ V_{i, j}^o $$ is an unknown term, by applying Lemma 1, there exists an FLS such that

(55)

$$ V_{i, j}^o =\theta^{*T}_{i, j}S_{i, j}+\varepsilon_{i, j}, $$

where $$ \theta^*_{i, j} $$ refers to the optimal weights, $$ S_{i, j} $$ refers to the basis function, and $$ \varepsilon_{i, j} $$ is the approximation error.

According to the above equation, we can get

(56)

$$ \alpha^*_{i, j}=-{c_{i, j}}z_{i, j}-\dfrac{1}{2}(\theta^{*T}_{i, j}S_{i, j}+\varepsilon_{i, j})-\vartheta^{*T}_{i, j}\phi_{i, j}, $$

(57)

$$ \frac{\partial V_{i, j}^*}{\partial z_{i, j}}= 2c_{i, j}z_{i, j}+(\theta^{*T}_{i, j}S_{i, j}+\varepsilon_{i, j})+2\vartheta^{*T}_{i, j}\phi_{i, j}. $$

The $$ \hat \theta_{a, ij} $$ and $$ \hat \theta_{c, ij} $$ of the actor FLS and the critic FLS are used to estimate the unknown weights $$ \theta^*_{i, j} $$, respectively, to obtain

(58)

$$ \hat \alpha_{i, j}=-{c_{i, j}}z_{i, j}-\dfrac{1}{2}\hat \theta_{a, ij}^TS_{i, j}-\hat \vartheta_{i, j}^{T}\phi_{i, j}, $$

(59)

$$ \frac{\partial \hat V_{i, j}}{\partial z_{i, j}}= {2c_{i, j}z_{i, j}}+\hat \theta_{c, ij}^TS_{i, j}+2\hat \vartheta^{T}_{i, j}\phi_{i, j}. $$

The candidate Lyapunov function is chosen as

(60)

$$ V_{i, j}=V_{i, j-1}+\dfrac{1}{2}z^2_{i, j}+\dfrac{1}{2}\tilde\theta_{a, ij}^T\tilde\theta_{a, ij}+\dfrac{1}{2}\tilde\theta_{c, ij}^T\tilde\theta_{c, ij}+ \dfrac{1}{2}\tilde\vartheta^T_{i, j}\tilde\vartheta_{i, j}+\dfrac{1}{2}\rho_{i, j}\rho_{i, j}. $$

Differentiating $$ V_{i, j} $$ with respect to time yields

(61)

$$ \begin{align} \dot V_{i, j}&=\dot V_{i, j-1}+ z_{i, j}(z_{i, j+1}+\rho_{i, j+1}+\hat\alpha_{i, j}+f_{i, j}-\dot{\bar\alpha}_{i, j})+\rho_{i, j}\dot \rho_{i, j}\\ &+\tilde\theta_{a, ij}^TS_{i, j}S_{i, j}^T(\sigma_{a, ij}(\hat \theta_{a, ij}-\hat \theta_{c, ij})+\sigma_{c, ij}\hat \theta_{c, ij}) +\tilde\theta_{c, ij}^T\sigma_{c, ij}S_{i, j}S_{i, j}^T\hat \theta_{c, ij}-\tilde\vartheta^T_{i, j}\dot{\hat\vartheta}_{i, j}. \end{align} $$

From equations (10) and (11), it holds that

(62)

$$ \dot \rho_{i, j}=-\dfrac{\rho_{i, j}}{\kappa_i}-\dot{\hat\alpha}_{i, j-1}. $$
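The step from (10) and (11) to (62) can be made explicit. Rearranging the filter (11) and using the definition of $$ \rho_{i, j} $$ in (10),

$$ \dot{\bar\alpha}_{i, j}=\dfrac{\hat{\alpha}_{i, j-1}-\bar \alpha_{i, j}}{\kappa_{i, j-1}}=-\dfrac{\rho_{i, j}}{\kappa_{i, j-1}}, \qquad \dot \rho_{i, j}=\dot{\bar\alpha}_{i, j}-\dot{\hat\alpha}_{i, j-1}=-\dfrac{\rho_{i, j}}{\kappa_{i, j-1}}-\dot{\hat\alpha}_{i, j-1}, $$

which coincides with (62) when the filter constant $$ \kappa_{i, j-1} $$ is written as $$ \kappa_i $$.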

Define $$ M_i=-\dot{\hat\alpha}_{i, j-1} $$, which is a bounded continuous function.

By Lemma 1, there exists an FLS satisfying

(63)

$$ F_{i, j}=\vartheta_{i, j}^{*T}\phi_{i, j}+\epsilon_{i, j}, $$

where $$ F_{i, j}=f_{i, j}-\dot {\bar \alpha}_{i, j} $$, $$ \vartheta_{i, j}^* $$ is the ideal weight, $$ \phi_{i, j} $$ is the basis function, and $$ \epsilon_{i, j} $$ is the approximation error satisfying $$ |\epsilon_{i, j}|\leq \bar\epsilon_{i, j} $$ with $$ \bar\epsilon_{i, j}>0 $$.

With the help of Young's inequality, the following inequality holds

(64)

$$ z_{i, j}\epsilon_{i, j} \leq \frac{1}{2}z_{i, j}^2+\frac{1}{2}\bar\epsilon_{i, j}^2, $$

(65)

$$ z_{i, j}z_{i, j+1} \leq \frac{1}{2}z^2_{i, j}+\frac{1}{2}z_{i, j+1}^2, $$

(66)

$$ z_{i, j}\rho_{i, j+1} \leq \frac{1}{2}z^2_{i, j}+\frac{1}{2}\rho_{i, j+1}^2, $$

(67)

$$ \rho_{i, j}M_i \leq \frac{1}{2}\rho^2_{i, j}+\frac{1}{2}M_i^2. $$

Similar to the first step, we have

(68)

$$ -\frac{1}{2}z_{i, j}\hat \theta^T_{a, ij}S_{i, j}\leq \frac{1}{4}z_{i, j}^2+\frac{1}{4}\hat\theta_{a, ij}^TS_{i, j}S_{i, j}^T\hat \theta_{a, ij}, $$

(69)

$$ \begin{align} &\sigma_{a, ij}\tilde\theta_{a, ij}^TS_{i, j}S_{i, j}^T\hat \theta_{a, ij}\\ =&-\frac{\sigma_{a, ij}}{2}\tilde\theta_{a, ij}^TS_{i, j}S_{i, j}^T\tilde \theta_{a, ij} -\frac{\sigma_{a, ij}}{2}\hat\theta_{a, ij}^TS_{i, j}S_{i, j}^T\hat \theta_{a, ij} +\frac{\sigma_{a, ij}}{2}\theta_{a, ij}^TS_{i, j}S_{i, j}^T \theta_{a, ij}, \end{align} $$

(70)

$$ \begin{align} &\sigma_{c, ij}\tilde\theta_{c, ij}^TS_{i, j}S_{i, j}^T\hat \theta_{c, ij}\\ =&-\frac{\sigma_{c, ij}}{2}\tilde\theta_{c, ij}^TS_{i, j}S_{i, j}^T\tilde \theta_{c, ij} -\frac{\sigma_{c, ij}}{2}\hat\theta_{c, ij}^TS_{i, j}S_{i, j}^T\hat \theta_{c, ij} +\frac{\sigma_{c, ij}}{2}\theta_{c, ij}^TS_{i, j}S_{i, j}^T \theta_{c, ij}, \end{align} $$

(71)

$$ \begin{align} &(\sigma_{c, ij}-\sigma_{a, ij})\tilde\theta_{a, ij}^TS_{i, j}S_{i, j}^T \hat \theta_{c, ij}\\ \leq&\frac{(\sigma_{c, ij}-\sigma_{a, ij})}{2}\tilde\theta_{a, ij}^TS_{i, j}S_{i, j}^T \tilde \theta_{a, ij} +\frac{(\sigma_{c, ij}-\sigma_{a, ij})}{2}\hat\theta_{c, ij}^TS_{i, j}S_{i, j}^T\hat \theta_{c, ij}, \end{align} $$

(72)

$$ \sigma_{i, j}\tilde\vartheta_{i, j}{\hat\vartheta}_{i, j} \leq \frac{\sigma_{i, j}}{2}\vartheta_{i, j}^2-\frac{\sigma_{i, j}}{2}\tilde\vartheta_{i, j}^2. $$
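The identities (69) and (70), and the bound (72), follow from expanding $$ \theta=\tilde\theta+\hat\theta $$ inside the quadratic form. A minimal numerical check of (69) is sketched below in Python; the helper names `quad` and `identity_gap`, the vector length, and the random test values are illustrative assumptions and do not come from the paper.

```python
import random


def quad(u, S, v):
    """Return u^T S S^T v for vectors given as lists (S is a column vector)."""
    su = sum(a * b for a, b in zip(S, u))
    sv = sum(a * b for a, b in zip(S, v))
    return su * sv


def identity_gap(theta, theta_hat, S, sigma):
    """Left-hand side minus right-hand side of identity (69)."""
    theta_tilde = [t - h for t, h in zip(theta, theta_hat)]
    lhs = sigma * quad(theta_tilde, S, theta_hat)
    rhs = (-sigma / 2 * quad(theta_tilde, S, theta_tilde)
           - sigma / 2 * quad(theta_hat, S, theta_hat)
           + sigma / 2 * quad(theta, S, theta))
    return lhs - rhs


# Illustrative random data (assumed values, not taken from the simulation section).
random.seed(0)
n = 5
theta = [random.uniform(-1, 1) for _ in range(n)]
theta_hat = [random.uniform(-1, 1) for _ in range(n)]
S = [random.uniform(0, 1) for _ in range(n)]
print(abs(identity_gap(theta, theta_hat, S, sigma=5.0)) < 1e-12)
```

The gap vanishes up to floating-point rounding for any choice of vectors, which is what allows the cross terms in the Lyapunov derivative to be replaced by sign-definite quadratic terms.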

From equations (58), (62), and (64)-(72), $$ \dot V_{i, j} $$ can be directly bounded as

(73)

$$ \begin{align} \dot V_{i, j}&\leq -(c_{i, 1}-\frac{7}{4})\zeta_{i}^2-(c_{i, 2}-\frac{7}{4}-\frac{(d_i+b_i)^2\eta_i^2}{2})z_{i, 2}^2 -\sum\limits_{m=3}^{j}(c_{i, m}-\frac{9}{4})z_{i, m}^2+\frac{1}{2}z_{i, j+1}^2+\frac{1}{2}\varkappa_i\rho_{i, j+1}^2\\ &-(\dfrac{1}{\kappa_i}-\frac{1}{2}-\frac{(d_i+b_i)^2\eta_i^2}{2})\rho_{i, 2}^2-\sum\limits_{m=3}^{j}(\dfrac{1}{\kappa_i}-1)\rho_{i, m}^2-\sum\limits_{m=1}^{j}\frac{\sigma_{i, m}}{2}\tilde\vartheta_{i, m}^2 -\sum\limits_{m=1}^{j}\lambda^{\min}_{i, m}(\sigma_{a, im}-\frac{\sigma_{c, im}}{2})\tilde\theta_{a, im}^T\tilde \theta_{a, im}\\ &-\sum\limits_{m=1}^{j}\lambda^{\min}_{i, m}\frac{\sigma_{c, im}}{2}\tilde\theta_{c, im}^T\tilde \theta_{c, im}+\sum\limits_{m=1}^{j}\Upsilon_{i, m}, \end{align} $$

where $$ \Upsilon_{i, j}=\frac{1}{2}\bar\epsilon_{i, j}^2+\frac{\sigma_{i, j}}{2}\vartheta_{i, j}^2 +\frac{\sigma_{a, ij}}{2}\theta_{a, ij}^TS_{i, j}S_{i, j}^T \theta_{a, ij}+\frac{\sigma_{c, ij}}{2}\theta_{c, ij}^TS_{i, j}S_{i, j}^T \theta_{c, ij}-(\frac{\sigma_{a, ij}}{2}-\frac{1}{4})\hat\theta_{a, ij}^TS_{i, j}S_{i, j}^T\hat \theta_{a, ij} -\frac{\sigma_{c, ij}}{2}\hat\theta_{c, ij}^TS_{i, j}S_{i, j}^T\hat \theta_{c, ij}+\frac{1}{2}M_{i}^2 $$ and $$ \varkappa_i= \begin{cases} \frac{(d_i+b_i)^2\eta_i^2}{2}, & j=1, \\ 1, & j=2, \cdots, n-1. \end{cases} $$

Step n: The performance index function for the last subsystem is defined as

(74)

$$ V_{i, n}(z_{i, n})=\int_{t}^{\infty}c_{i, n}(z_{i, n}(\tau), u^*_{i}(z))d\tau, $$

where $$ c_{i, n}(z_{i, n}, u^*_{i})=z_{i, n}^2+u_{i}^{*2} $$ represents the cost function.

The optimal performance index function is

(75)

$$ \begin{align} V_{i, n}^*(z_{i, n})&=\int_{t}^{\infty}c_{i, n}(z_{i, n}(\tau), u^*_{i}(z))d\tau \\ &=\min\limits_{u_{i}\in \psi(\varOmega)}(\int_{t}^{\infty}c_{i, n}(z_{i, n}(\tau), u_{i} (z))d\tau). \end{align} $$

The HJB equation is introduced as

(76)

$$ \begin{align} H_{i, n}(z_{i, n}, u^*_{i}, V_{i, n}^*)&=c_{i, n}(z_{i, n}, u^*_{i})+\frac{\partial V_{i, n}^*}{\partial z_{i, n}} \dot z_{i, n}\\ &=z^2_{i, n}+u_{i}^{*2}+\frac{\partial V_{i, n}^*}{\partial z_{i, n}}(u^*_i+f_{i, n}-\dot{\bar\alpha}_{i, n})=0. \end{align} $$

By calculating $$ \frac{\partial H_{i, n}}{\partial u^*_{i}}=0 $$, we get

(77)

$$ u^*_{i}=-\dfrac{1}{2} \frac{\partial V_{i, n}^*}{\partial z_{i, n}}. $$
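The stationarity step behind (77) can be spelled out as follows. This is a short worked derivation from (76), assuming that $$ f_{i, n} $$ and $$ \dot{\bar\alpha}_{i, n} $$ do not depend on the control input:

```latex
\begin{align}
\frac{\partial H_{i,n}}{\partial u_i^*}
= 2u_i^* + \frac{\partial V_{i,n}^*}{\partial z_{i,n}} = 0
\quad\Longrightarrow\quad
u_i^* = -\frac{1}{2}\frac{\partial V_{i,n}^*}{\partial z_{i,n}},
\qquad
\frac{\partial^2 H_{i,n}}{\partial u_i^{*2}} = 2 > 0 .
\end{align}
```

The positive second derivative indicates that the stationary point is a minimum of the Hamiltonian with respect to the control input.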

$$ \frac{\partial V_{i, n}^*}{\partial z_{i, n}} $$ can be decomposed into

(78)

$$ \frac{\partial V_{i, n}^*}{\partial z_{i, n}}={2c_{i, n}}z_{i, n}+V_{i, n}^o+2\vartheta^{*T}_{i, n}\phi_{i, n}, $$

(79)

$$ V_{i, n}^o=\frac{\partial V_{i, n}^*}{\partial z_{i, n}}-2c_{i, n}z_{i, n}-2\vartheta^{*T}_{i, n}\phi_{i, n}, $$

where $$ c_{i, n} $$ is a design parameter. Since $$ V_{i, n}^o $$ is an unknown term, by applying Lemma 2, one gets

(80)

$$ V_{i, n}^o=\theta^{*T}_{i, n}S_{i, n}+\varepsilon_{i, n}. $$

Then, the following expressions are readily obtained

(81)

$$ u^*_{i}=-{c_{i, n}}z_{i, n}-\dfrac{1}{2}(\theta^{*T}_{i, n}S_{i, n}+\varepsilon_{i, n})-\vartheta^{*T}_{i, n}\phi_{i, n}, $$

(82)

$$ \frac{\partial V_{i, n}^*}{\partial z_{i, n}} = 2c_{i, n}z_{i, n}+(\theta^{*T}_{i, n}S_{i, n}+\varepsilon_{i, n})+2\vartheta^{*T}_{i, n}\phi_{i, n}. $$

Since the ideal weight $$ \theta^*_{i, n} $$ is unknown, it is estimated by the actor FLS weight $$ \hat \theta_{a, in} $$ and the critic FLS weight $$ \hat \theta_{c, in} $$, respectively, which yields

(83)

$$ \hat u_{i}=-{c_{i, n}}z_{i, n}-\dfrac{1}{2}\hat \theta^T_{a, in}S_{i, n}-\hat \vartheta^T_{i, n}\phi_{i, n}, $$

(84)

$$ \frac{\partial \hat V_{i, n}}{\partial z_{i, n}} = {2c_{i, n}z_{i, n}}+\hat \theta^T_{c, in}S_{i, n}+2\hat \vartheta^T_{i, n}\phi_{i, n}. $$

The candidate Lyapunov function is chosen as
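To make the roles of the actor and the critic in (83) and (84) concrete, the following Python sketch evaluates both expressions for given weight estimates. The function names, the plain-list representation of the weight and basis vectors, and the numerical values are hypothetical choices for illustration; the basis vectors $$ S_{i, n} $$ and $$ \phi_{i, n} $$ are assumed to be supplied by the caller.

```python
def dot(a, b):
    """Inner product of two equally long lists."""
    return sum(x * y for x, y in zip(a, b))


def actor_control(z, c, theta_a_hat, S, vartheta_hat, phi):
    """Approximate optimal control input, following the form of (83)."""
    return -c * z - 0.5 * dot(theta_a_hat, S) - dot(vartheta_hat, phi)


def critic_gradient(z, c, theta_c_hat, S, vartheta_hat, phi):
    """Estimated gradient of the performance index, following the form of (84)."""
    return 2.0 * c * z + dot(theta_c_hat, S) + 2.0 * dot(vartheta_hat, phi)


# Illustrative values only (not the simulation data of the paper).
z, c = 0.5, 9.0
theta = [1.0, 2.0]
S = [0.25, 0.5]
vartheta_hat, phi = [1.0], [0.5]

u_hat = actor_control(z, c, theta, S, vartheta_hat, phi)
grad = critic_gradient(z, c, theta, S, vartheta_hat, phi)
print(u_hat, -0.5 * grad)
```

When the actor and critic weights coincide, the actor output equals minus one half of the critic gradient, mirroring the relation (77) between the optimal control and the optimal performance index.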

(85)

$$ V_{i, n}=V_{i, n-1}+\dfrac{1}{2}z^2_{i, n}+\dfrac{1}{2}\tilde\theta_{a, in}^T\tilde\theta_{a, in}+\dfrac{1}{2}\tilde\theta_{c, in}^T\tilde\theta_{c, in}+ \dfrac{1}{2}\tilde\vartheta^T_{i, n}\tilde\vartheta_{i, n}+\dfrac{1}{2}\rho_{i, n}\rho_{i, n}. $$

The time derivative of $$ V_{i, n} $$ is given by

(86)

$$ \begin{align} \dot V_{i, n}&=\dot V_{i, n-1}+ z_{i, n}(\hat u_i+f_{i, n}-\dot{\bar\alpha}_{i, n})+\rho_{i, n}\dot \rho_{i, n}\\ &+\tilde\theta_{a, in}^TS_{i, n}S_{i, n}^T(\sigma_{a, in}(\hat \theta_{a, in}-\hat \theta_{c, in})+\sigma_{c, in}\hat \theta_{c, in}) +\tilde\theta_{c, in}^T\sigma_{c, in}S_{i, n}S_{i, n}^T\hat \theta_{c, in}-\tilde\vartheta^T_{i, n}\dot{\hat\vartheta}_{i, n}. \end{align} $$

By Lemma 2, there exists an FLS satisfying

(87)

$$ F_{i, n}=\vartheta_{i, n}^{*T}\phi_{i, n}+\epsilon_{i, n}, $$

where $$ F_{i, n}=f_{i, n}-\dot {\bar \alpha}_{i, n} $$, and $$ \epsilon_{i, n}\leq \bar\epsilon_{i, n} $$ with $$ \bar\epsilon_{i, n}>0 $$.

Similar to the previous steps, $$ \dot V_{i, n} $$ can be bounded as

(88)

$$ \begin{align} \dot V_{i, n}&\leq -(c_{i, 1}-\frac{7}{4})\zeta_{i}^2-(c_{i, 2}-\frac{7}{4}-\frac{(d_i+b_i)^2\eta_i^2}{2})z_{i, 2}^2-\sum\limits_{m=3}^{n}(c_{i, m}-\frac{9}{4})z_{i, m}^2-\sum\limits_{m=3}^{n}(\dfrac{1}{\kappa_i}-1)\rho_{i, m}^2\\ &-(\dfrac{1}{\kappa_i}-\frac{1}{2}-\frac{(d_i+b_i)^2\eta_i^2}{2})\rho_{i, 2}^2-\sum\limits_{m=1}^{n}\frac{\sigma_{i, m}}{2}\tilde\vartheta_{i, m}^2 -\sum\limits_{m=1}^{n}\lambda^{\min}_{i, m}(\sigma_{a, im}-\frac{\sigma_{c, im}}{2})\tilde\theta_{a, im}^T\tilde \theta_{a, im}\\ &-\sum\limits_{m=1}^{n}\lambda^{\min}_{i, m}\frac{\sigma_{c, im}}{2}\tilde\theta_{c, im}^T\tilde \theta_{c, im}+\sum\limits_{m=1}^{n}\Upsilon_{i, m}, \end{align} $$

where $$ \Upsilon_{i, n}=\frac{1}{2}\bar\epsilon_{i, n}^2+\frac{\sigma_{i, n}}{2}\vartheta_{i, n}^2 +\frac{\sigma_{a, in}}{2}\theta_{a, in}^TS_{i, n}S_{i, n}^T\theta_{a, in}+\frac{\sigma_{c, in}}{2}\theta_{c, in}^TS_{i, n}S_{i, n}^T \theta_{c, in}-(\frac{\sigma_{a, in}}{2}-\frac{1}{4})\hat\theta_{a, in}^TS_{i, n}S_{i, n}^T\hat \theta_{a, in} -\frac{\sigma_{c, in}}{2}\hat\theta_{c, in}^TS_{i, n}S_{i, n}^T\hat \theta_{c, in}+\frac{1}{2}M_{i}^2 $$.

Define $$ E_i=\min\{c_{i, 1}-\frac{7}{4}, c_{i, 2}-\frac{7}{4}-\frac{(d_i+b_i)^2\eta_i^2}{2}, c_{i, m}-\frac{9}{4}, \frac{\sigma_{i, m}}{2}, (\sigma_{a, im}-\frac{\sigma_{c, im}}{2})\lambda^{\min}_{i, m}, \lambda^{\min}_{i, m}\frac{\sigma_{c, im}}{2}\} $$ and $$ \Upsilon_i=\sum_{m=1}^{n}\Upsilon_{i, m} $$. Then $$ \dot V_{i, n} $$ can be written as

(89)

$$ \dot V_{i, n}\leq -E_iV_{i, n}+\Upsilon_i. $$

Theorem 1: Consider a nonlinear MAS under Assumptions 1-2, with the optimal controllers of equations (30), (58), and (83), the adaptive laws of equations (14) and (15), the actor FLS of equation (12), and the critic FLS of equation (13). If the design parameters are selected such that $$ \sigma_{a, im}>\frac{\sigma_{c, im}}{2}>\frac{\sigma_{a, im}}{2}>0$$, $$c_{i, 1}>\frac{7}{4}, c_{i, 2}>\frac{7}{4}+\frac{(d_i+b_i)^2\eta_i^2}{2}$$, $$c_{i, m}>\frac{9}{4}, \frac{1}{\kappa_i}>\frac{1}{2}+\frac{(d_i+b_i)^2\eta_i^2}{2}$$, and $$\frac{1}{\kappa_i}>1$$, then the following results hold:

(1) All signals in the closed-loop system are bounded.

(2) The consensus tracking errors remain within the predefined bounds.

Proof: The total Lyapunov function for all agents is selected as

(90)

$$ V=\sum\limits_{i=1}^{N} V_{i, n}. $$

From (89), the time derivative of $$ V $$ satisfies

(91)

$$ \dot V\leq-EV+\Upsilon, $$

where $$ E=\min\{E_i, i=1, 2, \cdots, N\} $$ and $$ \Upsilon=\sum_{i=1}^N{\Upsilon_i} $$.

Integrating (91) over $$ [0, t] $$ yields

(92)

$$ 0\leq V(t)\leq e^{-Et}(V(0)-\frac{\Upsilon}{E})+\frac{\Upsilon}{E}. $$
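The bound (92) is the solution of the comparison equation $$ \dot V=-EV+\Upsilon $$, that is, the worst case allowed by (91). The Python sketch below compares the closed-form bound with a forward-Euler integration of that comparison equation. The numerical values of $$ E $$, $$ \Upsilon $$, and $$ V(0) $$, as well as the function names, are assumptions chosen purely for illustration.

```python
import math


def lyapunov_bound(t, v0, E, upsilon):
    """Right-hand side of (92): exp(-E t) (V(0) - Upsilon/E) + Upsilon/E."""
    return math.exp(-E * t) * (v0 - upsilon / E) + upsilon / E


def euler_worst_case(v0, E, upsilon, t_end, dt=1e-4):
    """Integrate dV/dt = -E V + Upsilon (equality in (91)) by forward Euler."""
    v = v0
    for _ in range(int(round(t_end / dt))):
        v += dt * (-E * v + upsilon)
    return v


# Hypothetical constants: V(0) = 2.0, E = 1.5, Upsilon = 0.3.
v0, E, upsilon = 2.0, 1.5, 0.3
print(euler_worst_case(v0, E, upsilon, t_end=1.0))
print(lyapunov_bound(1.0, v0, E, upsilon))
print(lyapunov_bound(100.0, v0, E, upsilon), upsilon / E)
```

The two values at $$ t=1 $$ agree up to the discretization error, and for large $$ t $$ the bound approaches $$ \Upsilon/E $$, which is the residual set used in the remainder of the proof.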

From (92), it follows that $$ \zeta_i $$ is ultimately bounded, as $$ t\to\infty $$, by

(93)

$$ \zeta_i\leq \sqrt{\dfrac{2\Upsilon}{E}}. $$

This means that the consensus tracking error remains within the prescribed bounds, i.e., $$ \Gamma(-\varphi(t))\leq e_i\leq \Gamma(\varphi(t)) $$. It then follows that $$ \|y-\bar 1y_r\|\leq\frac{\|\Gamma(\varphi(t))\|}{{\underline\sigma}(\mathscr L+\mathscr B)} $$, where $$ {\underline\sigma}(\mathscr L+\mathscr B) $$ represents the smallest singular value of the matrix $$ \mathscr L+\mathscr B $$.

On the other hand, it is known from equation (92) that $$ \tilde\theta_{a, ij}, \tilde\theta_{c, ij}, \tilde\vartheta_{i, j}, \rho_{i, j} $$, and $$ z_{i, j} $$ are bounded. Since $$ \hat \alpha_{i, j} $$ is a function of bounded signals, $$ \hat \alpha_{i, j} $$ is bounded. From the definitions $$ \tilde\theta_{a, ij}=\theta_{a, ij}-\hat \theta_{a, ij}, \tilde\theta_{c, ij}=\theta_{c, ij}-\hat \theta_{c, ij}, \tilde\vartheta_{i, j}=\vartheta_{i, j}-\hat \vartheta_{i, j} $$, it follows that $$ \hat \theta_{a, ij}, \hat \theta_{c, ij} $$, and $$ \hat \vartheta_{i, j} $$ are bounded. Thus, it can be concluded that all signals are bounded.

4. SIMULATION EXAMPLES

Consider a nonlinear MAS with four follower agents and one leader, whose dynamics are described by

(94)

$$ \begin{align} \left\{ \begin{aligned} \dot x_{i, 1} &= x_{i, 2}(t), \\ \dot x_{i, 2} &=u_{i}(t)+f_{i, 2}(\bar{x}_{i, 2}(t)), \end{aligned} \right. \end{align} $$

where $$ i=1, 2, 3, 4 $$ and $$ f_{i, 2}=0.01\sin(0.5(x_{i, 1}-x_{i, 2})) $$. The reference output trajectory is set to $$ y_r=\sin(0.5t) $$. The communication topology is displayed in Figure 1, from which the Laplacian matrix $$ \mathscr L $$ is obtained as

$$ \begin{equation*} \mathscr L= \left[ \begin{array}{cccc} 0 & 0 & 0 & 0 \\ -1 & 2 & 0 & -1 \\ -1 & 0 & 1 & 0 \\ 0 & -1 & 0 & 1 \end{array} \right] . \end{equation*} $$
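As a quick sanity check on the matrix above, the following Python sketch verifies the zero-row-sum property of a Laplacian and reads off, for each follower, which other followers it receives information from. The helper names are hypothetical, and the convention that a negative off-diagonal entry in row $$ i $$, column $$ j $$ means agent $$ i $$ receives information from agent $$ j $$ is an assumption about the notation.

```python
# Laplacian matrix copied from the simulation example.
LAPLACIAN = [
    [0, 0, 0, 0],
    [-1, 2, 0, -1],
    [-1, 0, 1, 0],
    [0, -1, 0, 1],
]


def row_sums(matrix):
    """Row sums; each is zero for a graph Laplacian."""
    return [sum(row) for row in matrix]


def in_neighbors(matrix):
    """Map each agent (1-based) to the agents j whose entry in its row is negative."""
    return {
        i + 1: [j + 1 for j, v in enumerate(row) if j != i and v < 0]
        for i, row in enumerate(matrix)
    }


print(row_sums(LAPLACIAN))      # [0, 0, 0, 0]
print(in_neighbors(LAPLACIAN))  # {1: [], 2: [1, 4], 3: [1], 4: [2]}
```

Under this reading, agent 1 has no follower neighbors, so it would have to obtain the reference information directly from the leader through the matrix $$ \mathscr B $$; the entries of $$ \mathscr B $$ are not listed in this section.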

Figure 1. The communication topology.

The fuzzy membership functions are selected as

$$ \begin{align*} \mu_{F^k_{i, j}}&=e^{-\frac{{(x_{i, j}+k)}^2}{2}}, i=1, 2, 3, 4;j=1, 2, \\ k&=9, 7, 5, 3, 1, 0, -1, -3, -5, -7, -9. \end{align*} $$
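The Gaussian membership functions above are typically turned into a fuzzy basis vector by normalization. The Python sketch below shows this for a scalar input; the function names, the normalization step, and the symmetric grid of centers used in the usage lines are assumptions made for illustration, since the exact construction of the basis vector is defined elsewhere in the paper.

```python
import math


def membership(x, center):
    """Gaussian membership exp(-(x + center)^2 / 2), the form used in the text."""
    return math.exp(-((x + center) ** 2) / 2.0)


def fuzzy_basis(x, centers):
    """Normalized fuzzy basis functions for a scalar input x."""
    mu = [membership(x, c) for c in centers]
    total = sum(mu)
    return [m / total for m in mu]


# Assumed symmetric grid of eleven centers (illustrative).
CENTERS = [9, 7, 5, 3, 1, 0, -1, -3, -5, -7, -9]

basis = fuzzy_basis(0.3, CENTERS)
print(len(basis), sum(basis))
```

Because of the normalization, the basis functions are nonnegative and sum to one, so the fuzzy logic system output is a convex combination of its weights.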

The time-varying function $$ \omega_i(t) $$ is chosen as $$ \omega_i(t)=\dfrac{1}{(1-l_i)\exp(-1.5t)+l_i} $$. The initial state values are selected as $$ x_{i, 1}(0)=[0.11, 0.09, 0.13, 0.1] $$, $$ x_{i, 2}(0)=[0.11, 0.2, 0.18, 0.11] $$, $$ \hat \theta_{c, i1}(0)=[0.2, 0.2, 0.22, 0.12] $$, $$ \hat \theta_{c, i2}(0)=[0.2, 0.2, 0.12, 0.12] $$, $$ \hat \theta_{a, i1}(0)=[1.4, 1, 1, 1] $$, $$ \hat \theta_{a, i2}(0)=[1.2, 1.2, 1, 1] $$, $$ \hat \vartheta_{i, j}(0)=1 $$, $$ \bar \alpha_{i, j}(0)=0 $$. The design parameters are selected as $$ l_i=0.2 $$, $$ \chi=5 $$, $$ c_{i, j}=9 $$, $$ \kappa_{i, 2}=0.02 $$, $$ \sigma_{c, ij}=7 $$, $$ \sigma_{a, ij}=5 $$, $$ \sigma_{11}=\sigma_{12}=\sigma_{21}=\sigma_{22}=\sigma_{31}=21 $$, and $$ \sigma_{32}=\sigma_{41}=\sigma_{42}=11 $$.

The simulation results are presented in Figures 2-7. Figure 2 depicts the output trajectories of the follower agents and the reference trajectory, showing that good tracking performance is achieved under the designed control protocol. Figure 3 shows the consensus tracking errors of the agents together with the performance constraint bounds, from which it can be seen that the constraints are never violated. Figure 4 shows the control inputs generated by the designed protocol. Figures 5, 6, and 7 depict the trajectories of the adaptive parameters $$ \theta_{c, ij} $$, $$ \theta_{a, ij} $$, and $$ \vartheta_{i, j} $$, respectively. These results show that the adaptive parameters $$ \theta_{c, ij} $$, $$ \theta_{a, ij} $$, and $$ \vartheta_{i, j} $$ are bounded. Based on these results, it is evident that the control objectives have been achieved.
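The time-varying function $$ \omega_i(t) $$ chosen for the simulation can be evaluated directly. The following Python sketch uses $$ l_i=0.2 $$ and the decay rate 1.5 from the text; the function name `omega` and its keyword arguments are hypothetical.

```python
import math


def omega(t, l=0.2, rate=1.5):
    """omega_i(t) = 1 / ((1 - l_i) exp(-rate * t) + l_i)."""
    return 1.0 / ((1.0 - l) * math.exp(-rate * t) + l)


print(omega(0.0))    # starts at 1
print(omega(2.0))    # increases monotonically
print(omega(100.0))  # approaches 1 / l_i = 5
```

The function starts at 1, increases monotonically, and saturates at $$ 1/l_i $$, so $$ l_i $$ sets the final value and the exponent sets how quickly that value is reached.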

Figure 2. The trajectories of the system output state $$ y_i(i=1, 2, 3, 4) $$ and the reference signal $$ y_r $$.

Figure 3. System consensus tracking error $$e_i(i=1, 2, 3, 4)$$ and constraint functions $$\Gamma(\varphi(t)), \Gamma(-\varphi(t))$$.

Figure 4. The trajectories of the control protocols $$u_i(i=1, 2, 3, 4)$$.

Figure 5. Responses of $$\theta_{c, ij}(i=1, 2, 3, 4;j=1, 2)$$.

Figure 6. Responses of $$\theta_{a, ij}(i=1, 2, 3, 4;j=1, 2)$$.

Figure 7. Responses of $$\vartheta_{i, j}(i=1, 2, 3, 4;j=1, 2)$$.

5. CONCLUSIONS

In this paper, the problem of optimal adaptive consensus tracking control for nonlinear MASs with prescribed performance has been addressed. First, a time-varying scalar function is introduced so that the designed performance function is free of the initial value conditions. Based on the error transformation function, an unconstrained system is obtained. Subsequently, an RL-based consensus control scheme built on optimal control theory and the dynamic surface technique has been proposed. Finally, it is shown that the closed-loop system is stable and that the error constraints are not violated. In practice, systems are often subject to various uncertain constraints, such as actuator faults and input dead zones, which can have a large impact on system performance. Therefore, designing a performance-constrained optimal control scheme that accounts for these situations is a topic for future research.

DECLARATIONS

Authors' contributions

Made substantial contributions to the conception and design of the study and performed data analysis and interpretation: Yue H

Performed data acquisition and provided administrative, technical, and material support: Xia J

Availability of data and materials

Not applicable.

Financial support and sponsorship

This work was supported by the National Natural Science Foundation of China under Grants 61973148 and by the Discipline with Strong Characteristics of Liaocheng University: Intelligent Science and Technology under Grant 319462208.

Conflicts of interest

All authors declared that there are no conflicts of interest.

Ethical approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

REFERENCES

1. Modares H, Lewis FL, Naghibi-Sistani MB. Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems. Automatica 2014;50:193-202.

2. Bertsekas DP. Value and policy iterations in optimal control and adaptive dynamic programming. IEEE Trans Neural Netw Learn Syst 2015;28:500-9.

3. Tsai JSH, Li JS, Leang-San S. Discretized quadratic optimal control for continuous-time two-dimensional systems. IEEE Trans Circuits Syst I Fund Theory Appl 2002;49:116-25.

4. Luo B, Liu DR, Wu HN, Wang D, Lewis FL. Policy gradient adaptive dynamic programming for data-based optimal control. IEEE Trans Cybern 2016;47:3341-54.

5. Jiang Y, Jiang ZP. Global adaptive dynamic programming for continuous-time nonlinear systems. IEEE Trans Automat Contr 2015;60:2917-29.

6. Wu X, Chen HL, Wang JJ, Troiano L, Loia V, Fujita H. Adaptive stock trading strategies with deep reinforcement learning methods. Inf Sci 2020;538:142-58.

7. Modares H, Ranatunga I, Lewis FL, Popa DO. Optimized assistive human-robot interaction using reinforcement learning. IEEE Trans Cybern 2015;46:655-67.

8. Wen GX, Chen CLP, Li WN. Simplified optimized control using reinforcement learning algorithm for a class of stochastic nonlinear systems. Inf Sci 2020;517:230-43.

9. Zhao B, Liu DR, Luo CM. Reinforcement learning-based optimal stabilization for unknown nonlinear systems subject to inputs with uncertain constraints. IEEE Trans Neural Netw Learn Syst 2019;31:4330-40.

10. Wen G, Chen CLP, Ge SS, Yang H, Liu X. Optimized adaptive nonlinear tracking control using actor-critic reinforcement learning strategy. IEEE Trans Ind Inf 2019;15:4969-77.

11. Bai W, Zhou Q, Li T, Li H. Adaptive reinforcement learning neural network control for uncertain nonlinear system with input saturation. IEEE Trans Cybern 2019;50:3433-43.

12. Yang X, Liu D, Wang D. Reinforcement learning for adaptive optimal control of unknown continuous-time nonlinear systems with input constraints. Int J Control 2014;87:553-66.

13. Bai W, Li T, Tong S. NN reinforcement learning adaptive control for a class of nonstrict-feedback discrete-time systems. IEEE Trans Cybern 2020;50:4573-84.

14. Li Y, Zhang J, Liu W, Tong S. Observer-based adaptive optimized control for stochastic nonlinear systems with input and state constraints. IEEE Trans Neural Netw Learn Syst 2021;33:7791-805.

15. Wang J, Gong Q, Huang K, Liu Z, Chen CLP, Liu J. Event-triggered prescribed settling time consensus compensation control for a class of uncertain nonlinear systems with actuator failures. IEEE Trans Neural Netw Learn Syst 2023;34:5590-600.

16. Wang J, Wang C, Liu Z, Chen CLP, Zhang C. Practical fixed-time adaptive ERBFNNs event-triggered control for uncertain nonlinear systems with dead-zone constraint. IEEE Trans Syst Man Cybern Syst 2023:1-10.

17. Cai X, de Marcio M. Adaptive rigidity-based formation control for multirobotic vehicles with dynamics. IEEE Trans Contr Syst Technol 2014;23:389-96.

18. Ren H, Cheng Z, Qin J, Lu R. Deception attacks on event-triggered distributed consensus estimation for nonlinear systems. Automatica 2023;154:111100.

19. Wang J, Yan Y, Liu Z, Chen CLP, Zhang C, Chen K. Finite-time consensus control for multi-agent systems with full-state constraints and actuator failures. Neural Netw 2023;157:350-63.

20. Shang Y. Matrix-scaled consensus on weighted networks with state constraints. IEEE Syst J 2023;17:6472-9.

21. Cheng L, Hou ZG, Tan M, Lin Y, Zhang W. Neural-network-based adaptive leader-following control for multiagent systems with uncertainties. IEEE Trans Neural Netw 2010;21:1351-8.

22. Shen Q, Shi P, Zhu J, Wang S, Shi Y. Neural networks-based distributed adaptive control of nonlinear multiagent systems. IEEE Trans Neural Netw Learn Syst 2019;31:1010-21.

23. Zhang N, Xia J, Park JH, Zhang J, Shen H. Improved disturbance observer-based fixed-time adaptive neural network consensus tracking for nonlinear multi-agent systems. Neural Netw 2023;162:490-501.

24. Zhang Y, Sun J, Liang H, Li H. Event-triggered adaptive tracking control for multiagent systems with unknown disturbances. IEEE Trans Cybern 2018;50:890-901.

25. Chen J, Li J, Yuan X. Global fuzzy adaptive consensus control of unknown nonlinear multiagent systems. IEEE Trans Fuzzy Syst 2020;32:2239-50.

26. Zhang J, Liu S, Zhang X, Xia J. Event-triggered-based distributed consensus tracking for nonlinear multiagent systems with quantization. IEEE Trans Neural Netw Learn Syst 2022:1-11.

27. Deng C, Wen C, Wang W, Li X, Yue D. Distributed adaptive tracking control for high-order nonlinear multiagent systems over event-triggered communication. IEEE Trans Automat Contr 2022;68:1176-83.

28. Shao J, Shi L, Cheng Y, Li T. Asynchronous tracking control of leader-follower multiagent systems with input uncertainties over switching signed digraphs. IEEE Trans Cybern 2021;52:6379-90.

29. Yang Y, Xiao Y, Li T. Attacks on formation control for multiagent systems. IEEE Trans Cybern 2021;52:12805-17.

30. Ren H, Wang Y, Liu M, Li H. An optimal estimation framework of multi-agent systems with random transport protocol. IEEE Trans Signal Process 2022;70:2548-59.

31. Gao W, Jiang ZP, Lewis FL, Wang Y. Leader-to-formation stability of multiagent systems: an adaptive optimal control approach. IEEE Trans Automat Contr 2018;63:3581-7.

32. Tan M, Liu Z, Chen CLP, Zhang Y, Wu Z. Optimized adaptive consensus tracking control for uncertain nonlinear multiagent systems using a new event-triggered communication mechanism. Inf Sci 2022;605:301-16.

33. Wen G, Chen CLP. Optimized backstepping consensus control using reinforcement learning for a class of nonlinear strict-feedback-dynamic multi-agent systems. IEEE Trans Neural Netw Learn Syst 2023;34:1524-36.

34. Zhu HY, Li YX, Tong S. Dynamic event-triggered reinforcement learning control of stochastic nonlinear systems. IEEE Trans Fuzzy Syst 2023;31:2917-28.

35. Bai W, Li T, Long Y, Chen CLP. Event-triggered multigradient recursive reinforcement learning tracking control for multiagent systems. IEEE Trans Neural Netw Learn Syst 2023;34:366-79.

36. Li T, Bai W, Liu Q, Long Y, Chen CLP. Distributed fault-tolerant containment control protocols for the discrete-time multi-agent systems via reinforcement learning method. IEEE Trans Neural Netw Learn Syst 2023;34:3979-91.

37. Zhao Y, Niu B, Zong G, Zhao X, Alharbi KH. Neural network-based adaptive optimal containment control for non-affine nonlinear multi-agent systems within an identifier-actor-critic framework. J Franklin Inst 2023;360:8118-43.

38. Li H, Wu Y, Chen M, Lu R. Adaptive multigradient recursive reinforcement learning event-triggered tracking control for multiagent systems. IEEE Trans Neural Netw Learn Syst 2023;34:144-56.

39. Bechlioulis CP, Rovithakis GA. Robust adaptive control of feedback linearizable MIMO nonlinear systems with prescribed performance. IEEE Trans Automat Contr 2008;53:2090-9.

40. Wang X, Xia J, Park JH, Xie X, Chen G. Intelligent control of performance constrained switched nonlinear systems with random noises and its application: an event-driven approach. IEEE Trans Circuits Syst I Regul Pap 2022;69:3736-47.

41. Li Y, Shao X, Tong S. Adaptive fuzzy prescribed performance control of nontriangular structure nonlinear systems. IEEE Trans Fuzzy Syst 2019;28:2416-26.

42. Wang W, Liang H, Pan Y, Li T. Prescribed performance adaptive fuzzy containment control for nonlinear multiagent systems using disturbance observer. IEEE Trans Cybern 2020;50:3879-91.

43. Sun K, Qiu J, Karimi HR, Fu Y. Event-triggered robust fuzzy adaptive finite-time control of nonlinear systems with prescribed performance. IEEE Trans Fuzzy Syst 2020;29:1460-71.

44. Chen H, Yan H, Wang Y, Xie S, Zhang D. Reinforcement learning-based close formation control for underactuated surface vehicle with prescribed performance and time-varying state constraints. Ocean Eng 2022;256:111361.

45. Wang N, Gao Y, Zhang X. Data-driven performance-prescribed reinforcement learning control of an unmanned surface vehicle. IEEE Trans Neural Netw Learn Syst 2021;32:5456-67.

About This Article

Copyright

© The Author(s) 2023. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing, adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
