Decentralized tracking control design based on intelligent critic for an interconnected spring-mass-damper system

Wenqian Fan; Ao Liu; Ding Wang

doi:10.20517/ces.2023.04


Research Article | Open Access | 29 Mar 2023



Complex Eng Syst 2023;3:5.

10.20517/ces.2023.04 | © The Author(s) 2023.


Abstract

In this paper, the decentralized tracking control (DTC) problem is investigated for a class of continuous-time nonlinear systems with external disturbances. First, the DTC problem is resolved by converting it into the optimal tracking controller design for augmented tracking isolated subsystems (ATISs). A cost function with a discount is taken into consideration. Then, in the case of external disturbances, the DTC scheme is effectively constructed via adding the appropriate feedback gain to each ATIS. In addition, utilizing the approximation property of the neural network, the critic network is constructed to solve the Hamilton-Jacobi-Isaacs equation, from which the optimal tracking control law and the worst disturbance law are derived. Moreover, the updating rule is improved during the process of weight learning, which removes the requirement for an initial admissible control. Finally, through the interconnected spring-mass-damper system, a simulation example is given to verify the effectiveness of the DTC scheme.

Keywords

Adaptive dynamic programming, discounted cost function, decentralized tracking control, disturbance rejection, interconnected spring-mass-damper systems, neural networks, optimal control


1. INTRODUCTION

For large-scale nonlinear interconnected systems, which are nonlinear plants consisting of many interconnected subsystems, decentralized control has become a research hotspot in the last few decades ^[1–4]. Compared with centralized control, decentralized control simplifies the controller structure and reduces its computational burden. Besides, each local controller depends only on the information of its local subsystem. Meanwhile, with the development of science and technology, interconnected engineering applications such as robotic systems ^[5] and power systems ^[6, 7] have become increasingly complex. In ^[8–10], we found that the decentralized control of the large-scale system was connected with the optimal control of the isolated subsystems, which means the optimal control method ^[11–14] can be adopted to design the decentralized controllers. However, the optimal control of a nonlinear system often requires solving the Hamilton-Jacobi-Bellman (HJB) or Hamilton-Jacobi-Isaacs (HJI) equation, which can be solved approximately by using the adaptive dynamic programming (ADP) method ^[15, 16]. Besides, in ^[13], Wang et al. investigated the latest intelligent critic framework for advanced optimal control. In ^[14], the optimal feedback stabilization problem was discussed with discounted guaranteed cost for nonlinear systems. The interconnection plays a significant role in designing the controller, and control schemes can accordingly be classified as decentralized or distributed. There is a certain distinction between the two. For decentralized control, each sub-controller only uses local information and the interconnection among subsystems can be assumed to be weak in nature. In contrast, distributed control ^[17–19] can be introduced to improve the performance of the subsystems when the interconnections among subsystems become strong. In ^[20], the distributed optimal observer was devised to assess the nonlinear leader state for all followers. In ^[21], the distributed control was developed by means of online reinforcement learning for interconnected systems with exploration.

It is worth mentioning that the ADP algorithm has been extensively employed for dealing with various optimal regulation problems and tracking problems ^[22–24], where the goal is for the actual signal to track the reference signal in a noisy and uncertain environment. In ^[25], Ha et al. proposed a novel cost function to explore the evaluation framework of the optimal tracking control problem. For complicated control systems, it is then necessary to consider decentralized tracking control (DTC) problems ^[26–29]. The DTC systems can be transformed into the nominal augmented tracking isolated subsystems (ATISs), whose states are composed of the tracking error and the reference signal. In ^[26], Qu et al. proposed a novel formulation consisting of a steady-state controller and a modified optimal feedback controller of the DTC strategy. Besides, the asymptotic DTC was realized by introducing two integral bounded functions in ^[27]. In ^[28], Liu et al. proposed a finite-time DTC method for a class of nonstrict feedback interconnected systems with disturbances. Moreover, the adaptive fuzzy output-feedback DTC design was investigated for switched large-scale systems in ^[29].

Game theory studies the strategies adopted by interacting decision makers. It covers cooperative and noncooperative settings, including zero-sum (ZS) games and non-ZS games. In particular, ZS games have been widely applied in many fields ^[30–33]. The objective of the ZS game is to derive the Nash equilibrium of nonlinear systems, which optimizes the cost function. In ^[31], the finite-horizon H-infinity state estimator design was studied for periodic neural networks over multiple fading channels. The noncooperative control problem was formulated as a two-player ZS game in ^[32]. In ^[33], Wang et al. investigated the stability of the general value iteration algorithm for ZS games. At the same time, we can also combine the ZS problem with the tracking problem to make the system more stable while achieving the trajectory tracking. In ^[34], Zhang et al. developed an online model-free integral reinforcement learning algorithm for solving the H-infinity optimal tracking problem for completely unknown systems. In ^[35], a general bounded $$ L_{2} $$ gain tracking policy was introduced with a discounted function. In ^[36], Hou et al. proposed an action-disturbance-critic neural network frame to realize the iterative dual heuristic programming algorithm.

As can be seen from the above, few studies combine the DTC problem with the ZS game problem. It is necessary to take the related discounted cost function into account for the DTC system, which can transform the DTC problem into an optimal control problem with disturbances. In practice, disturbances have an unpredictable impact on the plant. Hence, it is of vital importance to consider the stability of the DTC system. In the experimental simulation, it is a challenge to achieve effective online weight training, which is implemented under the tracking control law and the disturbance control law. Consequently, in this paper, we put forward a novel ADP-based method to resolve the DTC problem with external disturbances for continuous-time (CT) nonlinear systems. More importantly, to overcome the difficulty of selecting initial admissible control policies, an additional term is added during the weight updating process. Remarkably, in this paper, we introduce the discount factor into the cost function to be minimized and maximized.

The contributions of this paper are as follows: First, considering the disturbance input in the DTC system, the strategy feasibility and the system stability are discussed through theoretical proofs. It is worth noting that the discount factor is introduced to the cost function. Moreover, in the process of online weight training, we can make the DTC system reach a stable state without selecting the initial admissible control law. Additionally, we present the experimental process of the spring-mass-damper system. Besides, we derive the desired tracking error curves as well as control strategy curves, which demonstrates that they are uniformly ultimately bounded (UUB).

The paper is divided into six sections. The first section introduces the relevant background of the research content. The second section states the basic problems concerning the two-player ZS game and the DTC strategy. In the third section, we design the decentralized tracking controller by using the optimal control method through solving the HJI equations. Meanwhile, the relevant lemma and theorem are given to validate the establishment of the DTC strategy. In the fourth section, the design method based on the adaptive critic is elaborated. Most importantly, an improved critic learning rule is implemented via critic networks. In the fifth section, the practicability of this method is validated by an interconnected spring-mass-damper system. Finally, the sixth section presents the conclusions and summarizes the overall research content of the paper.

2. PROBLEM STATEMENT

Consider a CT nonlinear interconnected system with disturbances, which is composed of $$ N $$ interconnected subsystems. Its dynamic description can be expressed as

(1)

$$ \dot{x}_{i}(t)= f_{i}(x_{i}(t))+g_{i}(x_{i}(t))\left(\bar{u}_{i} (x_{i}(t))+\bar{Z}_{i}(x(t))\right) +h_{i}(x_{i}(t))v_{i}(x_{i}(t)), $$

where $$ i={1, 2, \dotsc, N} $$, $$ x_{i}(t)\in{\mathbb R}^{n_{i}} $$ is the state vector of the $$ i $$th subsystem and $$ x(t) $$ denotes the partial interconnected state related to other subsystems of the large-scale system. $$ \bar{u}_{i} (x_{i}(t))\in{\mathbb R}^{m_{i}} $$ is the control input and $$ v_{i}(x_{i}(t))\in{\mathbb R}^{q_{i}} $$ is the external disturbance input. As for the $$ i $$th subsystem, we denote $$ f_{i}(x_{i}(t)) $$, $$ g_{i}(x_{i}(t)) $$, $$ h_{i}(x_{i}(t)) $$, and $$ \bar{Z}_{i}(x(t)) $$ as the nonlinear internal dynamics, the input gain matrix, the disturbance gain matrix, and the interconnected item in sequence. Besides, $$ {[{x}^{\mathit{T}}_{1}, {x}^{\mathit{T}}_{2}, \dotsc, {x}^{\mathit{T}}_{N}]}^{\mathit{T}}\in{\mathbb R}^{n} $$ denotes the whole state of the large-scale system Equation (1), where $$ n=\sum_{i=1}^{N}n_{i} $$. Accordingly, $$ {x_{1}, x_{2}, \dotsc, x_{N}} $$ are named local states and $$ \bar{u}_{1}(x_{1}), \bar{u}_{2}(x_{2}), \dotsc, \bar{u}_{N}(x_{N}) $$ are called local controllers. We let $$ R_{i}\in{\mathbb R}^{m_{i}\times m_{i}} $$ be the symmetric positive definite matrix and denote $$ Z_{i}(x(t))={R_{i}}^{{1}/{2}}\bar{Z}_{i}(x(t)) $$. In addition, $$ {Z}_{i}(x(t))\in{\mathbb R}^{m_{i}} $$ is bounded as follows:

(2)

$$ {\|Z_{i}(x(t))\|}\leq \sum\limits_{j=1}^{N}\alpha_{ij}\theta_{ij}(x_{j}) \leq \sum\limits_{j=1}^{N}\beta_{ij}\theta_{j}(x_{j}), $$

where $$ j={1, 2, \dotsc, N} $$, $$ \alpha_{ij} $$ is a nonnegative constant, and $$ \theta_{ij}(x_{j}) $$ is a positive semidefinite function. Besides, we define $$ \theta_{j}(x_{j})=\max{\{\theta_{1j}(x_{j}), \theta_{2j}(x_{j}), \dotsc, \theta_{Nj}(x_{j})\}} $$, where the elements of $$ {\{\theta_{1j}(x_{j}), \theta_{2j}(x_{j}), \dotsc, \theta_{Nj}(x_{j})\}} $$ are not all zero at the same time. For this reason, $$ \beta_{ij}\geq \alpha_{ij}\theta_{ij}(x_{j})/\theta_{j}(x_{j}) $$ holds, where $$ \beta_{ij} $$ is also a nonnegative constant.

In this paper, considering the nonlinear system Equation (1), a reference system is introduced as follows:

(3)

$$ \dot{r}_{i}(t)=\zeta_{i}(r_{i}(t)), $$

where $$ r_{i}(t)\in{\mathbb R}^{n_{i}} $$ denotes the desired trajectory with $$ r_{i}(0) = r_{i0} $$, the function $$ \zeta_{i} $$ is locally Lipschitz continuous satisfying $$ \zeta_{i}(0)=0 $$. For the $$ i $$th subsystem, the trajectory tracking error can be defined as $$ e_{i}(t)=x_{i}(t)-r_{i}(t) $$ with $$ e_{i}(0) = e_{i0} $$. Thus, the dynamics of the tracking error is

(4)

$$ \dot{e}_{i}(t)= f_{i}(x_{i}(t))+g_{i}(x_{i}(t))\left(\bar{u}_{i} (x_{i}(t))+\bar{Z}_{i}(x(t))\right) +h_{i}(x_{i}(t))v_{i}(x_{i}(t))-\zeta_{i}(r_{i}(t)). $$

Noticing $$ x_{i}(t)=e_{i}(t)+r_{i}(t) $$, we define the augmented subsystem states as $$ y_{i}(t)=[{e}^{\mathit{T}}_{i}(t), {r}^{\mathit{T}}_{i}(t)]^{\mathit{T}}\in{\mathbb R}^{2n_{i}} $$ with $$ y_{i}(0)=y_{i0}=[{e}^{\mathit{T}}_{i0}, {r}^{\mathit{T}}_{i0}]^{\mathit{T}} $$. Hence, the dynamics of the $$ i $$th ATIS based on Equations (3) and (4) can be written in the concise form

(5)

$$ \dot{y}_{i}(t)=\ \mathcal{F}_{i}(y_{i}(t))+\mathcal{G}_{i}(y_{i}(t))\left(\bar{u}_{i} (y_{i}(t))+\bar{Z}_{i}(y(t))\right) +\mathcal{H}_{i}(y_{i}(t))v_{i}(y_{i}(t)), $$

where $$ \mathcal{F}_{i}(y_{i}(t))\in{\mathbb R}^{2n_{i}} $$, $$ \mathcal{G}_{i}(y_{i}(t))\in{\mathbb R}^{2n_{i}\times m_{i}} $$, and $$ \mathcal{H}_{i}(y_{i}(t))\in{\mathbb R}^{2n_{i}\times q_{i}} $$. Specifically, they can be expressed as

(6)

$$ \mathcal{F}_{i}(y_{i}(t))=\ \left[\begin{array}{c} f_{i}(e_{i}(t)+r_{i}(t))-\zeta_{i}(r_{i}(t)) \\ \zeta_{i}(r_{i}(t))\end{array}\right], $$

(7)

$$ \mathcal{G}_{i}(y_{i}(t))=\ \left[\begin{array}{c} g_{i}(e_{i}(t)+r_{i}(t)) \\ 0_{n_{i}\times m_{i}}\end{array}\right], $$

(8)

$$ \mathcal{H}_{i}(y_{i}(t))=\ \left[\begin{array}{c} h_{i}(e_{i}(t)+r_{i}(t)) \\ 0_{n_{i}\times q_{i}}\end{array}\right]. $$
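As a minimal sketch of Equations (6)-(8), the augmented maps can be assembled numerically from the subsystem maps. The function name `augment` and the toy single-mass dynamics below are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def augment(f, g, h, zeta, n):
    """Build the augmented maps F, G, H of Equations (6)-(8).

    f, zeta: R^n -> R^n; g: R^n -> R^{n x m}; h: R^n -> R^{n x q}.
    The augmented state is y = [e; r] with e = x - r.
    """
    def F(y):
        e, r = y[:n], y[n:]
        return np.concatenate([f(e + r) - zeta(r), zeta(r)])

    def G(y):
        e, r = y[:n], y[n:]
        gx = g(e + r)
        return np.vstack([gx, np.zeros_like(gx)])  # lower block is 0_{n x m}

    def H(y):
        e, r = y[:n], y[n:]
        hx = h(e + r)
        return np.vstack([hx, np.zeros_like(hx)])  # lower block is 0_{n x q}

    return F, G, H

# Hypothetical toy subsystem (illustrative values only): one mass with spring and damper.
f = lambda x: np.array([x[1], -x[0] - 0.5 * x[1]])
g = lambda x: np.array([[0.0], [1.0]])
h = lambda x: np.array([[0.0], [1.0]])
zeta = lambda r: np.array([r[1], -r[0]])  # reference generator with zeta(0) = 0

F, G, H = augment(f, g, h, zeta, n=2)
y = np.array([0.1, -0.2, 1.0, 0.0])  # y = [e; r]
```

The lower halves of `G(y)` and `H(y)` are zero because the reference system Equation (3) is driven by neither the control nor the disturbance.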

We aim to design a set of decentralized control policies $$ \bar{u}_{1}, \bar{u}_{2}, \dotsc, \bar{u}_{N} $$ to ensure that the large-scale system Equation (1) can track the desired trajectory in the presence of external disturbances. That is, $$ \|x_{i}(t)-r_{i}(t)\|\to 0 $$ as $$ t\to +\infty $$. Meanwhile, it is noteworthy that each controller in the set $$ \bar{u}_{1}, \bar{u}_{2}, \dotsc, \bar{u}_{N} $$ should use only the local information of its own subsystem. In what follows, the DTC problem is addressed by transforming it into the optimal controller design of the ATISs with an appropriate discounted cost function.

3. DTC DESIGN VIA OPTIMAL REGULATION

3.1. Optimal control and the HJI equations

In this section, the optimal DTC strategy of the ATIS with the disturbance rejection is elaborated. It is addressed by solving the HJI equation with a discounted cost function. Then, we consider the nominal part of the augmented system Equation (5) as

(9)

$$ \dot{y}_{i}(t)= \mathcal{F}_{i}(y_{i}(t))+\mathcal{G}_{i}(y_{i}(t)){u}_{i}(y_{i}(t)) +\mathcal{H}_{i}(y_{i}(t))v_{i}(y_{i}(t)). $$

We assume that $$ \mathcal{F}_{i} +\mathcal{G}_{i}u_{i}+\mathcal{H}_{i}v_{i} $$ is Lipschitz continuous on a set $$ \Omega_{i}\subset{\mathbb R}^{2n_{i}} $$, an assumption commonly used in the field of adaptive critic control to ensure the existence and uniqueness of the solution of the differential equation. For the $$ i $$th ATIS, we seek to minimize (with respect to $$ u_{i} $$) and maximize (with respect to $$ v_{i} $$) the discounted cost function

(10)

$$ J_{i}(y_{i0})=\int_{0}^{\infty}e^{-\lambda_{i}(\tau-t)}\left\{{y}^{\mathit{T}}_{i}(\tau)Q_{i}y_{i}(\tau)+{u}^{\mathit{T}}_{i}(y_{i}(\tau))R_{i}u_{i}(y_{i}(\tau)) -{\varrho^2_{i}}{v}^{\mathit{T}}_{i}(y_{i}(\tau))v_{i}(y_{i}(\tau))\right\}\text{d}{\tau}, $$

where $$ Q_{i}\in{\mathbb R}^{2n_{i}\times2n_{i}} $$ and $$ R_{i}\in{\mathbb R}^{m_{i}\times m_{i}} $$ are both positive definite matrices, $$ \lambda_{i}>0 $$ is the discount factor, and $$ \varrho_{i}>0 $$ is a constant weighting the disturbance term. Herein, we let $$ {y}^{\mathit{T}}_{i}Q_{i}y_{i}-{\varrho^2_{i}}{v}^{\mathit{T}}_{i}(y_{i})v_{i}(y_{i})=\gamma^{2}_{i}(y_{i}) $$ and $$ \theta_{i}(y_{i})\leq \sqrt{\gamma_{i}^{2}(y_{i})-\lambda_{i}J_{i}(y_{i})} $$, where $$ \gamma_{i}^{2}(y_{i})>\lambda_{i}J_{i}(y_{i}) $$. This inequality is used in the proof of Theorem 1. Then, Equation (10) can be rewritten as

(11)

$$ J_{i}(y_{i0})= \int_{0}^{\infty}e^{-\lambda_{i}(\tau-t)}\left\{\gamma^{2}_{i}(y_{i})+{u}^{\mathit{T}}_{i}(y_{i}(\tau))R_{i}u_{i}(y_{i}(\tau))\right\}\text{d}{\tau}. $$
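To make the role of the discount factor concrete, the following sketch approximates the discounted cost of Equation (11) along a sampled trajectory, taking $$ t=0 $$ and truncating the infinite horizon. The function name `discounted_cost`, the trapezoidal rule, and the scalar test trajectory are assumptions made for illustration only.

```python
import numpy as np

def discounted_cost(tau, y, u, v, Q, R, rho, lam):
    """Trapezoidal approximation of the discounted cost on a finite time grid.

    tau: (K,) grid starting at 0; y: (K, 2n); u: (K, m); v: (K, q).
    """
    # gamma^2 = y^T Q y - rho^2 v^T v, as defined below Equation (10)
    gamma2 = np.einsum("ki,ij,kj->k", y, Q, y) - rho**2 * np.sum(v * v, axis=1)
    utility = gamma2 + np.einsum("ki,ij,kj->k", u, R, u)
    integrand = np.exp(-lam * tau) * utility
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(tau)))

# Illustrative scalar trajectory y(tau) = exp(-tau) with u = v = 0.
tau = np.linspace(0.0, 20.0, 20001)
y = np.exp(-tau)[:, None]
u = np.zeros((tau.size, 1))
v = np.zeros((tau.size, 1))
J = discounted_cost(tau, y, u, v, Q=np.eye(1), R=np.eye(1), rho=1.0, lam=0.5)
```

For this trajectory the integrand is $$ e^{-2.5\tau} $$, so the exact infinite-horizon value is $$ 0.4 $$, and the numerical result should agree with it closely.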

If the cost function in Equation (11) is continuously differentiable, the nonlinear Lyapunov equation is the infinitesimal version of Equation (11). The Lyapunov equation is as follows:

(12)

$$ \gamma^{2}_{i}(y_{i})+{u}^{\mathit{T}}_{i}(y_{i})R_{i}u_{i}(y_{i})-\lambda_{i}J_{i}(y_{i})+(\nabla J_{i}(y_{i}))^{\mathit{T}}[\mathcal{F}_{i}(y_{i})+\mathcal{G}_{i}(y_{i}) {u}_{i}(y_{i})+\mathcal{H}_{i}(y_{i})v_{i}(y_{i})]=0. $$

Define the Hamiltonian of the $$ i $$th ATIS for the optimization problem as

(13)

$$ \begin{align} H_{i}(y_{i}, u_{i}, v_{i}, \nabla J_{i}(y_{i})) =\ &\gamma^{2}_{i}(y_{i})+{u}^{\mathit{T}}_{i}(y_{i})R_{i}u_{i}(y_{i})-\lambda_{i}J_{i}(y_{i})+(\nabla J_{i}(y_{i}))^{\mathit{T}}\\ &\times[\mathcal{F}_{i}(y_{i})+\mathcal{G}_{i}(y_{i}){u}_{i}(y_{i})+\mathcal{H}_{i}(y_{i})v_{i}(y_{i})]. \end{align} $$

To acquire the saddle point solution $$ \{u_{i}^*, v_{i}^*\} $$, the local optimal cost function needs to satisfy the following Nash condition:

(14)

$$ {J_{i}}^*(y_{i0})=\min\limits_{u_{i}}\max\limits_{v_{i}}J_{i}(y_{i0}). $$

Then, the optimal cost function $$ J_{i}^*(y_{i}) $$ is derived via solving the local HJI equation in the following:

(15)

$$ \min\limits_{u_{i}}\max\limits_{v_{i}}H_{i}(y_{i}, u_{i}, v_{i}, \nabla J_{i}^*(y_{i}))=0. $$

Since the saddle point solution $$ \{u_{i}^*, v_{i}^*\} $$ satisfies the first-order extremum conditions of the Hamiltonian, the optimal tracking control law and the worst disturbance law can be computed by

(16)

$$ u^*_{i}(y_{i})=-\frac{1}{2}R_{i}^{-1}{\mathcal{G}}^{\mathit{T}}_{i}(y_{i})\nabla J_{i}^*(y_{i}), $$

(17)

$$ v_{i}^*(y_{i})=\frac{1}{2{\varrho^2_{i}}}{\mathcal{H}}^{\mathit{T}}_{i}(y_{i})\nabla J_{i}^*(y_{i}). $$
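The step from Equation (15) to Equations (16) and (17) can be spelled out as follows. This is a sketch under the assumption that $$ J_{i}^* $$ is continuously differentiable; it uses only the definition of $$ \gamma^{2}_{i}(y_{i}) $$ and the Hamiltonian Equation (13). Setting the partial derivatives of the Hamiltonian to zero gives

$$ \frac{\partial H_{i}}{\partial u_{i}}=2R_{i}u_{i}+\mathcal{G}^{\mathit{T}}_{i}(y_{i})\nabla J_{i}^*(y_{i})=0, \qquad \frac{\partial H_{i}}{\partial v_{i}}=-2\varrho^{2}_{i}v_{i}+\mathcal{H}^{\mathit{T}}_{i}(y_{i})\nabla J_{i}^*(y_{i})=0. $$

Because $$ \partial^{2}H_{i}/\partial u_{i}^{2}=2R_{i} $$ is positive definite and $$ \partial^{2}H_{i}/\partial v_{i}^{2}=-2\varrho^{2}_{i}I $$ is negative definite, the stationary point is a minimum in $$ u_{i} $$ and a maximum in $$ v_{i} $$.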

Substituting the optimal tracking control strategy Equation (16) into Equation (15), the HJI equation for the $$ i $$th ATIS becomes

(18)

$$ (\nabla J_{i}^*(y_{i}))^{\mathit{T}}[\mathcal{F}_{i}(y_{i})+\mathcal{H}_{i}(y_{i})v_{i}^*(y_{i})]+\gamma^{2}_{i}(y_{i})-\lambda_{i}J_{i}^*(y_{i})-\frac{1}{4} (\nabla J_{i}^*(y_{i}))^{\mathit{T}}\mathcal{G}_{i}(y_{i})R_{i}^{-1}\mathcal{G}^{\mathit{T}}_{i}(y_{i})\nabla J_{i}^*(y_{i})=0. $$
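For reference, the worst disturbance law Equation (17) can be substituted into Equation (18) as well. The resulting form is not stated in the original text; it is a routine consequence of Equations (16)-(18) together with $$ \gamma^{2}_{i}(y_{i})={y}^{\mathit{T}}_{i}Q_{i}y_{i}-\varrho^{2}_{i}{v}^{\mathit{T}}_{i}v_{i} $$, and is given here only as a worked step:

$$ {y}^{\mathit{T}}_{i}Q_{i}y_{i}-\lambda_{i}J_{i}^*(y_{i})+(\nabla J_{i}^*(y_{i}))^{\mathit{T}}\mathcal{F}_{i}(y_{i})-\frac{1}{4}(\nabla J_{i}^*(y_{i}))^{\mathit{T}}\mathcal{G}_{i}(y_{i})R_{i}^{-1}\mathcal{G}^{\mathit{T}}_{i}(y_{i})\nabla J_{i}^*(y_{i})+\frac{1}{4\varrho^{2}_{i}}(\nabla J_{i}^*(y_{i}))^{\mathit{T}}\mathcal{H}_{i}(y_{i})\mathcal{H}^{\mathit{T}}_{i}(y_{i})\nabla J_{i}^*(y_{i})=0. $$

The last term collects $$ (\nabla J_{i}^*)^{\mathit{T}}\mathcal{H}_{i}v_{i}^*=\frac{1}{2\varrho^{2}_{i}}\|\mathcal{H}^{\mathit{T}}_{i}\nabla J_{i}^*\|^{2} $$ and $$ -\varrho^{2}_{i}\|v_{i}^*\|^{2}=-\frac{1}{4\varrho^{2}_{i}}\|\mathcal{H}^{\mathit{T}}_{i}\nabla J_{i}^*\|^{2} $$.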

3.2. Establishment of the DTC strategy design

In the following, we present the DTC strategy by adding the feedback gain to the interconnected system Equation (5). First, the following lemma is given.

Lemma 1 Considering the ATIS Equation (9), the feedback control

(19)

$$ \bar{u}_{i}(y_{i})=k_{i}u_{i}^*(y_{i}) $$

can ensure that the $$ N $$ ATISs are asymptotically stable as long as $$ k_{i}\geq1/2 $$, which makes the tracking error approach zero.

Proof. The lemma can be proved by showing $$ J_{i}^*(y_{i}) $$ is a candidate Lyapunov function. We can find $$ J_{i}^*(y_{i})\ge0 $$ in Equation (11), which implies that $$ J_{i}^*(y_{i}) $$ is a positive definite function. The derivative of $$ J_{i}^*(y_{i}) $$ along with the $$ i $$th ATIS is given by

(20)

$$ \begin{align} \dot{J}_{i}^*(y_{i})=&\ (\nabla J_{i}^*(y_{i}))^{\mathit{T}}\dot{y}_{i}\\ =&\ (\nabla J_{i}^*(y_{i}))^{\mathit{T}}[\mathcal{F}_{i}(y_{i})+\mathcal{G}_{i}(y_{i})\bar{u}_{i}(y_{i})+\mathcal{H}_{i}(y_{i})v_{i}(y_{i})]. \end{align} $$

Substituting Equations (18) and (19) into Equation (20), we can rewrite it as

(21)

$$ \begin{align} \dot{J}_{i}^*(y_{i})=&\ -\gamma^{2}_{i}(y_{i})+\lambda_{i}J_{i}^*(y_{i})+\frac{1}{4}(\nabla J_{i}^*(y_{i}))^{\mathit{T}}\mathcal{G}_{i}(y_{i})R_{i}^{-1}\mathcal{G}^{\mathit{T}}_{i}(y_{i})\nabla J_{i}^*(y_{i})\\ &\ -\frac{1}{2}k_{i}(\nabla J_{i}^*(y_{i}))^{\mathit{T}}\mathcal{G}_{i}(y_{i})R_{i}^{-1}{\mathcal{G}}^{\mathit{T}}_{i}(y_{i})\nabla J_{i}^*(y_{i})\\ =&\ -(\gamma^{2}_{i}(y_{i})-\lambda_{i}J_{i}^*(y_{i}))-\left(\frac{1}{2}k_{i}-\frac{1}{4}\right)\left\|R_{i}^{-\frac{1}{2}}{\mathcal{G}}^{\mathit{T}}_{i}(y_{i})\nabla J_{i}^*(y_{i})\right\|^{2}. \end{align} $$

Observing Equation (21), we can obtain that $$ \dot{J}_{i}^*(y_{i})<0 $$ holds under the condition $$ \gamma^{2}_{i}(y_{i})>\lambda_{i}J_{i}^*(y_{i}) $$ for all $$ k_{i}\geq1/2 $$ and $$ y_{i}\neq0 $$. Thus, the conditions are satisfied for Lyapunov local stability theory and the actual state of each ATIS can realize desired tracking objectives under the feedback control strategy. The proof is completed.

Remark 1. It is worth mentioning that the feedback control is optimal only when $$ k_{i}= 1 $$. Next, we present the following theorem to verify that the proposed control law can effectively establish the DTC strategy.

Theorem 1 Taking Equation (2) and the interconnected augmented tracking system Equation (5) into account, there exist $$ N $$ positive numbers $$ {k}_{i}^* $$ such that, for any $$ {k}_{i}>{k}_{i}^* $$, the feedback control policies given by Equation (19) guarantee that the interconnected tracking system is asymptotically stable. In other words, the control set $$ \bar{u}_{1}(y_{1}), \bar{u}_{2}(y_{2}), \dotsc, \bar{u}_{N}(y_{N}) $$ is the DTC strategy for the large-scale system.

Proof. Inspired by Lemma 1, we observe that $$ J_{i}^*(y_{i}) $$ is the Lyapunov function. Therefore, a composite Lyapunov function of $$ {J}_{i}^*(y_{i}) $$ is chosen as

(22)

$$ \mathcal{L}(y)=\sum\limits_{i=1}^{N}{\mu_{i}}J_{i}^*(y_{i}), $$

where $$ {\mu_{i}} $$ is an arbitrary positive constant. Taking the time derivative of $$ \mathcal{L}(y) $$, we have

(23)

$$ \begin{align} \dot{\mathcal{L}}(y)=&\ \sum\limits_{i=1}^{N}{\mu_{i}}\dot{J}_{i}^*(y_{i})\\ =&\ \sum\limits_{i=1}^{N}{\mu_{i}} \Big\{(\nabla J_{i}^*(y_{i}))^{\mathit{T}} [\mathcal{F}_{i}(y_{i})+ \mathcal{G}_{i}(y_{i})\bar{u}_{i}(y_{i}) +\mathcal{H}_{i}(y_{i})v_{i}(y_{i})]\\ &\ +(\nabla J_{i}^*(y_{i}))^{\mathit{T}}\mathcal{G}_{i}(y_{i})\bar{Z}_{i}(y)\Big\}. \end{align} $$

Considering Equation (2), the inequality $$ \theta_{i}(y_{i})\leq \sqrt{\gamma_{i}^{2}(y_{i})-\lambda_{i}J_{i}(y_{i})} $$ introduced above, where $$ \gamma_{i}^{2}(y_{i})>\lambda_{i}J_{i}(y_{i}) $$, and Equation (21), the above expression can be bounded as

(24)

$$ \begin{align} \dot{\mathcal{L}}(y)\leq&\ -\sum\limits_{i=1}^{N}{\mu_{i}}\bigg\{\gamma^{2}_{i}(y_{i})-\lambda_{i}J_{i}(y_{i})+\left(\frac{1}{2}k_{i}-\frac{1}{4}\right) \left\|{(\nabla J_{i}^*(y_{i}))}^{\mathit{T}}\mathcal{G}_{i}(y_{i})R_{i}^{-\frac{1}{2}}\right\|^{2}\bigg.\\ &\ \bigg.-\left\|{(\nabla J_{i}^*(y_{i}))}^{\mathit{T}}\mathcal{G}_{i}(y_{i})R_{i}^{-\frac{1}{2}}\right\| \sum\limits_{j=1}^{N}\beta_{ij}\sqrt{\gamma_{j}^{2}(y_{j})-\lambda_{j}J_{j}(y_{j})}\bigg\}. \end{align} $$

Herein, in order to transform Equation (24) to the compact form, we denote

(25)

$$ {M}=\text{diag}\{\mu_{1}, \mu_{2}, \dots, \mu_{N}\}, $$

(26)

$$ {K}=\text{diag}\left\{\frac{1}{2}k_{1}-\frac{1}{4}, \frac{1}{2}k_{2}-\frac{1}{4}, \dots, \frac{1}{2}k_{N}-\frac{1}{4}\right\}, $$

(27)

$$ {B}=\left[\begin{array}{cccc} \beta_{11} & \beta_{12} & \dots & \beta_{1N}\\ \beta_{21} & \beta_{22} & \dots & \beta_{2N}\\ \vdots & \vdots & \ddots & \vdots\\ \beta_{N1} & \beta_{N2} & \dots & \beta_{NN} \end{array}\right]. $$

Accordingly, we introduce a $$ 2N $$-dimensional column vector $$ \vartheta $$, which stacks the $$ N $$-dimensional column vector with entries $$ \sqrt{\gamma_{i}^{2}(y_{i})-\lambda_{i}J_{i}(y_{i})} $$ and the $$ N $$-dimensional column vector with entries $$ \left\|{(\nabla J_{i}^*(y_{i}))}^{\mathit{T}}\mathcal{G}_{i}(y_{i})R_{i}^{-\frac{1}{2}}\right\| $$, $$ i=1, 2, \dotsc, N $$. Its form is as follows:

(28)

$$ \vartheta=\left[\begin{array}{c} \sqrt{\gamma_{1}^{2}(y_{1})-\lambda_{1}J_{1}(y_{1})}\\ \sqrt{\gamma_{2}^{2}(y_{2})-\lambda_{2}J_{2}(y_{2})}\\ \vdots\\ \sqrt{\gamma_{N}^{2}(y_{N})-\lambda_{N}J_{N}(y_{N})}\\ \\ \hdashline \\ \left\|{(\nabla J_{1}^*(y_{1}))}^{\mathit{T}}\mathcal{G}_{1}(y_{1})R_{1}^{-\frac{1}{2}}\right\|\\ \\ \left\|{(\nabla J_{2}^*(y_{2}))}^{\mathit{T}}\mathcal{G}_{2}(y_{2})R_{2}^{-\frac{1}{2}}\right\|\\ \vdots\\ \left\|{(\nabla J_{N}^*(y_{N}))}^{\mathit{T}}\mathcal{G}_{N}(y_{N})R_{N}^{-\frac{1}{2}}\right\|\\ \end{array}\right] $$

Next, Equation (24) can be transformed to the following compact form:

(29)

$$ \begin{align} \dot{\mathcal{L}}(y) &\leq-\vartheta^{\mathit{T}} \left[\begin{array}{ccc:ccc} M\ \ & \ \ -\frac{1}{2}B^{\mathit{T}}M\\ -\frac{1}{2}MB\ \ & \ \ MK\\ \end{array}\right]\vartheta \\ &\triangleq -\vartheta^{\mathit{T}} \mathscr{A} \vartheta. \end{align} $$

According to Equation (29), when each $$ k_{i} $$ is sufficiently large, the matrix $$ \mathscr{A} $$ is positive definite. That is, there exist $$ {k}_{i}^* $$ such that any $$ {k}_{i}>{k}_{i}^* $$ ensures the positive definiteness of $$ \mathscr{A} $$. Then, we get $$ \dot{\mathcal{L}}(y)<0 $$. Consequently, the DTC strategy with external disturbances is constructed. The proof is completed.
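The condition on $$ k_{i} $$ in Theorem 1 can be checked numerically for given constants. The sketch below assembles $$ \mathscr{A} $$ from Equations (25)-(27) and (29) and tests positive definiteness through its smallest eigenvalue. The helper names and the values of $$ \mu_{i} $$, $$ \beta_{ij} $$, and $$ k_{i} $$ are illustrative assumptions, not values from the paper.

```python
import numpy as np

def script_A(mu, beta, k):
    """Assemble the 2N x 2N matrix of Equation (29) from mu_i, beta_ij and k_i."""
    M = np.diag(mu)                              # Equation (25)
    K = np.diag(0.5 * np.asarray(k) - 0.25)      # Equation (26)
    B = np.asarray(beta, dtype=float)            # Equation (27)
    return np.block([[M, -0.5 * B.T @ M],
                     [-0.5 * M @ B, M @ K]])

def is_positive_definite(A, tol=1e-10):
    # A is symmetric by construction, so eigvalsh is applicable.
    return bool(np.min(np.linalg.eigvalsh(A)) > tol)

# Illustrative constants for N = 2 (assumed for this example).
mu = [1.0, 1.0]
beta = [[0.5, 0.5], [0.5, 0.5]]
small_gain = is_positive_definite(script_A(mu, beta, k=[0.5, 0.5]))
large_gain = is_positive_definite(script_A(mu, beta, k=[3.0, 3.0]))
```

With these constants, $$ k_{i}=1/2 $$ makes the lower-right block vanish, so $$ \mathscr{A} $$ cannot be positive definite, whereas $$ k_{i}=3 $$ is large enough. This mirrors the statement that the gain sufficient for the isolated subsystems in Lemma 1 may be too small once the interconnections are taken into account.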

Obviously, based on Theorem 1, the key point of designing the DTC strategy is to obtain the optimal controller of each ATIS. Next, in order to obtain the optimal controllers for the $$ N $$ ATISs by solving the HJI equations, we employ the ADP method to obtain approximate optimal solutions by means of critic networks.

4. OPTIMAL DTC DESIGN VIA NEURAL NETWORKS

4.1. Implementation procedure via neural networks

In this section, we show the process of finding the approximate optimal solution by employing the ADP method based on neural networks. The critic networks have the capability of approximating nonlinear mapping, and the approximate cost function can be derived for the DTC system. Hence, $$ J_{i}^*(y_{i}) $$ can be expressed as

(30)

$$ J_{i}^*(y_{i})=w^{\mathit{T}}_{ci}\sigma_{ci}(y_{i})+\xi_{ci}(y_{i}), $$

where $$ w_{ci}\in{\mathbb R}^{l_{ci}} $$ is the ideal weight vector, $$ l_{ci} $$ is the number of neurons in the hidden layer, $$ \sigma_{ci}(y_{i})\in{\mathbb R}^{l_{ci}} $$ is the activation function, and $$ \xi_{ci}(y_{i}) $$ is the reconstruction error of the $$ i $$th neural network. The gradient of $$ J_{i}^*(y_{i}) $$ is formulated as

(31)

$$ \nabla J_{i}^*(y_{i})=(\nabla \sigma_{ci}(y_{i}))^{\mathit{T}}w_{ci}+\nabla \xi_{ci}(y_{i}). $$

Considering Equation (16), the optimal control policy for the $$ i $$th ATIS can be rewritten as

(32)

$$ u^*_{i}(y_{i})=-\frac{1}{2}R_{i}^{-1}{\mathcal{G}}^{\mathit{T}}_{i}(y_{i}) \left((\nabla \sigma_{ci}(y_{i}))^{\mathit{T}}w_{ci}+\nabla \xi_{ci}(y_{i})\right). $$

Utilizing Equations (31) and (32), the Hamiltonian associated with the $$ i $$th ATIS is obtained as

(33)

$$ \begin{align} H_{i}(y_{i}, v_{i}(y_{i}), w_{ci})=&\ \gamma^{2}_{i}(y_{i})-\lambda_{i}(w_{ci}^{\mathit{T}}\sigma_{ci}(y_{i}))+{w}_{ci}^{\mathit{T}}(\nabla \sigma_{ci}(y_{i}))[\mathcal{F}_{i}(y_{i})+\mathcal{H}_{i}(y_{i})v_{i}(y_{i})]\\ &\, -\frac{1}{4}w^{\mathit{T}}_{ci}\nabla \sigma_{ci}(y_{i})\mathcal{G}_{i}(y_{i})R_{i}^{-1}\mathcal{G}^{\mathit{T}}_{i}(y_{i})(\nabla \sigma_{ci}(y_{i}))^{\mathit{T}}w_{ci}+e_{chi}=0, \end{align} $$

where $$ e_{chi} $$ is the residual error of the neural network. To avoid the unknown ideal weight vector, we construct $$ N $$ critic neural networks to approximate $$ J_{i}^*(y_{i}) $$ as

(34)

$$ \hat{J}_{i}^*(y_{i})=\hat{w}^{\mathit{T}}_{ci}\sigma_{ci}(y_{i}), $$

where $$ \hat{w}_{ci} $$ is the estimated weight. Likewise, the derivative of $$ \hat{J}_{i}^*(y_{i}) $$ is

(35)

$$ \nabla \hat{J}_{i}^*(y_{i})=(\nabla \sigma_{ci}(y_{i}))^{\mathit{T}}\hat{w}_{ci}. $$

Based on Equation (35), we obtain the estimated value of $$ u_{i}^*(y_{i}) $$ and $$ v_{i}^*(y_{i}) $$ as

(36)

$$ \hat u_{i}^*(y_{i})=-\frac{1}{2}R_{i}^{-1}{\mathcal{G}}^{\mathit{T}}_{i}(y_{i})(\nabla \sigma_{ci}(y_{i}))^{\mathit{T}}\hat{w}_{ci}, $$

(37)

$$ \hat v_{i}^*(y_{i})=\frac{1}{2{\varrho^2_{i}}}{\mathcal{H}}^{\mathit{T}}_{i}(y_{i})(\nabla \sigma_{ci}(y_{i}))^{\mathit{T}}\hat{w}_{ci}. $$
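Equations (35)-(37) amount to a few matrix products once the critic weights are available. The following sketch evaluates them at a single augmented state. The function names and the quadratic activation $$ \sigma(y)=[y_{1}^{2}, y_{1}y_{2}, y_{2}^{2}]^{\mathit{T}} $$ on a toy two-dimensional augmented state are hypothetical choices made for illustration.

```python
import numpy as np

def critic_policies(w_hat, grad_sigma, G, H, R, rho):
    """Evaluate Equations (36) and (37) at one augmented state.

    w_hat: (l,) critic weights; grad_sigma: (l, 2n) Jacobian of sigma_c;
    G: (2n, m); H: (2n, q); R: (m, m) positive definite; rho: scalar > 0.
    """
    grad_J = grad_sigma.T @ w_hat                      # Equation (35)
    u_hat = -0.5 * np.linalg.solve(R, G.T @ grad_J)    # Equation (36)
    v_hat = (0.5 / rho**2) * (H.T @ grad_J)            # Equation (37)
    return u_hat, v_hat

def grad_sigma_quadratic(y):
    """Jacobian of the assumed activation [y1^2, y1*y2, y2^2]."""
    y1, y2 = y
    return np.array([[2 * y1, 0.0], [y2, y1], [0.0, 2 * y2]])

y = np.array([1.0, 2.0])
G = np.array([[0.0], [1.0]])
H = np.array([[0.0], [1.0]])
u_hat, v_hat = critic_policies(np.array([1.0, 0.0, 1.0]), grad_sigma_quadratic(y),
                               G, H, R=np.eye(1), rho=2.0)
```

Using `np.linalg.solve` instead of forming $$ R_{i}^{-1} $$ explicitly is a numerical design choice of this sketch, not something prescribed by the text.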

Considering Equations (34)-(37), the approximate Hamiltonian is expressed as

(38)

$$ \begin{align} \hat{H}_{i}(y_{i}, \hat {v}^*_{i}(y_{i}), \hat{w}_{ci})=&\ \gamma^{2}_{i}(y_{i})-\lambda_{i}(\hat{w}^{\mathit{T}}_{ci}\sigma_{ci}(y_{i}))+ {\hat{w}_{ci}}^{\mathit{T}}(\nabla \sigma_{ci}(y_{i})) [\mathcal{F}_{i}(y_{i})+\mathcal{H}_{i}(y_{i})\hat {v}^*_{i}(y_{i})]\\ &\, -\frac{1}{4}\hat{w}^{\mathit{T}}_{ci}\nabla \sigma_{ci}(y_{i})\mathcal{G}_{i}(y_{i})R_{i}^{-1}\mathcal{G}^{\mathit{T}}_{i}(y_{i})(\nabla \sigma_{ci}(y_{i}))^{\mathit{T}}\hat{w}_{ci} =\ e_{ci}. \end{align} $$

Then, we obtain an error function of the Hamiltonian, which is denoted as $$ e_{ci} $$ and is expressed by

(39)

$$ \begin{align} e_{ci}=\ &\ \hat{H}_{i}(y_{i}, \hat {v}^*_{i}(y_{i}), \hat{w}_{ci})-H_{i}(y_{i}, v_{i}(y_{i}), w_{ci})\\ =\ &\ \lambda_{i}(\tilde{w}^{\mathit{T}}_{ci}\sigma_{ci}(y_{i})) -{\tilde{w}_{ci}}^{\mathit{T}}(\nabla \sigma_{ci}(y_{i})) [\mathcal{F}_{i}(y_{i})+\mathcal{H}_{i}(y_{i})v_{i}(y_{i})]\\ &-\frac{1}{4}\tilde{w}^{\mathit{T}}_{ci}\nabla \sigma_{ci}(y_{i})\mathcal{G}_{i}(y_{i})R_{i}^{-1}\mathcal{G}^{\mathit{T}}_{i}(y_{i})(\nabla \sigma_{ci}(y_{i}))^{\mathit{T}}\tilde{w}_{ci}\\ &+\frac{1}{2}w^{\mathit{T}}_{ci}\nabla \sigma_{ci}(y_{i})\mathcal{G}_{i}(y_{i})R_{i}^{-1}\mathcal{G}^{\mathit{T}}_{i}(y_{i})(\nabla \sigma_{ci}(y_{i}))^{\mathit{T}}\tilde{w}_{ci}-e_{chi}, \end{align} $$

where $$ \tilde{w}_{ci}=w_{ci}-\hat{w}_{ci} $$ is the weight error vector. Now, in order to minimize the objective function $$ E_{ci}=({1}/{2})e^{\mathit{T}}_{ci}e_{ci} $$, the normalized steepest descent algorithm based on Equation (38) is employed as follows:

(40)

$$ \dot{\hat{w}}_{ci}=\ -\eta_{ci}\frac{1}{(1+\phi^{\mathit{T}}_{i}\phi_{i})^2}\left(\frac{\partial{E_{ci}}}{\partial{\hat{w}_{ci}}}\right) =\ -\eta_{ci}\frac{\phi_{i}}{(1+\phi^{\mathit{T}}_{i}\phi_{i})^2}e_{ci}, $$

where $$ \eta_{ci}>0 $$ represents the basic learning rate. Besides, $$ (1+\phi^{\mathit{T}}_{i}\phi_{i})^2 $$ is introduced for the normalization to simplify the critic error dynamics, and $$ \phi_{i} $$ is derived as

(41)

$$ \phi_{i}=\ \nabla \sigma_{ci}(y_{i})[\mathcal{F}_{i}(y_{i})+\mathcal{H}_{i}(y_{i})\hat {v}^*_{i}(y_{i})]-\lambda_{i}\sigma_{ci}(y_{i}). $$
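A hedged sketch of the traditional update is given below. It evaluates the residual $$ e_{ci} $$ of Equation (38), the regressor $$ \phi_{i} $$ exactly as printed in Equation (41), and the weight derivative of Equation (40) at one augmented state. The function name, all numerical values, and the explicit Euler step with step size `dt` are assumptions of this example.

```python
import numpy as np

def critic_derivative(w_hat, y, sigma, grad_sigma, F, G, H, Q, R, rho, lam, eta_c):
    """Evaluate e_ci (38), phi_i (41) and the weight derivative (40) at one state.

    sigma: (l,), grad_sigma: (l, 2n), F: (2n,), G: (2n, m), H: (2n, q),
    all already evaluated at the augmented state y.
    """
    grad_J = grad_sigma.T @ w_hat                               # Equation (35)
    v_hat = (0.5 / rho**2) * (H.T @ grad_J)                     # Equation (37)
    gamma2 = y @ Q @ y - rho**2 * (v_hat @ v_hat)               # gamma_i^2 of Section 3.1
    drift = F + H @ v_hat
    phi = grad_sigma @ drift - lam * sigma                      # Equation (41)
    control_term = grad_J @ G @ np.linalg.solve(R, G.T @ grad_J)
    e_c = gamma2 - lam * (w_hat @ sigma) + grad_J @ drift - 0.25 * control_term
    w_dot = -eta_c * phi / (1.0 + phi @ phi) ** 2 * e_c         # Equation (40)
    return w_dot, e_c, phi

# Illustrative data for a toy two-dimensional augmented state (assumed values).
y = np.array([1.0, 2.0])
sigma = np.array([1.0, 2.0, 4.0])                    # [y1^2, y1*y2, y2^2]
grad_sigma = np.array([[2.0, 0.0], [2.0, 1.0], [0.0, 4.0]])
F = np.array([-1.0, 0.5])
G = np.array([[0.0], [1.0]])
H = np.array([[0.0], [1.0]])
params = dict(Q=np.eye(2), R=np.eye(1), rho=2.0, lam=0.1, eta_c=1.0)

w_hat = np.array([1.0, 0.0, 1.0])
w_dot, e_c, phi = critic_derivative(w_hat, y, sigma, grad_sigma, F, G, H, **params)
dt = 0.01                                            # assumed Euler step size
w_next = w_hat + dt * w_dot
```

In a simulation, this step would be repeated along the closed-loop trajectory, with `sigma`, `grad_sigma`, `F`, `G`, and `H` re-evaluated at each new state.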

In the traditional weight training process, it is often necessary to select an appropriate initial weight vector, corresponding to an initial admissible control law, for effective training. To remove the need for an initial admissible control law, an improved critic learning rule is presented in the following.

4.2. Improved critic learning rule via neural networks

Herein, an additional Lyapunov function is introduced for the purpose of improving the critic learning mechanism. Then, the following rational assumption is given.

Assumption 1 Consider the dynamics of the $$ i $$th ATIS Equation (9) with the optimal cost function Equation (14) and the closed-loop optimal control policy Equation (32). We select $$ J_{si}(y_{i}) $$ as a continuously differentiable Lyapunov function and have the following relation:

(42)

$$ \begin{align} \dot{J}_{si}(y_{i})=(\nabla J_{si}(y_{i}))^{\mathit{T}}[\mathcal{F}_{i}(y_{i})+\mathcal{G}_{i}(y_{i}){u}_{i}^*(y_{i}) +\mathcal{H}_{i}(y_{i})v_{i}^*(y_{i})]<0. \end{align} $$

In other words, there exists a positive definite matrix $$ \mathscr{B} $$ such that

(43)

$$ \begin{align} (\nabla J_{si}(y_{i}))^{\mathit{T}}[\mathcal{F}_{i}(y_{i})+\mathcal{G}_{i}(y_{i}){u}_{i}^*(y_{i}) +\mathcal{H}_{i}(y_{i})v_{i}^*(y_{i})] =-(\nabla J_{si}(y_{i}))^{\mathit{T}}\mathscr{B}\nabla J_{si}(y_{i})\leq -s_{mi}\|\nabla J_{si}(y_{i})\|^2, \end{align} $$

where $$ s_{mi} $$ is the minimum eigenvalue of the matrix $$ \mathscr{B} $$.

Remark 2. Herein, the motivation of selecting the function $$ J_{si}(y_{i}) $$ is to obtain the optimal DTC strategy, which can minimize and maximize $$ J_{si}(y_{i}) $$ under the optimal control law and the worst disturbance law. Moreover, we can discuss the stability of closed-loop systems by the constructed optimal cost function. Besides, to be clear, $$ J_{si}(y_{i}) $$ is obtained by properly selecting a quadratic polynomial in terms of the state vector. We generally choose $$ J_{si}(y_{i})=0.5{y_{i}}^{\mathit{T}}y_{i} $$.

When the condition occurs, that is, $$ (\nabla J_{si}(y_{i}))^{\mathit{T}}[\mathcal{F}_{i}(y_{i})+\mathcal{G}_{i}(y_{i}){u}_{i}^*(y_{i}) +\mathcal{H}_{i}(y_{i})v_{i}^*(y_{i})] $$$$ >0 $$, which means the system is in an unstable state under the optimal control law Equation (36). In this case, an additional term is introduced to ensure the system stability. Based on Equation (36), some processing is performed as follows:

(44)

$$ \begin{align} &\frac{-\partial[(\nabla J_{si}(y_{i}))^{\mathit{T}}(\mathcal{F}_{i}(y_{i})+\mathcal{G}_{i}(y_{i}){u}_{i}^*(y_{i}) +\mathcal{H}_{i}(y_{i})v_{i}^*(y_{i}))]}{\partial \hat{w}_{ci}}\\ &=\bigg(\frac{\partial \hat u_{i}^*(y_{i})}{\partial \hat{w}_{ci}}\bigg)^{\mathit{T}}\frac{-\partial[(\nabla J_{si}(y_{i}))^{\mathit{T}}(\mathcal{F}_{i}(y_{i})+\mathcal{G}_{i}(y_{i}){u}_{i}^*(y_{i}) +\mathcal{H}_{i}(y_{i})v_{i}^*(y_{i}))]}{\partial \hat u_{i}^*(y_{i})} \\ &=\frac{1}{2}\nabla \sigma_{ci}(y_{i})\mathcal{G}_{i}(y_{i})R_{i}^{-1}{\mathcal{G}}^{\mathit{T}}_{i}(y_{i}) \nabla J_{si}(y_{i}). \end{align} $$
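
As a brief check of the last equality in Equation (44), the chain rule can be written out under the assumption that the approximate control law has the usual critic-based form $$ \hat u_{i}^*(y_{i})=-\frac{1}{2}R_{i}^{-1}\mathcal{G}_{i}^{\mathit{T}}(y_{i})(\nabla \sigma_{ci}(y_{i}))^{\mathit{T}}\hat{w}_{ci} $$. This form is assumed here for illustration only, since Equation (36) itself is stated earlier in the paper. With a symmetric $$ R_{i} $$ and the disturbance term treated as independent of $$ \hat u_{i}^* $$, the two factors are

$$ \begin{align} \bigg(\frac{\partial \hat u_{i}^*(y_{i})}{\partial \hat{w}_{ci}}\bigg)^{\mathit{T}}&=-\frac{1}{2}\nabla \sigma_{ci}(y_{i})\mathcal{G}_{i}(y_{i})R_{i}^{-1}, \\ \frac{-\partial[(\nabla J_{si}(y_{i}))^{\mathit{T}}(\mathcal{F}_{i}(y_{i})+\mathcal{G}_{i}(y_{i})\hat{u}_{i}^*(y_{i}) +\mathcal{H}_{i}(y_{i})\hat{v}_{i}^*(y_{i}))]}{\partial \hat u_{i}^*(y_{i})}&=-\mathcal{G}_{i}^{\mathit{T}}(y_{i})\nabla J_{si}(y_{i}), \end{align} $$

and their product is $$ \frac{1}{2}\nabla \sigma_{ci}(y_{i})\mathcal{G}_{i}(y_{i})R_{i}^{-1}{\mathcal{G}}^{\mathit{T}}_{i}(y_{i}) \nabla J_{si}(y_{i}) $$, which agrees with Equation (44).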

Thus, we describe the improved learning rule as

(45)

$$ \dot{\hat{w}}_{ci}= -\eta_{ci}\frac{\phi_{i}}{(1+\phi^{\mathit{T}}_{i}\phi_{i})^2}e_{ci}+\frac{1}{2}\eta_{si}\Pi_{i}(y_{i}, \hat{u}_{i}^*, \hat{v}_{i}^*)\nabla \sigma_{ci}(y_{i})\mathcal{G}_{i}(y_{i})R_{i}^{-1}{\mathcal{G}}^{\mathit{T}}_{i}(y_{i}) \nabla J_{si}(y_{i}), $$

where $$ \eta_{si}>0 $$ represents the additional learning rate with respect to the stabilising term and $$ \Pi_{i}(y_{i}, \hat{u}_{i}^*, \hat{v}_{i}^*) $$ stands for the adaptation parameter term that tests the stability of the ATIS. The definition of $$ \Pi_{i} $$ is as follows:

(46)

$$ \Pi_{i}(y_{i}, \hat{u}_{i}^*, \hat{v}_{i}^*)=\left\lbrace \begin{array}{cl} 0, & \text{if}\ \dot{J}_{si}^*(y_{i})<0, \\ 1, & \text{else}.\end{array}\right. $$

When the derivative of $$ J_{si}(y_{i}) $$ satisfies $$ \dot{J}_{si}(y_{i})<0 $$, the second term of the weight update rule is inactive, so the update reduces to the traditional normalized steepest descent algorithm. When $$ \dot{J}_{si}(y_{i})>0 $$, the second term becomes active and plays its role of ensuring stability, that is, the improved weight update method is adopted. Hence, the system can be adjusted to be stable under the improved weight updating criterion. Moreover, to highlight that the requirement for an initial admissible control law has been eliminated, we set the initial weight vector to zero. With the new critic learning rule, the structure of the proposed DTC strategy for the ATIS is shown in Figure 1.

Figure 1. Control structure of the ATIS. ATIS: augmented tracking isolated subsystem.
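
The switching logic of Equations (45) and (46) can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the function names, the argument layout, and the use of NumPy are assumptions, $$ \nabla \sigma_{ci} $$ is taken to be the Jacobian of the activation vector (one row per basis function), and the residual $$ e_{ci} $$ and the regressor $$ \phi_{i} $$ are passed in as already-computed quantities.

```python
import numpy as np

def stability_switch(grad_Js, y_dot):
    """Pi_i of Equation (46): 0 if dJ_s/dt < 0 along the closed loop, else 1."""
    Js_dot = float(grad_Js @ y_dot)  # dJ_s/dt = (grad J_s)^T * y_dot
    return 0.0 if Js_dot < 0.0 else 1.0

def critic_weight_rate(phi, e_c, grad_sigma, G, R_inv, grad_Js, y_dot,
                       eta_c, eta_s):
    """Right-hand side of Equation (45) for one subsystem.

    Assumed shapes (hypothetical layout): phi (m,), e_c scalar,
    grad_sigma (m, n), G (n, p), R_inv (p, p), grad_Js (n,), y_dot (n,).
    """
    # Normalized steepest-descent term driven by the residual e_c.
    descent = -eta_c * phi / (1.0 + phi @ phi) ** 2 * e_c
    # Stabilizing term, switched on only when J_s is not decreasing.
    Pi = stability_switch(grad_Js, y_dot)
    stabilizer = 0.5 * eta_s * Pi * (grad_sigma @ G @ R_inv @ G.T @ grad_Js)
    return descent + stabilizer
```

With $$ J_{si}(y_{i})=0.5{y_{i}}^{\mathit{T}}y_{i} $$, the argument `grad_Js` is simply the augmented state `y`, and `y_dot` is the closed-loop derivative evaluated with the current approximate control and disturbance laws.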

In accordance with $$ \dot{\tilde{w}}_{ci}=-\dot{\hat{w}}_{ci} $$ and Equation (39), the specific form of $$ {\tilde{w}}_{ci} $$ is derived. Then, the estimated weight $$ \hat{w}_{ci} $$ can be expressed in terms of the weight vector $$ {w}_{ci} $$ and the error weight vector $$ \tilde{w}_{ci} $$, which can be used to prove that the state $$ y_{i} $$ and the weight estimation error $$ \tilde{w}_{ci} $$ are UUB for the closed-loop system.

5. SIMULATION EXPERIMENT

In this section, we introduce a common mechanical vibration system, namely, the spring-mass-damper system. The structural diagram of the mechanical system is shown in Figure 2, where $$ M_{1} $$ and $$ M_{2} $$ denote the masses of the two objects, $$ K_{1} $$, $$ K_{2} $$, and $$ K_{3} $$ represent the stiffness constants of the three springs, and $$ C_{1} $$, $$ C_{2} $$, and $$ C_{3} $$ stand for the damping coefficients.

Figure 2. Simple diagram of the interconnected mass–spring–damper system.

In addition, let $$ P_{i} $$, $$ V_{i} $$, $$ F_{i} $$, and $$ f_{\mu_{i}} $$ be the position, the velocity, the force, and the friction applied to the object, where $$ i=1, 2 $$. Hence, the system dynamics for $$ M_{1} $$ and $$ M_{2} $$ are as follows:

(47)

$$ \dot{P}_{1}=V_{1}, $$

(48)

$$ M_{1} \dot{V}_{1}=-K_{1}P_{1}-C_{1}V_{1}+K_{2}\left(P_{2}-P_{1}\right)+C_{2}\left(V_{2}-V_{1}\right)+F_{1}-f_{\mu_{1}}, $$

(49)

$$ \dot{P}_{2}=V_{2}, $$

(50)

$$ M_{2} \dot{V}_{2}=-K_{3} P_{2}-C_{3} V_{2}+K_{2}\left(P_{1}-P_{2}\right)+C_{2}\left(V_{1}-V_{2}\right)+F_{2}-f_{\mu_{2}}. $$
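
Equations (47)-(50) translate directly into a right-hand-side function for numerical simulation. The sketch below is an illustrative assumption (function name, argument order, and NumPy usage are not from the paper). Note that the paper does not report numerical values for the coupling parameters $$ K_{2} $$ and $$ C_{2} $$, so they are left as caller-supplied arguments.

```python
import numpy as np

def smd_rhs(state, F, f_mu, M, K, C):
    """Time derivative of [P1, V1, P2, V2] following Equations (47)-(50).

    F = (F1, F2) applied forces, f_mu = (f_mu1, f_mu2) friction terms,
    M = (M1, M2), K = (K1, K2, K3), C = (C1, C2, C3).
    """
    P1, V1, P2, V2 = state
    M1, M2 = M
    K1, K2, K3 = K
    C1, C2, C3 = C
    # Mass 1: wall spring/damper (K1, C1) plus coupling through (K2, C2).
    dV1 = (-K1 * P1 - C1 * V1 + K2 * (P2 - P1) + C2 * (V2 - V1)
           + F[0] - f_mu[0]) / M1
    # Mass 2: wall spring/damper (K3, C3) plus the opposite coupling force.
    dV2 = (-K3 * P2 - C3 * V2 + K2 * (P1 - P2) + C2 * (V1 - V2)
           + F[1] - f_mu[1]) / M2
    return np.array([V1, dV1, V2, dV2])
```

The coupling terms enter the two masses with opposite signs, which is what later appears as the interconnection terms between the two subsystems.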

For the object $$ M_{1} $$, we define $$ x_{11}=P_{1} $$, $$ x_{12}=V_{1} $$, $$ \bar{u}_{1}(x_{1})=F_{1} $$, and $$ {v}_{1}(x_{1})=f_{\mu_{1}} $$. In the same way, we let $$ x_{21}=P_{2} $$, $$ x_{22}=V_{2} $$, $$ \bar{u}_{2}(x_{2})=F_{2} $$, and $$ {v}_{2}(x_{2})=f_{\mu_{2}} $$ for the object $$ M_{2} $$. Next, the state-space form of the spring-mass-damper system is written as

(51)

$$ \dot{x}_{1}=\left[\begin{array}{l} \dot{x}_{11} \\ \dot{x}_{12} \end{array}\right]=\left[\begin{array}{c} x_{12} \\ -\frac{K_{1}}{M_{1}} x_{11}-\frac{C_{1}}{M_{1}} x_{12} \end{array}\right]+\left[\begin{array}{c} 0 \\ \frac{1}{M_{1}} \end{array}\right]\left(\bar{u}_{1}\left(x_{1}\right)+Z_{1}(x)\right)+\left[\begin{array}{c} 0 \\ -\frac{1}{M_{1}} \end{array}\right] v_{1}\left(x_{1}\right) $$

and

(52)

$$ \dot{x}_{2}=\left[\begin{array}{c} \dot{x}_{21} \\ \dot{x}_{22} \end{array}\right]=\left[\begin{array}{c} x_{22} \\ -\frac{K_{3}}{M_{2}} x_{21}-\frac{C_{3}}{M_{2}} x_{22} \end{array}\right]+\left[\begin{array}{c} 0 \\ \frac{1}{M_{2}} \end{array}\right]\left(\bar{u}_{2}\left(x_{2}\right)+Z_{2}(x)\right)+\left[\begin{array}{c} 0 \\ -\frac{1}{M_{2}} \end{array}\right] v_{2}\left(x_{2}\right), $$

where $$ x_{1}=[x_{11}, x_{12}]^{\mathit{T}}\in{\mathbb R}^{2} $$ and $$ x_{2}=[x_{21}, x_{22}]^{\mathit{T}}\in{\mathbb R}^{2} $$ are system states. $$ \bar{u}_{1}(x_{1})\in{\mathbb R} $$, $$ \bar{u}_{2}(x_{2})\in{\mathbb R} $$, $$ v_{1}(x_{1})\in{\mathbb R} $$, and $$ v_{2}(x_{2})\in{\mathbb R} $$ are control inputs and disturbance inputs of the subsystem 1 and the subsystem 2, respectively. Meanwhile, the interconnection terms are $$ Z_{1}(x)=K_{2}\left(x_{21}-x_{11}\right)+C_{2}\left(x_{22}-x_{12}\right) $$ and $$ Z_{2}(x)=K_{2}\left(x_{11}-x_{21}\right)+C_{2}\left(x_{12}-x_{22}\right) $$, which indicates that the spring $$ K_{2} $$ and the damper $$ C_{2} $$ connect the two subsystems. Herein, we let $$ \theta_{1}(x_{1})= ||x_{1}|| $$ and $$ \theta_{2}(x_{2})= |x_{22}| $$. Besides, we choose $$ \beta_{11}=\beta_{12}=1 $$, $$ \beta_{21}=\beta_{22}=1/2 $$, and $$ {\mu_{1}}={\mu_{2}}=1 $$. Moreover, we select $$ \lambda_{1}=\lambda_{2}=0.6 $$, $$ \varrho_{1}=\varrho_{2}=1 $$, $$ R_{1}=R_{2}=2 $$, and $$ Q_{1}=Q_{2}=2I_{4} $$, where $$ I_{4} $$ is the four-dimensional identity matrix. Finally, the desired reference trajectories $$ r_{1} $$ and $$ r_{2} $$ for the two subsystems are generated by the following command system:

(53)

$$ \dot{r}_{i}=\left[\begin{array}{c} \dot{r}_{i 1} \\ \dot{r}_{i 2} \end{array}\right]=\left[\begin{array}{c} -0.5 r_{i 1}-0.5 r_{i 2} \cos \left(r_{i 1}\right) \\ \sin \left(r_{i 1}\right)-0.5 r_{i 2} \end{array}\right], \quad i=1, 2, $$

where $$ r_{1}=[r_{11}, r_{12}]^{\mathit{T}}\in{\mathbb R}^{2} $$ and $$ r_{2}=[r_{21}, r_{22}]^{\mathit{T}}\in{\mathbb R}^{2} $$ are reference states. Then, we define the tracking errors as $$ e_{i1}=x_{i1}-r_{i1} $$ and $$ e_{i2}=x_{i2}-r_{i2} $$. Hence, the augmented state vector can be expressed as $$ y_{i}=[y_{i1}, y_{i2}, y_{i3}, y_{i4}]^{\mathit{T}}=[e_{i1}, e_{i2}, r_{i1}, r_{i2}]^{\mathit{T}} $$, $$ i=1, 2 $$. We set practical parameters as $$ M_{1}=1 $$ kg, $$ K_{1}=3 $$ N/m, and $$ C_{1}=0.5 $$ N·s/m for the subsystem 1. Similarly, we let $$ M_{2}=2 $$ kg, $$ K_{3}=5 $$ N/m, and $$ C_{3}=1 $$ N·s/m for the subsystem 2. Considering Equations (51-53), the augmented system dynamics $$ \dot{y}_{1} $$ and $$ \dot{y}_{2} $$ can be obtained in the following forms:

(54)

$$ \dot{y}_{1}=\left[\begin{array}{c} r_{12}+e_{12}+0.5 r_{11}+0.5 r_{12} \cos \left(r_{11}\right) \\ -3(r_{11}+e_{11})-0.5(r_{12}+e_{12})-\sin \left(r_{11}\right)+0.5 r_{12} \\ -0.5 r_{11}-0.5 r_{12} \cos \left(r_{11}\right) \\ \sin \left(r_{11}\right)-0.5 r_{12} \end{array}\right]+\left[\begin{array}{c} 0 \\ 1 \\ 0 \\ 0 \end{array}\right] \bar{u}_{1}\left(y_{1}\right)+\left[\begin{array}{c} 0 \\ -1 \\ 0 \\ 0 \end{array}\right] v_{1}\left(y_{1}\right) $$

and

(55)

$$ \dot{y}_{2}=\left[\begin{array}{c} r_{22}+e_{22}+0.5 r_{21}+0.5 r_{22} \cos \left(r_{21}\right) \\ -2.5 (r_{21}+e_{21})-0.5 (r_{22}+e_{22})-\sin \left(r_{21}\right)+0.5 r_{22} \\ -0.5 r_{21}-0.5 r_{22} \cos \left(r_{21}\right) \\ \sin \left(r_{21}\right)-0.5 r_{22} \end{array}\right]+\left[\begin{array}{c} 0 \\ 0.5 \\ 0 \\ 0 \end{array}\right] \bar{u}_{2}\left(y_{2}\right)+\left[\begin{array}{c} 0 \\ -0.5 \\ 0 \\ 0 \end{array}\right] v_{2}\left(y_{2}\right). $$
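
The drift terms in Equations (54) and (55) follow from $$ \dot{e}_{i}=\dot{x}_{i}-\dot{r}_{i} $$ with $$ x_{i}=e_{i}+r_{i} $$ and the interconnection term omitted. A small Python sketch (function names are illustrative assumptions, not the authors' code) rebuilds the drift from Equations (51)-(53) and compares it with the first column of Equation (54) as printed:

```python
import numpy as np

def reference_rhs(r):
    """Command system of Equation (53)."""
    r1, r2 = r
    return np.array([-0.5 * r1 - 0.5 * r2 * np.cos(r1),
                     np.sin(r1) - 0.5 * r2])

def isolated_drift(x, M, K, C):
    """Drift of one isolated subsystem, Equations (51)-(52) without inputs."""
    return np.array([x[1], -(K / M) * x[0] - (C / M) * x[1]])

def augmented_drift(y, M, K, C):
    """Drift of y = [e1, e2, r1, r2]: e_dot = x_dot - r_dot, with x = e + r."""
    e, r = y[:2], y[2:]
    r_dot = reference_rhs(r)
    return np.concatenate([isolated_drift(e + r, M, K, C) - r_dot, r_dot])

def input_vectors(M):
    """Control and disturbance input vectors of the augmented subsystem."""
    G = np.array([0.0, 1.0 / M, 0.0, 0.0])
    return G, -G

def drift_eq54(y):
    """First column of Equation (54), typed in as printed."""
    e11, e12, r11, r12 = y
    return np.array([
        r12 + e12 + 0.5 * r11 + 0.5 * r12 * np.cos(r11),
        -3 * (r11 + e11) - 0.5 * (r12 + e12) - np.sin(r11) + 0.5 * r12,
        -0.5 * r11 - 0.5 * r12 * np.cos(r11),
        np.sin(r11) - 0.5 * r12,
    ])
```

Calling `augmented_drift(y, 1.0, 3.0, 0.5)` reproduces `drift_eq54(y)`, and `augmented_drift(y, 2.0, 5.0, 1.0)` gives the coefficients $$ 2.5 $$ and $$ 0.5 $$ that appear in Equation (55).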

Based on the online ADP algorithm, two critic networks are constructed as follows:

(56)

$$ \begin{align} \hat{J}_{1}^*(y_{1})=& \hat{w}_{10}y_{11}^2+\hat{w}_{11}y_{11}y_{12}+\hat{w}_{12}y_{11}y_{13} +\hat{w}_{13}y_{11}y_{14}+\hat{w}_{14}y_{12}^2\\ &\, +\hat{w}_{15}y_{12}y_{13}+\hat{w}_{16}y_{12}y_{14} +\hat{w}_{17}y_{13}^2+\hat{w}_{18}y_{13}y_{14}+\hat{w}_{19}y_{14}^2 \end{align} $$

and

(57)

$$ \begin{align} \hat{J}_{2}^*(y_{2})=& \hat{w}_{20}y_{21}^2+\hat{w}_{21}y_{21}y_{22}+\hat{w}_{22}y_{21}y_{23} +\hat{w}_{23}y_{21}y_{24}+\hat{w}_{24}y_{22}^2\\ &\, +\hat{w}_{25}y_{22}y_{23}+\hat{w}_{26}y_{22}y_{24} +\hat{w}_{27}y_{23}^2 +\hat{w}_{28}y_{23}y_{24}+\hat{w}_{29}y_{24}^2. \end{align} $$
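
The ten activation functions in Equations (56) and (57) are all the degree-two monomials of the four augmented states. A minimal Python sketch of this basis and its Jacobian follows. The helper names are hypothetical, and generating the index pairs with `itertools` is simply a convenient way to reproduce the ordering used in the equations.

```python
import numpy as np
from itertools import combinations_with_replacement

# Index pairs (a, b) with a <= b, in the order of Equations (56)-(57):
# y1^2, y1*y2, y1*y3, y1*y4, y2^2, y2*y3, y2*y4, y3^2, y3*y4, y4^2.
PAIRS = list(combinations_with_replacement(range(4), 2))

def sigma(y):
    """Activation vector of the critic network (10 quadratic monomials)."""
    return np.array([y[a] * y[b] for a, b in PAIRS])

def grad_sigma(y):
    """Jacobian of sigma with respect to y, shape (10, 4)."""
    J = np.zeros((len(PAIRS), 4))
    for k, (a, b) in enumerate(PAIRS):
        J[k, a] += y[b]  # d(y_a * y_b)/d y_a
        J[k, b] += y[a]  # d(y_a * y_b)/d y_b (adds up to 2*y_a when a == b)
    return J

def critic_value(w, y):
    """Approximate cost w^T sigma(y) for a weight vector w of length 10."""
    return float(np.asarray(w) @ sigma(y))
```

With the zero initial weights used in the simulation, `critic_value` returns zero for every state, so the initial approximate control law is identically zero.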

During the online learning process, we take basic learning rates and additional learning rates as $$ \eta_{c1}=0.01 $$, $$ \eta_{c2}=0.03 $$ as well as $$ \eta_{s1}=\eta_{s2}=0.01 $$. Let initial system states and reference states be $$ x_{10}=[1.5, 0]^{\mathit{T}} $$, $$ x_{20}=[1, -1]^{\mathit{T}} $$, and $$ r_{10}=r_{20}=[0.5, -0.5]^{\mathit{T}} $$, respectively. Therefore, initial states of the ATIS are $$ y_{10}=[1, 0.5, 0.5, -0.5]^{\mathit{T}} $$ and $$ y_{20}=[0.5, -0.5, 0.5, -0.5]^{\mathit{T}} $$.
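
As a quick arithmetic check, the initial augmented states follow from stacking the initial tracking error $$ x_{i0}-r_{i0} $$ on top of the initial reference state. The helper below is an illustrative assumption, not part of the paper:

```python
import numpy as np

def augmented_state(x, r):
    """Stack the tracking error e = x - r and the reference r into y."""
    x = np.asarray(x, dtype=float)
    r = np.asarray(r, dtype=float)
    return np.concatenate([x - r, r])

y10 = augmented_state([1.5, 0.0], [0.5, -0.5])   # subsystem 1
y20 = augmented_state([1.0, -1.0], [0.5, -0.5])  # subsystem 2
```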

Herein, two probing noises are added during the first 400 steps to maintain the persistence of excitation condition of the ATIS. The weight convergence curves are shown in Figure 3. It can be seen that the weights have converged to certain numerical values before the probing noises are turned off, which confirms the validity of the improved weight update algorithm. Moreover, the initial weights are selected as zero, which indicates that the requirement for an initial admissible control is eliminated.

Figure 3. Weights convergence process of the critic network 1 and the critic network 2.

Next, in order to achieve optimal tracking, the feedback gains are selected as $$ k_{1}=k_{2}=1 $$. Then, the DTC strategy $$ \{k_{1}\hat u_{1}^*(y_{1}), k_{2}\hat u_{2}^*(y_{2}) \}$$ can be derived from the obtained weight vector for the spring-mass-damper interconnected system. The evolution curves of the tracking control inputs and disturbance inputs for the subsystem 1 and the subsystem 2 are shown in Figure 4. The obtained DTC strategy is then applied to the controlled system for 50 s, and the resulting tracking error trajectories are displayed in Figure 5. The tracking errors are eventually driven to the origin. Taken together, these simulation results verify the effectiveness of the proposed DTC strategy.

Figure 4. Tracking control inputs and disturbance inputs for subsystem 1 and subsystem 2.

Figure 5. Tracking error trajectories for subsystem 1 and subsystem 2.

6. CONCLUSION

In this paper, an optimal DTC strategy for CT nonlinear large-scale systems with external disturbances is proposed by employing the ADP algorithm. The approximate optimal control laws of the ATISs achieve the trajectory tracking goal. The DTC strategy is then established by adding an appropriate feedback gain to each ATIS, and its feasibility is proved via Lyapunov theory. All of the above results are obtained for a cost function with a discount. Only a single critic network per subsystem is employed to solve the HJI equations of the $$ N $$ ATISs, which yields the approximate optimal control law and the worst disturbance law. In addition, the stability term added to the weight updating process removes the need to select an initial stabilizing control policy. Furthermore, simulation results for the spring-mass-damper system indicate the validity of the proposed DTC method. In the future, we will utilize more advanced methods to deal with the DTC problem for nonaffine systems. We can also consider unmatched interconnections in the DTC problem, which is a worthwhile direction for further research.

DECLARATIONS

Authors' contributions

Made significant contributions to the conception and experiments: Fan W, Wang D

Made significant contributions to the writing: Fan W, Wang D

Made substantial contributions to the revision and translation: Liu A, Wang D

Availability of data and materials

Not applicable

Financial support and sponsorship

This work was supported in part by the National Natural Science Foundation of China (No. 62222301; No. 61890930-5 and No. 62021003); in part by the National Key Research and Development Program of China (No. 2021ZD0112302; No. 2021ZD0112301 and No. 2018YFC1900800-5); and in part by the Beijing Natural Science Foundation (No. JQ19013).

Conflicts of interest

All authors declared that there are no conflicts of interest.

Ethical approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

REFERENCES

1. Saberi A. On optimality of decentralized control for a class of nonlinear interconnected systems. Automatica 1988;24:1101-4.

2. Mu CX, Sun CY, Wang D, Song AG, Qian CS. Decentralized adaptive optimal stabilization of nonlinear systems with matched interconnections. Soft Comput 2018;22:82705-15.

3. Mehraeen S, Jagannathan S. Decentralized optimal control of a class of interconnected nonlinear discrete-time systems by using online Hamilton-Jacobi-Bellman formulation. IEEE Trans Neural Netw 2011;22:111715-96.

4. Yang X, He HB. Adaptive dynamic programming for decentralized stabilization of uncertain nonlinear large-scale systems with mismatched interconnections. IEEE Trans Syst Man Cybern Syst 2020;50:82870-82.

5. Karimi HR. How to deal with the complexity in robotic systems? Complex Eng Syst 2022;2:15.

6. Xu Q, Yu C, Yuan X, Fu Z, Liu H. A distributed electricity energy trading strategy under energy shortage environment. Complex Eng Syst 2022;2:14.

7. Bian T, Jiang Y, Jiang ZP. Decentralized adaptive optimal control of large-scale systems with application to power systems. IEEE Trans, Ind Electron 2015;62:42439-47.

8. Liu DR, Wang D, Li HL. Decentralized stabilization for a class of continuous-time nonlinear interconnected systems using online learning optimal control approach. IEEE Trans Neural Netw Learn Syst 2014;25:2418-28.

9. Sun KK, Sui S, Tong SC. Fuzzy adaptive decentralized optimal control for strict feedback nonlinear large-scale systems. IEEE Trans, Cybern 2018;48:41326-39.

10. Wang XM, Feng ZG, Zhang GJ, Niu B, Yang D, et al. Adaptive decentralised control for large-scale non-linear non-strict-feedback interconnected systems with time-varying asymmetric output constraints and dead-zone inputs. IET Control Theory & Appl 2020;14:203417-27.

11. Wei QL, Liu DR, Lin Q, Song RZ. Discrete-time optimal control via local policy iteration adaptive dynamic programming. IEEE Trans, Cybern 2017;47:103367-79.

12. Wang D, Ren J, Ha MM, Qiao JF. System stability of learning-based linear optimal control with general discounted value iteration. IEEE, Trans Neural Netw Learn Syst 2022:Online ahead of print.

13. Wang D, Ha MM, Zhao MM. The intelligent critic framework for advanced optimal control. Artif Intell Rev 2022;55:11-22.

14. Wang D, Qiao JF, Cheng L. An approximate neuro-optimal solution of discounted guaranteed cost control design. IEEE Trans Cybern 2022;52:177-86.

15. Li YM, Liu YJ, Tong SC. Observer-based neuro-adaptive optimized control of strict-feedback nonlinear systems with state constraints. IEEE Trans Neural Netw Learn Syst 2022;33:73131-45.

16. Wang H, Yang CY, Liu XM, Zhou LN. Neural-network-based adaptive control of uncertain MIMO singularly perturbed systems with full-state constraints. IEEE Trans Neural Netw Learn Syst 2021; doi: 10.1109/TNNLS.2021.3123361.

17. Zhang H, Hong QQ, Yan HC, Yang FW, Guo G. Event-based distributed H-infinity filtering networks of 2-DOF quarter-car suspension systems. IEEE Trans Ind Inform 2017;13:1312-21.

18. Chen YG, Fei SM, Li YM. Robust stabilization for uncertain saturated time-delay systems: A distributed-delay-dependent polytopic approach. IEEE Trans Automat Contr 2017;62:73455-60.

19. Chen YG, Wang ZD. Local stabilization for discrete-time systems with distributed state delay and fast-varying input delay under actuator saturations. IEEE Trans Automat Contr 2021;66:31337-44.

20. Fu H, Chen X, Wu M. Distributed optimal observer design of networked systems via adaptive critic design. IEEE Trans Syst Man Cybern, Syst 2021;51:116976-85.

21. Narayanan V, Jagannathan S. Event-triggered distributed control of nonlinear interconnected systems using online reinforcement learning with exploration. IEEE Trans Cybern 2018;48:92510-9.

22. Wang D, Zhao MM, Ha MM, Qiao JF. Intelligent optimal tracking with application verifications via discounted generalized value iteration. Acta Automatica Sinica 2022;48:1182-93.

23. Zhang HG, Zhang K, Cai YL, Han J. Adaptive fuzzy fault-tolerant tracking control for partially unknown systems with actuator faults via integral reinforcement learning method. IEEE Trans Fuzzy Syst 2019;27:101986-98.

24. Modares H, Lewis FL. Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning. Automatica 2014;50:71780-1792.

25. Ha MM, Wang D, Liu DR. Discounted iterative adaptive critic designs with novel stability analysis for tracking control. IEEE/CAA, Journal of Automat Sinica 2022;9:71262-1272.

26. Qu QX, Zhang HG, Feng T, Jiang H. Decentralized adaptive tracking control scheme for nonlinear large-scale interconnected systems via adaptive dynamic programming. Neurocomputing 2017;225:1-10.

27. Niu B, Liu JD, Wang D, Zhao XD, Wang HQ. Adaptive decentralized asymptotic tracking control for large-scale nonlinear systems with unknown strong interconnections. IEEE/CAA Journal of Automatica Sinica 2022;9:1173-86.

28. Liu JD, Niu B, Kao YG, Zhao P, Yang D. Decentralized adaptive command filtered neural tracking control of large-scale nonlinear systems: An almost fast finite-time framework. IEEE Trans Neural Netw Learn Syst 2021;32:83621-2.

29. Tong SC, Zhang LL, Li YM. Observed-based adaptive fuzzy decentralized tracking control for switched uncertain nonlinear large-scale systems with dead zones. IEEE Trans Syst Man Cybern Syst 2016;46:137-47.

30. Wang D, Hu LZ, Zhao MM, Qiao JF. Dual event-triggered constrained control through adaptive critic for discrete-time zero-sum games. IEEE Trans Syst Man Cybern Syst 2023;53:31584-9.

31. Li XM, Zhang B, Li PS, Zhou Q, Lu RQ. Finite-horizon H-infinity state estimation for periodic neural networks over fading channels. IEEE Trans Neural Netw Learn Syst 2020;31:51450-60.

32. Duan JJ, Xu H, Liu WX, Peng JC, Jiang H. Zero-sum game based cooperative control for onboard pulsed power load accommodation. IEEE Trans Ind Inform 2020;16:1238-47.

33. Wang D, Zhao MM, Ha MM, Qiao JF. Stability and admissibility analysis for zero-sum games under general value iteration formulation. IEEE Trans Neural Netw Learn Syst 2022; doi: 10.1109/TNNLS.2022.3152268.

34. Zhang HG, Cui XH, Luo YH, Jiang H. Finite-horizon H-infinity tracking control for unknown nonlinear systems with saturating actuators. IEEE Trans Neural Netw Learn Syst 2018;29:41200-12.

35. Modares H, Lewis FL, Jiang ZP. H-infinity tracking control of completely unknown continuous-time systems via off-policy reinforcement learning. IEEE Trans Neural Netw Learn Syst 2015;26:102550-62.

36. Hou JX, Wang D, Liu DR, Zhang Y. Model-free H-infinity optimal tracking control of constrained nonlinear systems via an iterative adaptive learning algorithm. IEEE Trans Syst Man Cybern Syst 2020;50:114097-108.

About This Article

Copyright

© The Author(s) 2023. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing, adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
