Research Article  |  Open Access  |  19 Feb 2024

Continual online learning-based optimal tracking control of nonlinear strict-feedback systems: application to unmanned aerial vehicles

Complex Eng Syst 2024;4:4.
10.20517/ces.2023.35 |  © The Author(s) 2024.

Abstract

A novel optimal trajectory tracking scheme is introduced for nonlinear continuous-time systems in strict-feedback form with uncertain dynamics by using neural networks (NNs). The method employs an actor-critic-based NN backstepping technique that minimizes a discounted value function, along with an identifier to approximate the unknown system dynamics expressed in augmented form. Novel online weight update laws for the actor and critic NNs are derived by using both the NN identifier and the Hamilton-Jacobi-Bellman residual error. A new continual lifelong learning technique utilizing the Fisher Information Matrix via the Hamilton-Jacobi-Bellman residual error is introduced to obtain the significance of the weights in an online mode, overcoming the issue of catastrophic forgetting for NNs, and closed-loop stability is analyzed and demonstrated. The effectiveness of the proposed method is shown in simulation by contrasting it with a recent method from the literature on an underactuated unmanned aerial vehicle, covering both its translational and attitude dynamics.

Keywords

Continual lifelong learning, optimal control, neural networks, unmanned aerial vehicles, strict-feedback systems

1. INTRODUCTION

Optimal control of nonlinear dynamical systems with known and uncertain dynamics is an important field of study due to numerous practical applications. Traditional optimal control methods[1,2] for nonlinear continuous-time (CT) systems with known dynamics often require the solution to a partial differential equation, referred to as Hamilton-Jacobi-Bellman (HJB) equation, which cannot be solved analytically. To address this challenge, actor-critic designs (ACDs) combined with approximate dynamic programming (ADP) have been proposed as an online method[3,4]. Numerous optimal adaptive control (OAC) techniques for nonlinear CT systems using strict-feedback structure have emerged, leveraging backstepping design as outlined in[5,6]. These approaches, however, require predefined knowledge of the system dynamics. In real-world industrial settings, where system dynamics might be partially or completely unknown, the application of neural network (NN)-based optimal tracking for uncertain nonlinear CT systems in strict feedback form has been demonstrated in[5,7], utilizing the policy/value iterations associated with ADP. However, these policy/value iteration methods often require an extensive number of iterations within each sampling period to solve the HJB equation and ascertain the optimal control input, leading to a significant computational challenge.

The optimal trajectory tracking of nonlinear CT systems involves obtaining a time-varying feedforward term to ensure precise tracking and a feedback term to stabilize the system dynamics. Recent optimal tracking efforts[7,8] have utilized a backstepping-based approach with completely known or partially unknown system dynamics, but the design of the feedforward term while minimizing a cost function has not been addressed; instead, a linear term is used to design the control input. A more recent study[8,9] employed a positive function for obtaining simple weight update laws for the actor and critic NNs, which also relaxes the persistency of excitation (PE) condition. However, finding such a function for the time-varying trajectory tracking problem of a nonlinear CT system is challenging when using an explicitly time-dependent value function and HJB equation at each stage of the backstepping design, since the Hamiltonian is nonzero along the optimal trajectory[10]. Simplified and optimized backstepping control schemes were developed for a class of nonlinear strict-feedback systems in[8,11,12]. These approaches differ from the one proposed in[5]; however, they either require complete knowledge of the system dynamics or do not treat the case where the dynamics are completely unknown.

Moreover, all control techniques rooted in NN-based learning, whether aimed at regulation or tracking, routinely face the issue of catastrophic forgetting[13]. This is understood as the system's tendency to lose previously acquired knowledge while assimilating new information[13,14]. Continual lifelong learning (CLL) is conceived as the sustained ability of a nonlinear system to acquire, assimilate, and retain knowledge over prolonged periods without the interference of catastrophic forgetting. This concept is particularly critical for online NN control strategies for nonlinear CT systems, as these systems are often tasked with managing complex processes in dynamic and varying environments. Nonetheless, the lifelong learning (LL) methodologies shown in[13,15] operate in an offline mode and have not been applied to real-time NN control scenarios yet. This offers a promising direction to leverage the advantages of LL in online control systems, addressing catastrophic forgetting and thus progressively enhancing the efficacy of the control system. Implementing LL-oriented strategies in online NN control enables persistent learning and adaptation without discarding prior knowledge, thereby improving overall performance. By developing an LL-based NN trajectory tracking scheme, it is possible to continuously learn and track trajectories of interest without losing information about previous tasks.

This paper presents an optimal backstepping control approach that incorporates reinforcement learning (RL) to design the controller. The proposed method utilizes an augmented system to address the tracking problem, incorporating both feedforward and feedback controls, which sets it apart from prior work such as[8,16]. This approach uses a trajectory generator to generate the reference trajectories and hence deals with the non-stationary condition in the HJB equation that arises in optimal tracking problems due to the time-varying reference trajectory. In addition, the proposed weight update laws are directly error-driven, obtained using the Hamiltonian and the control input error, in contrast to[8,16], where the weight update laws are obtained using certain positive functions. Furthermore, the control scheme incorporates an identifier, whose approximation error is bounded above by the system states, to approximate the unknown system dynamics, as opposed to prior work, such as[8,16], where the system dynamics are either completely or partially known. Additionally, an HJB equation is utilized at each step of the backstepping process to ensure that the entire sequence of steps is optimized.

The paper also examines the impacts of LL and catastrophic forgetting on control systems and proposes strategies for addressing these challenges in control system-based applications. Specifically, the proposed method employs a weight velocity attenuation (WVA)-based LL scheme in an online manner, in contrast to prior work, such as[13,15], which utilizes offline learning. Additionally, the proposed method demonstrates the stability of the LL scheme via Lyapunov analysis, in contrast to offline-based learning[13,15], where weight convergence is not addressed. To validate the effectiveness of the proposed method, an unmanned aerial vehicle (UAV) application is considered, and the proposed method is contrasted with an existing approach. Lyapunov stability analysis establishes the uniform ultimate boundedness (UUB) of the overall closed-loop continual lifelong RL (LRL) scheme.

The contributions include

(1) A novel optimal trajectory tracking control formulation is presented, utilizing an augmented system approach for nonlinear strict-feedback systems within an ADP-based framework, offering a novel perspective.

(2) An NN-based identifier is employed, wherein the reconstruction error is presumed to be upper-bounded by the norm of the state vector, providing an enhanced approximation of the system dynamics. The new weight update laws are introduced, incorporating Hamiltonian and the NN identifier within an actor-critic framework at each step of the backstepping process.

(3) An online LL method is developed in the critic NN weight update law, mitigating both catastrophic forgetting and gradient explosion, with the significance of weights for NN layers obtained using Fisher Information Matrix (FIM) determined by the Bellman error, as opposed to offline LL-based methods with targets.

(4) Lyapunov stability analysis is undertaken for the entire closed-loop tracking system, involving the identifier NN and the LL-based actor-critic NN framework to show the UUB of the closed-loop system.
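As a rough picture of contribution (3), the sketch below accumulates a diagonal Fisher-information estimate from the gradient of a residual (standing in for the HJB residual) and uses it to attenuate updates on weights deemed important. All names, the scalar residual model, and the update form are illustrative assumptions, not the paper's exact update laws.

```python
import numpy as np

def fim_weighted_update(W, grad_e, e, Omega, lr=0.05, lam=1.0):
    """One online step: accumulate a diagonal Fisher-information estimate Omega
    from the residual gradient, then attenuate the gradient step on weights
    with large Omega, so 'important' weights move less (illustrative only)."""
    Omega = Omega + grad_e**2                      # running importance estimate
    W = W - lr * e * grad_e / (1.0 + lam * Omega)  # attenuated descent on e^2 / 2
    return W, Omega

# Hypothetical fixed residual model e = W.g - 1, standing in for the HJB residual.
g = np.array([1.0, 0.5, 0.0, -0.2])
W, Omega = np.zeros(4), np.zeros(4)
for _ in range(100):
    e = W @ g - 1.0
    W, Omega = fim_weighted_update(W, g, e, Omega)
```

Weights whose gradient entries are persistently large accumulate large importance and are updated more cautiously, which is the qualitative mechanism used against catastrophic forgetting.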

2. CONTINUAL LIFELONG OPTIMAL CONTROL FORMULATION

In this section, we provide the problem formulation and the development of our proposed LRL approach for uncertain nonlinear CT systems in strict feedback form.

2.1. System description

Consider the following strict feedback system

$$ \begin{equation} \begin{split} &\dot{x}_{1}(t)=f_{1}\left({x}_{1}\right)+g_{1}({x}_{1})x_{2}, \\& \dot{x}_{2}(t)=f_{2}\left(\bar{x}_{2}\right)+g_{2}(\bar{x}_{2})u, \end{split} \end{equation} $$

where $$ \bar{x}_{2}(t)=\left[x_{1}(t), x_{2}(t)\right]^{\top} \in \mathbb{R}^{2} $$ is the system state, $$ u \in \mathbb{R} $$ is the control input, and $$ f_{i}\left(.\right), g_{i}(.), i=1, 2 $$ are unknown yet Lipschitz continuous functions on $$ \Omega_{x} $$ that satisfy $$ f_i(0)=0 $$. The following standard assumptions are stated to proceed.
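As a concrete illustration, a system of the form (1) can be simulated with simple placeholder dynamics; the specific $$ f_{i}, g_{i} $$ below are hypothetical choices (satisfying $$ f_{i}(0)=0 $$ and bounded $$ g_{i} $$), not the dynamics considered in the paper.

```python
import numpy as np

# Illustrative strict-feedback dynamics (hypothetical; f_i(0) = 0 and g_i bounded,
# consistent with Assumption 1).
f1 = lambda x1: -0.5 * x1
g1 = lambda x1: 1.0 + 0.2 * np.cos(x1)      # ||g1|| <= 1.2
f2 = lambda x1, x2: -x2 + 0.1 * np.sin(x1)
g2 = lambda x1, x2: 1.0                     # ||g2|| <= 1

def step(x, u, dt=1e-3):
    """One forward-Euler step of the strict-feedback system (1)."""
    x1, x2 = x
    dx1 = f1(x1) + g1(x1) * x2   # x2 acts as the virtual control for the x1 subsystem
    dx2 = f2(x1, x2) + g2(x1, x2) * u
    return np.array([x1 + dt * dx1, x2 + dt * dx2])

x = np.array([1.0, 0.0])
for _ in range(1000):            # simulate 1 s with zero input
    x = step(x, u=0.0)
```

The cascaded structure is visible in `step`: the first state equation is driven by $$ x_{2} $$, and only the second is driven by $$ u $$.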

Assumption 1 ([17]). The nonlinear CT system is controllable and observable. In addition, the control coefficient matrix satisfies $$ \|g_{i}(.)\| \leq g_{iM} $$, where $$ g_{iM} $$ is an unknown positive constant.

Assumption 2 ([4]). The state vector is considered measurable. The desired trajectory $$ \xi_{r} \in \mathbb{R} $$ is bounded, and there exists a Lipschitz continuous command generator function $$ h_d\left(\xi_r(t)\right) $$ such that $$ \dot{\xi}_r(t)=h_d\left(\xi_r(t)\right) $$ and $$ h_d(0)=0 $$.

Next, the LRL control design is introduced. The goal of the LRL control scheme is to achieve satisfactory tracking and maintain the boundedness of all closed-loop system signals while minimizing the control effort and addressing the issue of catastrophic forgetting.

The design of the control system begins by implementing an optimal backstepping approach using augmented system-based actor-critic architecture and then using an online LL to mitigate catastrophic forgetting.

2.2. Optimal backstepping control

To develop the optimal control using the backstepping technique, a new augmented system is first expressed in terms of the tracking error as follows. Define the tracking error as $$ e_{tr1}(t)=x_{1}(t)-\xi_{r}(t) $$. Taking the time derivative and using $$ \dot{x}_{1} $$ from (1) yields

$$ \begin{equation} \dot{e}_{tr1}(t)={f}_{1}({x}_{1})+{g}_{1}({x}_{1})x_{2}-h_d(\xi_{r}(t)), \end{equation} $$

where $$ x_{2}(t) $$ is the virtual control.

In order to obtain both the feedforward and feedback parts of the controller, the tracking problem is converted to a regulation problem by defining a new augmented state as $$ z_{1}=\left[\begin{array}{ll}e_{tr1}^{\top}& \xi^{\top}_{r}\end{array}\right]^{\top} $$. Then, we can write

$$ \begin{equation} \dot{z}_{1}(t)=\mathcal{F}_{s1}({z}_{1})+\mathcal{G}_{s1}({z}_{1})\alpha_{1}, \end{equation} $$

where $$ \mathcal{F}_{s1}({z}_{1})=\bigg[\begin{array}{c} f_{1}\left({x}_{1}\right)-h_d(\xi_{r}(t)) \\ h_d(\xi_{r}(t))\end{array}\bigg] $$ and $$ \mathcal{G}_{s1}({z}_{1})=\bigg[\begin{array}{c}g_{1}({x}_{1}) \\ 0\end{array}\bigg] $$, with $$ z_{1}(0)=\left[\begin{array}{ll}e_{tr1}^{\top}(0) & \xi^{\top}_{r}(0)\end{array}\right]^{\top} $$, and $$ \alpha_{1} $$ is the virtual control. From Assumption 1, it follows that $$ 0<\|\mathcal{G}_{si}(.)\| \leq \bar{G}_{i} $$, where $$ \bar{G}_{i} >0 $$ for $$ i=1, 2 $$. The design is presented in two steps.
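The augmented construction (3) amounts to stacking the tracking-error dynamics with the reference generator; a minimal sketch, with hypothetical $$ f_{1}, g_{1}, h_{d} $$ as placeholders for the unknown functions:

```python
import numpy as np

# Hypothetical stand-ins for the unknown f1, g1 and the command generator h_d
# of Assumption 2 (all vanish at the origin).
f1 = lambda x1: -0.5 * x1
g1 = lambda x1: 1.0
h_d = lambda xi: -0.2 * xi           # reference generator: d(xi_r)/dt = h_d(xi_r)

def augmented_dynamics(z1, alpha1):
    """z1 = [e_tr1, xi_r]; returns dz1/dt = F_s1(z1) + G_s1(z1) * alpha1 (Eq. 3)."""
    e_tr1, xi_r = z1
    x1 = e_tr1 + xi_r                # recover the original state x1 = e_tr1 + xi_r
    F = np.array([f1(x1) - h_d(xi_r), h_d(xi_r)])
    G = np.array([g1(x1), 0.0])      # the virtual control enters only the error channel
    return F + G * alpha1
```

Note that the second component of $$ \mathcal{G}_{s1} $$ is zero, so the reference channel evolves autonomously and the regulation of $$ z_{1} $$ simultaneously yields the feedforward and feedback actions.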

Step 1: For the first backstepping step, let $$ \alpha_1 $$ and $$ \alpha_1^* $$ be the virtual and optimal virtual control inputs, respectively. The optimal performance index function $$ J_1^*(z_1) $$ is defined as

$$ \begin{equation} \begin{aligned} J_{1}^{*}\left(z_{1}\right) =\min _{\alpha_{1} \in \Psi(\Omega_{z1})}\left(\int_{t}^{\infty} e^{-\gamma_{1}(s-t)} h_{1}\left(z_{1}, \alpha_{1}\right) d s\right) =\int_{t}^{\infty} e^{-\gamma_{1}(s-t)} h_{1}\left(z_{1}, \alpha^{*}_{1}\right) d s, \end{aligned} \end{equation} $$

where $$ \Psi\left(\Omega_{z1}\right) $$ denotes the set of admissible control policies over a compact set $$ \Omega_{z1} $$, $$ {J}_{1}(z_{1})=\int_{t}^{\infty} e^{-\gamma_{1}(s-t)} {h}_{1}\left(z_{1}, \alpha_{1}\right) d s $$, and $$ h_1(z_1, \alpha_1) = z_1(s)^{\top}q_{1}z_1(s) + \alpha_1(z_{1})^{\top}r_{1}\alpha_1(z_1) $$, where $$ q_1 $$ is positive definite, $$ r_1>0 $$, and $$ \gamma_{1}>0 $$ is the discount factor.
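For a given admissible policy, the discounted index in (4) can be approximated numerically by a long finite-horizon rollout; the dynamics and the linear policy below are illustrative placeholders, not the paper's controller.

```python
import numpy as np

q1, r1, gamma1 = 1.0, 1.0, 0.5      # user-chosen penalties and discount factor

def discounted_cost(z0, policy, dynamics, T=20.0, dt=1e-3):
    """Approximate J_1(z_1) = int_t^inf e^{-gamma1 (s-t)} (z'q1 z + a r1 a) ds
    by a forward-Euler rollout over a long finite horizon."""
    z, J = np.array(z0, float), 0.0
    for k in range(int(T / dt)):
        a = policy(z)
        h = z @ (q1 * z) + a * r1 * a           # running cost h_1(z, alpha)
        J += np.exp(-gamma1 * k * dt) * h * dt  # discounted accumulation
        z = z + dt * dynamics(z, a)
    return J

# Illustrative stable augmented dynamics and a simple stabilizing policy.
dyn = lambda z, a: np.array([-z[0] + a, -0.2 * z[1]])
pol = lambda z: -0.5 * z[0]
J = discounted_cost([1.0, 0.5], pol, dyn)
```

The discount $$ e^{-\gamma_{1}(s-t)} $$ is what keeps the integral finite even though the reference channel of the augmented state need not decay to zero.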

Remark 1. Generally, addressing trajectory tracking control problems poses considerable challenges, particularly when dealing with a system characterized by nonlinear dynamics and a trajectory that evolves over time. In such instances, a prevalent approach is to employ a discounted cost function, as in (4), to render the cost index $$ {J}^{*}_{1} $$ finite. The control input, consisting of both feedforward and feedback components, is obtained simultaneously by minimizing the performance function (4) along the trajectories of the augmented system (3). In addition, the performance function is not explicitly dependent on time.

By taking the time derivative on both sides of the optimal performance function (4), the tracking Bellman equation is obtained as

$$ \begin{equation} \dot{J}^{*}_{1}(z_{1})=\int_{t}^{\infty}\frac{\partial}{\partial t}\left(e^{-\gamma_{1}(s-t)}\right) h_{1}\left(z_{1}, \alpha^{*}_{1}\right) d s-h_{1}(z_{1}, \alpha^{*}_{1}) \end{equation} $$

By noting that the first term of (5) is $$ \gamma_{1}J_{1}^{*} $$, (5) can be rewritten as

$$ \begin{equation} \begin{split} & h_{1}\left(z_{1}, \alpha_{1}^{*}\right)+\frac{d J_{1}^{*\top}\left(z_{1}\right)}{d z_{1}} \dot{z}_{1}(t)-\gamma_{1} J_{1}^{*} = 0. \end{split} \end{equation} $$
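The step from (5) to (6) can be made explicit. Applying the Leibniz rule to the discounted integral gives

$$ \frac{d}{dt}\int_{t}^{\infty} e^{-\gamma_{1}(s-t)} h_{1}\left(z_{1}(s), \alpha^{*}_{1}\right) d s=\gamma_{1}\int_{t}^{\infty} e^{-\gamma_{1}(s-t)} h_{1}\left(z_{1}(s), \alpha^{*}_{1}\right) d s-h_{1}\left(z_{1}(t), \alpha^{*}_{1}\right)=\gamma_{1} J_{1}^{*}-h_{1}, $$

and equating this with the chain-rule expression $$ \dot{J}^{*}_{1}=\frac{d J_{1}^{*\top}\left(z_{1}\right)}{d z_{1}} \dot{z}_{1}(t) $$ recovers (6).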

Therefore, the tracking HJB equation is generated as

$$ \begin{equation} \begin{split} &H_{1}\left(z_{1}, \alpha_{1}^{*}, \frac{d J_{1}^{*}}{d z_{1}}\right)= h_{1}\left(z_{1}, \alpha_{1}^{*}\right)+\frac{d J_{1}^{*\top}\left(z_{1}\right)}{d z_{1}} \dot{z}_{1}(t)-\gamma_{1} J_{1}^{*}\\& = \frac{d J_{1}^{*\top}\left(z_{1}\right)}{d z_{1}} \left(\mathcal{F}_{s1}\left({{z}_{1}}\right)+\mathcal{G}_{s1}({z}_{1})\alpha_{1}^{*}\right)+Q_{1}+\bar{\alpha}_{1}-\gamma_{1} J_{1}^{*}=0 . \end{split} \end{equation} $$

where $$ H_{1}\left(z_{1}, \alpha_{1}^{*}, \frac{d J_{1}^{*}}{d z_{1}}\right) $$ is the Hamiltonian function for the first step, $$ Q_{1}=z_1(s)^{\top}q_{1}z_1(s), \bar{\alpha}_{1}=\alpha^{*}_1(z_{1})^{\top}r_{1}\alpha^{*}_1(z_1) $$. The optimal control solution $$ \alpha_1^* $$ can be obtained by solving $$ \frac{\partial H_1}{\partial \alpha_1^*}=0 $$. This equation represents the condition for finding the optimal control that minimizes the performance index and satisfies the HJB equation as

$$ \begin{equation} \alpha_{1}^{*}=-\frac{1}{2}r_{1}^{-1} \mathcal{G}^{\top}_{s1}({z}_{1})\frac{d J_{1}^{*}\left(z_{1}\right)}{d z_{1}} . \end{equation} $$
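For completeness, the stationarity condition behind (8): since $$ H_{1} $$ in (7) is quadratic in the control, differentiating with respect to $$ \alpha_{1} $$ gives

$$ \frac{\partial H_{1}}{\partial \alpha_{1}}=2 r_{1} \alpha_{1}+\mathcal{G}^{\top}_{s1}({z}_{1}) \frac{d J_{1}^{*}\left(z_{1}\right)}{d z_{1}}=0, $$

and since $$ r_{1}>0 $$, the unique minimizer is exactly (8).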

It is well known that NNs have universal function approximation abilities and can approximate a nonlinear continuous function $$ \mathcal{H}(z): \mathbb{R}^n \rightarrow \mathbb{R}^m $$ on a compact set $$ \Omega_z $$ as $$ \mathcal{H}(z) = W^{\top}\sigma(z), $$ where $$ W \in \mathbb{R}^{p \times m} $$ is the weight matrix, $$ p $$ is the number of neurons, and $$ \sigma(z) $$ is the basis function vector.
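A minimal sketch of such a single-layer approximator $$ W^{\top}\sigma(z) $$, using Gaussian basis functions and an offline least-squares fit purely for illustration (the paper tunes such weights online):

```python
import numpy as np

def rbf_basis(z, centers, width=1.0):
    """Gaussian basis vector sigma(z) in R^p for a single-layer NN W^T sigma(z)."""
    d = centers - z                                  # broadcast z against the p centers
    return np.exp(-np.sum(d**2, axis=1) / (2 * width**2))

centers = np.linspace(-2, 2, 25).reshape(-1, 1)      # p = 25 neurons on a 1-D input
Z = np.linspace(-2, 2, 200).reshape(-1, 1)
Phi = np.vstack([rbf_basis(z, centers) for z in Z])  # (200, 25) basis matrix
y = np.sin(Z).ravel()                                # target function to approximate

# Offline least-squares fit of the output weights W (illustration only).
W, *_ = np.linalg.lstsq(Phi, y, rcond=None)
err = np.max(np.abs(Phi @ W - y))
```

With enough neurons, the residual `err` plays the role of the reconstruction error $$ \varepsilon $$ in the expressions below, which is why it is assumed small and bounded on the compact set.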

Since $$ J_{1}^{*} $$ is unknown, an NN and its derivative will be used to approximate it on a compact set as

$$ \begin{equation} \begin{split} &J_{1}^{*}=W_{c1}^{*\top}\sigma_{c1}({z}_{1})+\varepsilon_{c1}({z}_{1}) \end{split} \end{equation} $$

$$ \begin{equation} \begin{split} \frac{dJ_{1}^{*}(z_{1})}{d z_{1}}=\nabla\sigma^{\top}_{c1}({z}_{1}){W}^{*}_{c1}+\nabla\varepsilon^{\top}_{c1}({z}_{1}), \end{split} \end{equation} $$

where $$ W^{*}_{c1} $$, $$ \sigma_{c1} $$, and $$ \varepsilon_{c1} $$ are the target weights, basis function vector, and the NN reconstruction error, respectively, $$ \nabla\sigma_{c1} $$ and $$ \nabla\varepsilon_{c1} $$ are the partial derivative of $$ \sigma_{c1} $$ and $$ \varepsilon_{c1} $$, respectively, with respect to input $$ {z}_{1} $$. Substituting (8) and (9) into the HJB equation (6) to get

$$ \begin{equation} \begin{split} &H_{1}^{*}= \frac{1}{4}(W^{*\top}_{c1}\nabla\sigma_{c1}+\nabla\varepsilon_{c1})\wedge_{1}(\nabla\sigma^{\top}_{c1}W_{c1}^{*}+\nabla\varepsilon^{\top}_{c1})-\gamma_{1}(W_{c1}^{*\top}\sigma_{c1}({z}_{1})+\varepsilon_{c1})\\&+\left( {W}_{c1}^{*\top}\nabla\sigma_{c1}\left( {z}_{1}\right)+\nabla\varepsilon_{c1}\right)\left(\mathcal{F}_{s1}({z}_{1})-\frac{1}{2} \wedge_{1}\left( \nabla\sigma^{\top}_{c1}\left( {z}_{1}\right){W}_{c1}^{*}+\nabla\varepsilon^{\top}_{c1}\right)\right)+Q_{1}, \end{split} \end{equation} $$

where optimal Hamiltonian function $$ H_{1}^{*}=H_1\left(z_1, {\alpha}_1^{*}, \frac{d {J}_1^*}{d z_1}\right), $$ and $$ \wedge_{1}=\mathcal{G}_{s1}r_{1}^{-1}\mathcal{G}^{\top}_{s1} $$. Since the target weight matrix, $$ W^{*}_{c1} $$, is unknown, an actor-critic NN and its derivative will be designed to find the solution as follows

$$ \begin{equation} \begin{split} & \hat{J}_{1}=\hat{W}_{c 1}^{\top}(t) \sigma_{c1}\left({z}_{1}\right) \\& \frac{d\hat{J}_{1}(z_{1})}{d z_{1}}=\nabla\sigma^{\top}_{c1}({z}_{1})\hat{W}_{c1}, \end{split} \end{equation} $$

where $$ \hat{W}_{c1} $$ is the estimated weight matrix of the critic NN, and $$ \sigma_{c1}({z}_{1}) $$ is the basis function vector. Next, using (12), we can rewrite (11) as

$$ \begin{equation} \begin{split} &\hat{H}_{1}= \frac{1}{4}\hat{W}_{c1}^{\top}\nabla\sigma_{c1}\hat{\wedge}_{1}\nabla\sigma_{c1}^{\top}\hat{W}_{c1}-\gamma_{1}\hat{W}_{c1}^{\top}\sigma_{c1}({z}_{1})+\left(\hat{W}^{\top}_{c1}\nabla\sigma_{c1}\right)\left(\hat{\mathcal{F}}_{s1}({z}_{1})-\frac{1}{2} \hat{\wedge}_{1} \nabla\sigma^{\top}_{c1}\hat{W}_{c 1} \right)+ Q_{1}, \end{split} \end{equation} $$

where the estimated Hamiltonian function $$ \hat{H}_{1}=H_1\left(z_1, \hat{\alpha}_1, \frac{d \hat{J}_1}{d z_1}\right), \hat{\wedge}_{1}=\hat{\mathcal{G}}_{s1}r_{1}^{-1}\hat{\mathcal{G}}^{\top}_{s1} $$, and $$ \hat{\mathcal{F}}_{s1}({z}_{1}), \hat{\mathcal{G}}_{s1} $$ are the estimates of the augmented internal and input dynamics, generated in the subsequent section by an NN identifier. From (8), $$ \alpha_{1}^{*} $$ is unknown and continuous on a compact set $$ \Omega_{z1} $$, so an NN is used to approximate it over this set, as shown in the subsequent steps. We can write

$$ \begin{equation} -\frac{1}{2} r^{-1}_{1}\mathcal{G}^{\top}_{s1}({z}_{1})\frac{d J_{1}^{*}\left(z_{1}\right)}{d z_{1}} =-\frac{1}{2}\mathcal{V}_{1}^{*}({z}_{1}), \end{equation} $$

where $$ \mathcal{V}_{1}^{*}({z}_{1})=r^{-1}_{1}\mathcal{G}^{\top}_{s1}({z}_{1})\frac{d J_{1}^{*}\left(z_{1}\right)}{d z_{1}}, $$ with $$ \mathcal{G}_{s1}({z}_{1})\in \mathbb{R}^{2} $$ and $$ \frac{d J_{1}^{*}\left(z_{1}\right)}{d z_{1}} \in \mathbb{R}^{2} $$, which makes $$ \mathcal{V}_{1}^{*}({z}_{1}) $$ a scalar. To achieve tracking, decompose the term $$ \mathcal{V}^{*}_{1} $$ into the following two parts

$$ \begin{equation} \mathcal{V}^{*}_{1}=2\beta_{1}z_{1}+\mathcal{V}_{1}^{0}, \end{equation} $$

where $$ \mathcal{V}_{1}^{0}=-2\beta_{1}z_{1}+\mathcal{V}_{1}^{*} $$ and $$ \beta_{1}=[\bar{\beta}_{1}, 0] $$, i.e., $$ \beta_{1}\in \mathbb{R}^{1\times2} $$ with $$ \bar{\beta}_{1}>0 $$, and $$ z_{1} \in \mathbb{R}^{2} $$. The optimal virtual controller $$ \alpha_{1}^{*} $$ can be written as

$$ \begin{equation} \alpha_{1}^{*}=-\beta_{1}z_{1}-\frac{1}{2}\mathcal{V}_{1}^{0}. \end{equation} $$

Therefore, $$ \mathcal{V}^{0}_{1} $$ can be approximated using an NN as

$$ \begin{equation} \begin{split} &r^{-1}_{1}\mathcal{G}^{\top}_{s1}({z}_{1})\frac{d J_{1}^{*}\left(z_{1}\right)}{d z_{1}} =2\beta_{1}z_{1}+(W_{a1}^{* \top} \sigma_{a1}\left( {z}_{1}\right)+\varepsilon_{ 1}), \end{split} \end{equation} $$

where $$ W^*_{a1} \in \mathbb{R}^{p \times m} $$ are the target bounded unknown weights, and $$ \varepsilon_{1} \in \mathbb{R}^m $$ denotes the function reconstruction error; $$ \sigma_{a1}({z}_{1}) $$ with $$ {z}_{1} \in \Omega_{z_{1}} $$ is the basis function vector. Hence, the virtual controller (8) can be written as

$$ \begin{equation} \alpha_{1}^{*}=-\beta_{1} z_{1}-\frac{1}{2}( W_{a1}^{* \top} \sigma_{a1}\left( {z}_{1}\right)+\varepsilon_{1}\left({z}_{1}\right)), \end{equation} $$

Since the target weights are unknown, the actor NN will be designed to estimate the optimal virtual control as

$$ \begin{equation} \hat{\alpha}_{1}=-\beta_{1}z_{1}-\frac{1}{2} \hat{W}_{a 1}^{ \top} \sigma_{a1}\left({z}_{1}\right), \end{equation} $$

where $$ \hat{W}_{a1}, \sigma_{a1}({z}_{1}), {z}_{1}\in \Omega_{z1} $$ are the estimated weights and the basis function of actor NN. According to the HJB equation (11) and its approximation (13), define the HJB residual error $$ e_1(t) $$ as

$$ \begin{equation} \begin{split} &e_1(t)=\hat{H}_{1}-{H}^{*}_1=\hat{H}_{1}, \end{split} \end{equation} $$

since the optimal Hamiltonian value is zero. Notice that the estimated Hamiltonian, $$ \hat{H}_{1} $$, requires the estimate of unknown dynamics from the identifier. The second step is discussed next.
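Collecting the pieces of Step 1, the sketch below evaluates the critic estimate (12), the actor estimate (19), and the HJB residual (20). The basis functions, dimensions, and the identifier outputs $$ \hat{\mathcal{F}}_{s1}, \hat{\mathcal{G}}_{s1} $$ are hypothetical stand-ins, not the paper's exact choices.

```python
import numpy as np

p = 6                                            # number of neurons (illustrative)
q1, r1, gamma1 = 1.0, 1.0, 0.5
beta1 = np.array([1.0, 0.0])                     # beta_1 = [beta_bar_1, 0]
c = np.arange(1, p + 1) / p                      # basis scales (illustrative)

def sigma_c1(z):                                 # shared critic/actor basis sigma(z_1)
    return np.tanh(c * z.sum())

def grad_sigma_c1(z):                            # d sigma / d z_1, shape (p, 2)
    s = 1.0 - np.tanh(c * z.sum())**2
    return np.outer(c * s, np.ones(2))

def hjb_residual(z, Wc, F_hat, G_hat):
    """HJB residual e_1 = H_hat_1 of Eqs. (13)/(20), using identifier outputs."""
    dJ = grad_sigma_c1(z).T @ Wc                 # d J_hat_1 / d z_1, Eq. (12)
    Lam = np.outer(G_hat, G_hat) / r1            # Lambda_hat_1 = G_hat r1^{-1} G_hat^T
    Q = q1 * z @ z
    return (0.25 * dJ @ Lam @ dJ - gamma1 * Wc @ sigma_c1(z)
            + dJ @ (F_hat - 0.5 * Lam @ dJ) + Q)

def alpha_hat(z, Wa):
    """Estimated virtual control of Eq. (19)."""
    return -beta1 @ z - 0.5 * Wa @ sigma_c1(z)

z1 = np.array([0.5, 0.2])
e1 = hjb_residual(z1, np.zeros(p),
                  F_hat=np.array([-0.3, 0.1]), G_hat=np.array([1.0, 0.0]))
```

With zero critic weights the residual reduces to the state penalty $$ Q_{1} $$, illustrating that the update laws must drive $$ \hat{W}_{c1} $$ away from zero to cancel the running cost.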

Remark 2. The subsequent section will leverage the HJB residual error (20) to formulate the weight update laws for the critic NN. Additionally, the control input error $$ u_{e} $$, delineated in Lemma 1, will facilitate the derivation of the actor NN weight update laws.

Employing the HJB residual error and the control input error for formulating the weight update laws is pivotal as it enables efficient optimization of the weights in NN. This methodology ensures more accurate and reliable learning processes, allowing the network to better approximate the desired functions or policies, thereby enhancing the overall performance and robustness of the system.

Step 2: In this second step, the actual controller $$ \hat{u} $$ is derived. Define $$ e_{tr2}(t)=x_{2}(t)-\hat{\alpha}_{1} $$; then, using (1), the error dynamics are written as

$$ \begin{equation} \dot{e}_{tr2}(t)=f_{2}(\bar{x}_{2})+g_{2}(\bar{x}_{2})u-\dot{\hat{\alpha}}_{1} . \end{equation} $$

Since $$ \hat{\alpha}_{1}=-\beta_{1}{z}_{1}-\frac{1}{2}{\hat{W}}^{\top}_{a1}\sigma_{a1}({z}_{1}) $$ from Step 1, $$ \dot{\hat{\alpha}}_{1} $$ can be calculated by using a virtual generator $$ h_{d}(\hat{\alpha}_{1}) $$. Let $$ {z}_{2}=[{e}_{tr2}^{\top}, \hat{\alpha}^{\top}_{1}]^{\top} \in \mathbb{R}^{2} $$. One can write

$$ \begin{equation} \dot{z}_{2}(t)=\mathcal{F}_{s2}({z}_{2})+\mathcal{G}_{s2}({z}_{2})u, \end{equation} $$

where $$ \mathcal{F}_{s2}({z}_{2})=\bigg[\begin{array}{cc}f_{2}(\bar{x}_{2})-h_{d}(\hat{\alpha}_{1}) \\ h_{d}(\hat{\alpha}_{1})\end{array}\bigg] $$ and $$ \mathcal{G}_{s2}({z}_{2})=\bigg[\begin{array}{cc} {g}_{2}(\bar{x}_{2})\\ 0\end{array}\bigg] $$. Letting $$ u^{*} $$ be the optimal control, the optimal integral cost function is defined as

$$ \begin{equation} \begin{aligned} J_{2}^{*}\left(z_{2}\right) =\min _{u \in \Psi(\Omega_{z2})}\left(\int_{t}^{\infty} e^{-\gamma_{2}(s-t)} h_{2}\left(z_{2}(s), u\left(z_{2}\right)\right) d s\right) =\int_{t}^{\infty} e^{-\gamma_{2}(s-t)} h_{2}\left(z_{2}(s), u^{*}\left(z_{2}\right)\right) d s, \end{aligned} \end{equation} $$

where $$ h_{2}(z_{2}, u)=z_2(s)^{\top}q_{2}z_2(s)+u(z_{2})^{\top}r_{2}u(z_2) $$ is the running cost. The user-defined penalties $$ q_2 $$ and $$ r_2 $$ are positive definite, and $$ \gamma_{2}>0 $$.

The HJB equation for step 2 is given by

$$ \begin{equation} \begin{aligned} &H_{2}\left(z_{2}, u^{*}, \frac{d J_{2}^{*}}{d z_{2}}\right)= Q_{2}+\bar{u}+\frac{d J_{2}^{*\top}}{d z_{2}} \left(\mathcal{F}_{s2}\left({{z}_{2}}\right)+\mathcal{G}_{s2}({z}_{2})u^{*}\right)-\gamma_{2}J_{2}^*=0 , \end{aligned} \end{equation} $$

where $$ H_{2}\left(z_{2}, u^{*}, \frac{d J_{2}^{*}}{d z_{2}}\right) $$ is the Hamiltonian function for the second step, $$ Q_{2}=z_2(s)^{\top}q_{2}z_2(s), \bar{u}=u^{*}(z_{2})^{\top}r_{2}u^{*}(z_2). $$ Similar to previous steps, solving $$ \left(\partial H_{2} / \partial u^{*}\right)=0 $$ yields

$$ \begin{equation} u^{*}=-\frac{1}{2} r^{-1}_{2}\mathcal{G}^{\top}_{s2}({z}_{2}) \frac{d J_{2}^{*}}{d z_{2}} . \end{equation} $$

Since $$ J_{2}^{*} $$ is unknown, an NN will be used to approximate it on a compact set given by

$$ \begin{equation} J_{2}^{*}=W_{c2}^{*\top}\sigma_{c2}({z}_{2})+\varepsilon_{c2}({z}_{2}). \end{equation} $$

Therefore, $$ \frac{dJ_{2}^{*}({z}_{2})}{d z_{2}}=\nabla\sigma^{\top}_{c2}({z}_{2}){W}^{*}_{c2}+\nabla\varepsilon^{\top}_{c2} $$, where $$ W_{c2}^{*}, \sigma_{c2}, \varepsilon_{c2} $$ are the NN weights, activation function, and reconstruction error, respectively. Substituting (25) and (26) into the HJB equation (24) yields

$$ \begin{equation} \begin{split} &H^{*}_{2}= \frac{1}{4}(W^{*\top}_{c2}\nabla\sigma_{c2}+\nabla\varepsilon_{c2})\wedge_{2}(\nabla\sigma^{\top}_{c2}W^{*}_{c2}+\nabla\varepsilon^{\top}_{c2})+Q_{2}-\gamma_{2}(W_{c2}^{*\top}\sigma_{c2}({z}_{2})+\varepsilon_{c2})\\&+\left( {W}_{c2}^{*\top}\nabla\sigma_{c2}\left( {z}_{2}\right)+\nabla\varepsilon_{c2}\right) \left(\mathcal{F}_{s2}({z}_{2})-\frac{1}{2} \wedge_{2}( \nabla\sigma^{\top}_{c2}\left( {z}_{2}\right){W}_{c2}^{*}+\nabla\varepsilon^{\top}_{c2})\right), \end{split} \end{equation} $$

where $$ H^{*}_{2}=H_2\left(z_2, {\alpha}_2^{*}, \frac{d {J}_2^*}{d z_2}\right) $$ is the optimal Hamiltonian for the second step, $$ \wedge_{2}=\mathcal{G}_{s2}r_{2}^{-1}\mathcal{G}^{\top}_{s2} $$. Since the target weight matrix, $$ W^{*}_{c2} $$, is unknown, an actor-critic NN will be designed to find the solution as follows

$$ \begin{equation} \begin{split} & \hat{J}_{2}=\hat{W}_{c 2}^{\top}(t) \sigma_{c2}\left({z}_{2}\right) \\& \frac{d\hat{J}_{2}(z_{2})}{d z_{2}}=\nabla\sigma^{\top}_{c2}({z}_{2})\hat{W}_{c2}, \end{split} \end{equation} $$

where $$ \hat{W}_{c2} $$ is the estimated weight matrix of the critic NN, and $$ \sigma_{c2}({z}_{2}), {z}_{2}\in \Omega_{z2} $$ is the basis function vector. Next, we can rewrite (27) as follows

$$ \begin{equation} \begin{split} \hat{H}_{2}= \frac{1}{4}\hat{W}_{c2}^{\top}\nabla\sigma_{c2}\hat{\wedge}_{2}\nabla\sigma_{c2}^{\top}\hat{W}_{c2}+ Q_{2}-\gamma_{2}\hat{W}_{c2}^{\top}\sigma_{c2}({z}_{2})+\left( \hat{W}^{\top}_{c2}\nabla\sigma_{c2}\right) \left(\hat{\mathcal{F}}_{s2}({z}_{2})-\frac{1}{2}\hat{\wedge}_{2} \nabla\sigma^{\top}_{c2}\hat{W}_{c 2}\right), \end{split} \end{equation} $$

where $$ \hat{H}_{2}=H_2\left(z_2, \hat{\alpha}_2, \frac{d \hat{J}_2}{d z_2}\right) $$ is the estimated Hamiltonian, $$ \hat{\wedge}_{2}=\hat{\mathcal{G}}_{s2}r_{2}^{-1}\hat{\mathcal{G}}^{\top}_{s2} $$ and $$ \hat{\mathcal{F}}_{s2}({z}_{2}), \hat{\mathcal{G}}_{s2} $$ are the estimate of augmented internal and input dynamics to be generated in the subsequent section by using the NN identifier.

Since $$ \mathcal{G}_{s2}({z}_{2}) $$ and $$ \frac{d J_{2}^{*}\left(z_{2}\right)}{d z_{2}} $$ are unknown and continuous on a compact set $$ \Omega_{z2} $$, an NN is used to approximate them over this set. We can write

$$ \begin{equation} -\frac{1}{2}r^{-1}_{2} \mathcal{G}^{\top}_{s2}({z}_{2})\frac{d J_{2}^{*}\left(z_{2}\right)}{d z_{2}} =-\frac{1}{2}\mathcal{V}_{2}^{*}({z}_{2}), \end{equation} $$

where $$ \mathcal{V}_{2}^{*}({z}_{2})=r^{-1}_{2}\mathcal{G}^{\top}_{s2}({z}_{2})\frac{d J_{2}^{*}\left(z_{2}\right)}{d z_{2}}. $$ To achieve tracking, decompose the term $$ \mathcal{V}^{*}_{2} $$ into following two parts

$$ \begin{equation} \mathcal{V}^{*}_{2}=2\beta_{2}z_{2}+\mathcal{V}_{2}^{0}, \end{equation} $$

where $$ \mathcal{V}_{2}^{0}=-2\beta_{2}z_{2}+\mathcal{V}_{2}^{*} $$ and $$ \beta_{2}=[\bar{\beta}_{2}, 0] $$, i.e., $$ \beta_{2}\in \mathbb{R}^{1\times2} $$ with $$ \bar{\beta}_{2}>0 $$, and $$ z_{2} \in \mathbb{R}^{2} $$. The optimal controller $$ u^{*} $$ can be written as $$ u^{*}=-\beta_{2}z_{2}-\frac{1}{2}\mathcal{V}_{2}^{0}. $$ Therefore, we can write

$$ \begin{equation} \begin{split} &r^{-1}_{2}\mathcal{G}^{\top}_{s2}({z}_{2})\frac{d J_{2}^{*}\left(z_{2}\right)}{d z_{2}} =2\beta_{2}z_{2}+(W_{a2}^{* \top} \sigma_{a2}\left( {z}_{2}\right)+\varepsilon_{ 2}), \end{split} \end{equation} $$

where $$ W^*_{a2} \in \mathbb{R}^{p \times m} $$ are the target bounded unknown weights, and $$ \varepsilon_{2} \in \mathbb{R}^m $$ denotes the function reconstruction error; $$ \sigma_{a2}({z}_{2}) $$ with $$ {z}_{2} \in \Omega_{z_{2}} $$ is the basis function vector. Therefore, the optimal controller (25) can be written as

$$ \begin{equation} u^{*}=-\beta_{2} z_{2}-\frac{1}{2}( W_{a2}^{* \top} \sigma_{a2}\left( {z}_{2}\right)+\varepsilon_{2}\left({z}_{2}\right)). \end{equation} $$

Similarly, the actor NN will be designed to estimate the optimal control as

$$ \begin{equation} \hat{u}=-\beta_{2}z_{2}-\frac{1}{2} \hat{W}_{a 2}^{ \top} \sigma_{a2}\left({z}_{2}\right), \end{equation} $$

where $$ \hat{W}_{a2}, \sigma_{a2}({z}_{2}), {z}_{2}\in \Omega_{z2} $$ are the estimated weights and the basis function of actor NN. According to the HJB equation (27) and its approximation (29), define the HJB residual error $$ e_2(t) $$ as follows

$$ \begin{equation} \begin{split} &e_2(t)=\hat{H}_2-H^{*}_2=\hat{H}_2, \end{split} \end{equation} $$

since the optimal Hamiltonian for the second step is zero. Notice that the estimated Hamiltonian, $$ \hat{H}_{2} $$, requires the estimate of the unknown dynamics from the identifier. Following these two steps, the proposed method can be extended to $$ n $$th-order systems.
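The signal flow of the two backstepping steps can be summarized in a short sketch; the shared basis and zero-initialized actor weights are illustrative assumptions, and the online weight tuning is omitted.

```python
import numpy as np

p = 5
beta1, beta2 = np.array([1.0, 0.0]), np.array([1.0, 0.0])
sigma = lambda z: np.tanh(np.linspace(0.2, 1.0, p) * z.sum())  # shared basis (illustrative)

def control_pipeline(x1, x2, xi_r, Wa1, Wa2):
    """Signal flow of the two-step design: tracking errors -> augmented states
    -> virtual control alpha_hat_1 (Eq. 19) -> actual control u_hat (Eq. 34)."""
    e1 = x1 - xi_r
    z1 = np.array([e1, xi_r])
    alpha1 = -beta1 @ z1 - 0.5 * Wa1 @ sigma(z1)   # virtual control for the x1-loop
    e2 = x2 - alpha1
    z2 = np.array([e2, alpha1])
    u = -beta2 @ z2 - 0.5 * Wa2 @ sigma(z2)        # actual control applied to (1)
    return alpha1, u

alpha1, u = control_pipeline(0.8, 0.1, 0.5, np.zeros(p), np.zeros(p))
```

Each step feeds the next: the virtual control $$ \hat{\alpha}_{1} $$ from Step 1 defines the second tracking error and the second augmented state, which is the cascade an $$ n $$th-order extension would repeat.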

Remark 3. The optimal control input can be obtained by utilizing the gradient of the optimal value function (26) and an NN identifier. As a result, the critic NN outlined in (26) could be used to determine the actor input without an additional NN. However, to simplify the derivation of the weight update rules and the subsequent stability analysis, separate NNs are employed for the actor and critic.

Next, an NN identifier will be used to approximate the unknown dynamics given by (3) and (22).

2.3. NN identifier

A single-layer NN is used to approximate the nonlinear functions $$ \mathcal{F}_{sj} $$ and $$ \mathcal{G}_{sj} $$, $$ j=1, 2 $$, that govern the augmented dynamics[17]. Then, by using $$ \hat{\mathcal{G}}_{sj} $$, obtained from the NN identifier, the estimated control policy is applied to the nonlinear system. The approximations are represented as $$ {\mathcal{F}_{sj}}(z_{j}) = {V_{F_j}}^{\top} \sigma_{F_j}(z_{j}) + \varepsilon_{F_j}(z_{j}) $$ and $$ \mathcal{G}_{sj}(z_{j}) = V_{G_j}^{\top} \sigma_{G_j}(z_{j}) + \varepsilon_{G_j}(z_{j}) $$, where $$ V_{F_j} $$ and $$ V_{G_j} $$ are the NN weight matrices, $$ \sigma_{F_j}(z_{j}) $$ and $$ \sigma_{G_j}(z_{j}) $$ are the activation functions, and $$ \varepsilon_{F_j}(z_{j}) $$ and $$ \varepsilon_{G_j}(z_{j}) $$ are the NN reconstruction errors, respectively. The estimated internal dynamics and control coefficient matrix of the augmented system are given by $$ \hat{\mathcal{F}}_{sj}(z_{j}) =\hat {V}_{F_j}^{\top} \sigma_{F_j}(z_{j}) $$ and $$ \hat{\mathcal{G}}_{sj}(z_{j}) = \hat{V}_{G_j}^{\top} \sigma_{G_j}(z_{j}) $$, where $$ \hat{V}_{F_j} $$ and $$ \hat{V}_{G_j} $$ are the estimated NN weight matrices. Define
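A sketch of the identifier structure follows; the basis functions, the fixed input projection `P`, and the dimensions are hypothetical choices for illustration.

```python
import numpy as np

l, n = 8, 2                                     # neurons per block, augmented state dim
P = np.linspace(0.1, 1.0, l)[:, None] * np.ones((l, n))  # fixed input projection (illustrative)

sigma_F = lambda z: np.tanh(P @ z)              # sigma_Fj(z_j) in R^l
sigma_G = lambda z: np.tanh(P @ z + 0.5)        # sigma_Gj(z_j) in R^l

def identifier_estimates(z, V_F_hat, V_G_hat):
    """F_hat_sj(z) = V_F_hat^T sigma_F(z), G_hat_sj(z) = V_G_hat^T sigma_G(z)."""
    return V_F_hat.T @ sigma_F(z), V_G_hat.T @ sigma_G(z)

def estimated_flow(z, alpha, V_F_hat, V_G_hat):
    """Estimated augmented dynamics z_dot ~ F_hat(z) + G_hat(z) * alpha, as used
    in the estimated Hamiltonians H_hat_1 and H_hat_2."""
    F_hat, G_hat = identifier_estimates(z, V_F_hat, V_G_hat)
    return F_hat + G_hat * alpha

z = np.array([0.2, -0.1])
F_hat, G_hat = identifier_estimates(z, np.zeros((l, n)), np.zeros((l, n)))
```

The two weight blocks correspond to the stacked matrix $$ Z_{j}=[V_{F_j}^{\top} \; V_{G_j}^{\top}]^{\top} $$ introduced next.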

$$ \begin{equation} \dot{z_{j}}(t)=Z_{j}^{\top} \boldsymbol{\sigma}(\boldsymbol{\xi_{j}}) \bar{\alpha}_{j}+\varepsilon_I(z_{j}), \end{equation} $$

where $$ \bar{\alpha}_{j}=\left[1, \alpha_{j}\right]^{\top} $$ is the augmented control input, $$ \alpha_{j} $$ is given by (19), (34), $$ Z_{j}=\left[\begin{array}{ll}V_{F_j}^{\top} & V_{G_j}^{\top} \end{array}\right]^{\top} \in \mathbb{R}^{2 l \times n} $$ represents the augmented NN identifier weights, and $$ \boldsymbol{\sigma}(\boldsymbol{\xi_{j}})=\operatorname{diag}\{ \sigma_{F_j}\left(z_{j}\right), \sigma_{G_j}(z_{j})\} $$ denotes the augmented activation function for the NN identifier. The reconstruction error of the NN identifier is defined as $$ \varepsilon_{I}(z_{j})=\varepsilon_{F_j}(z_{j})+\varepsilon_{G_j}(z_{j}) \alpha_{j} $$. Next, the following assumption is stated.

Assumption 3 ([17]). The NN identifier is a single-layer network, and its reconstruction error is bounded above such that $$ \left\|\varepsilon_{I}(z_{1})\right\|^{2} \leq b_{0}\|z_{1}\|^{2} $$ and $$ \|Z_{1}\| \leq Z_{m_{1}} $$.

Remark 4. Because $$ \varepsilon_{I}(z_{j}) $$ depends on the input $$ \alpha_{j} $$ and the system state $$ z_{j}(t) $$, it is assumed to be bounded above by the norm of the state vector, unlike[18], where $$ \varepsilon_{I}(z_{j}) $$ is bounded by a constant value.

Define the dynamics of the NN identifier as

$$ \begin{equation} \dot{\hat{z}}_{j}(t)=\hat{Z}_{j}^{\top} \boldsymbol{\sigma}(\boldsymbol{\xi}_{j}) \bar{\alpha}_{j}+K(z_{j}-\hat{z}_{j}), \end{equation} $$

where $$ \hat{z}_{j}(t) $$ represents the estimated augmented state vector, $$ K $$ is a user-defined constant gain matrix, and $$ \hat{Z}_{j}=\left[\begin{array}{ll}\hat{V}_{F_j}^{\top} & \hat{V}_{G_j}^{\top} \end{array}\right]^{\top} $$ represents the estimated augmented NN identifier weights. The state estimation error is defined as $$ e_{i_{j}}=z_{j}-\hat{z}_{j}. $$ The weight update law for the NN identifier is given by

$$ \begin{equation} \dot{\hat{Z}}_{j}=-\alpha_{vj} \hat{Z}_{j}+\sigma(\boldsymbol{\xi}_{j}) \bar{\alpha}_{j} e_{i_{j}}^{\top}, \end{equation} $$

where $$ \alpha_{vj}>0 $$ is a tuning parameter.
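As a rough illustration of how the identifier equations operate together, the following sketch integrates the estimator dynamics and the weight update law above with forward Euler on a hypothetical scalar system; the system, gains, neuron count, and feature construction are illustrative assumptions, not the paper's UAV model.

```python
import numpy as np

# Sketch of the NN identifier on a toy scalar system (all values assumed).
# True augmented dynamics: z_dot = F(z) + G(z) * alpha, estimated via
#   z_hat_dot = Z_hat^T sigma(z) alpha_bar + K (z - z_hat),
#   Z_hat_dot = -alpha_v Z_hat + sigma(z) alpha_bar e_i^T.
rng = np.random.default_rng(0)
n, l = 1, 7                        # state dimension, neurons per block
Vf = rng.standard_normal((n, l))   # random input weights (RVFL-style)
Vg = rng.standard_normal((n, l))

def sigma(z):
    """Block-diagonal sigmoid features for the F- and G-approximators."""
    sf = 1.0 / (1.0 + np.exp(-Vf.T @ z))
    sg = 1.0 / (1.0 + np.exp(-Vg.T @ z))
    S = np.zeros((2 * l, 2))
    S[:l, 0] = sf                  # multiplies the constant entry of alpha_bar
    S[l:, 1] = sg                  # multiplies alpha
    return S

K, alpha_v, dt = 5.0, 0.01, 1e-3
z, z_hat = np.array([0.5]), np.zeros(1)
Z_hat = np.zeros((2 * l, n))

for k in range(20000):
    t = k * dt
    alpha = np.array([np.sin(t)])                           # bounded probing input
    z_dot = -z + np.tanh(z) + (2.0 + np.cos(z)) * alpha     # toy F and G
    a_bar = np.array([1.0, alpha[0]])
    e_i = z - z_hat
    z_hat_dot = Z_hat.T @ (sigma(z) @ a_bar) + K * e_i      # identifier dynamics
    Z_hat += dt * (-alpha_v * Z_hat + np.outer(sigma(z) @ a_bar, e_i))
    z += dt * z_dot
    z_hat += dt * z_hat_dot

print(abs(z - z_hat)[0])  # state estimation error remains small
```

The observer gain $$ K $$ keeps the estimation error small even before the weights converge, mirroring the role of the correction term $$ K(z_{j}-\hat{z}_{j}) $$.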

Remark 5. The NN identifier weights are tuned using both the augmented state estimation error and the input vector. The boundedness of the control input is needed to show the convergence of the NN identifier when the proof is carried out separately; this assumption is relaxed when the identifier is combined with the LRL control scheme, as shown in the next section.

Remark 6. Since the control input error and the HJB error used to tune the actor-critic NN weights require the system dynamics, which are uncertain, the NN identifier is used to approximate the unknown dynamics of the augmented system. The estimated values from the identifier are used in the actor-critic weight update laws to tune the NN weights, as shown in the subsequent section.

2.4. Actor critic NN weight tuning

In this section, the actor-critic weight update laws are obtained by applying the gradient descent method to the Hamiltonian-based performance function. The following lemma is stated.

Lemma 1. Consider the system (1), the transformed system (3), the NN identifier weight update law (38), and the critic NNs (12), (26) and actor NNs (19), (34). Their weight update laws can be written as

$$ \begin{equation} \dot{\hat{W}}_{cj} = \gamma_{cj} \hat{\psi}_j(t) \left( e_{j}(t) - \frac{1}{4} \hat{W}_{cj}^{\top} \nabla \sigma_{cj} \hat{\wedge}_j \nabla \sigma_{cj}^{\top} \hat{W}_{cj} - \hat{W}_{cj}^{\top} \nabla \sigma_{cj} \hat{\mathcal{F}}_{sj} \right) - \sigma_{1j} \hat{W}_{cj} \end{equation} $$

$$ \begin{equation} \dot{\hat{W}}_{aj}(t) = \beta_{aj} (S_{aj} u_{ej}^{\top}) - \sigma_{2j} \hat{W}_{aj} \end{equation} $$

where $$ j $$ denotes the backstepping step, $$ \gamma_{cj}=\frac{\beta_{c j}}{\|\hat{\psi}_j\|^2+1} $$, $$ \beta_{c j}, \beta_{j}, \beta_{a j}>0 $$ are the learning rates, $$ S_{aj}=\sigma_{aj}\left( {z}_{j}\right) $$, $$ S_{cj}=\sigma_{cj}\left( {z}_{j}\right) $$, and $$ \hat{\psi}_j(t)=\gamma_{j}S_{cj}-\nabla\sigma_{cj}(\hat{\mathcal{F}}_{sj}+\hat{\mathcal{G}}_{sj}\hat{\alpha}_{j}) $$, where $$ \sigma_{1j}>0, \sigma_{2j}>0 $$ are design constants, $$ u_{ej} $$ is the error between the estimated and actual control, given by $$ u_{ej}=-\beta_{j}z_{j}-\frac{1}{2}\hat{W}^{\top}_{aj}\sigma_{aj}\left( {z}_{j}\right)+\frac{1}{2}r_{j}^{-1} \hat{\mathcal{G}}^{\top}_{sj}\frac{d \hat{J}_{j}}{d z_{j}} $$, $$ e_{j} $$ is the HJB error for each step, given by (20), (35), $$ \hat{\wedge}_{j}=\hat{\mathcal{G}}_{sj}r_{j}^{-1}\hat{\mathcal{G}}^{\top}_{sj}, $$ and $$ \hat{\mathcal{F}}_{sj}, \hat{\mathcal{G}}_{sj} $$ are approximated by the NN identifier.

Proof: The weight update laws for critic NN in step $$ j $$ are obtained by defining the performance function as

$$ \begin{equation} E_{j}=\frac{1}{2}e_{j}^{2}. \end{equation} $$

By using the gradient descent algorithm, the weight update law can be obtained as

$$ \begin{equation} \dot{\hat{W}}_{c j}= -\frac{\beta_{c j}\hat{\psi}_{j}}{(1+\hat{\psi}_j^{\top}\hat{\psi}_j)} \frac{\partial E_j(t)}{\partial \hat{W}_{c j}}. \end{equation} $$

On simplifying (42), we obtain the weight update law for the critic NN, as shown in Lemma 1. The weight update law for the actor NN is obtained by defining the performance function as

$$ \begin{equation} E_{aj}=\frac{1}{2}u_{ej}^{2}. \end{equation} $$

By using the gradient descent approach, the weight update law for an actor NN is obtained as

$$ \begin{equation} \dot{\hat{W}}_{a j}(t)= -\beta_{a j}\frac{\partial E_{aj}(t)}{\partial \hat{W}_{a j}}. \end{equation} $$

Further simplification, together with the added stabilization term, yields the weight update law shown in Lemma 1.

Remark 7. The weight update laws are obtained using the gradient descent method to the Hamiltonian-based performance function. The weight update equations for the critic and actor have additional terms to ensure stability and facilitate convergence proof. The last term, known as the sigma modification term, relaxes the PE condition needed to ensure weight convergence. It is important to note that the right-hand side terms in the weight update equation can be measured.
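To make the normalized-gradient critic tuning concrete, the following minimal sketch runs it on a scalar linear-quadratic problem whose optimal value function is known in closed form. The system, the single basis function $$ z^2 $$, the state re-excitation, the semi-gradient treatment of the policy, and the omission of the sigma-modification term are all simplifying assumptions for illustration, not the paper's design.

```python
import numpy as np

# Normalized-gradient critic update on a scalar LQR problem (illustrative).
# System: z_dot = a z + b u, cost integrand q z^2 + r u^2.
# Optimal value V*(z) = p z^2 with p the stabilizing root of
# 2 a p - b^2 p^2 / r + q = 0.
a, b, q, r = -1.0, 1.0, 1.0, 1.0
p = r * (a + np.sqrt(a**2 + b**2 * q / r)) / b**2   # closed-form Riccati solution

Wc = 0.0                     # critic weight estimate, basis sigma_c(z) = z^2
beta_c, dt = 2.0, 1e-3
z = 1.5
for k in range(40000):
    grad_sigma = 2.0 * z                        # d sigma_c / dz
    u = -0.5 * (b / r) * grad_sigma * Wc        # approximate optimal policy
    e = q * z**2 + r * u**2 + Wc * grad_sigma * (a * z + b * u)  # HJB residual
    psi = grad_sigma * (a * z + b * u)          # de/dWc with the policy held fixed
    Wc -= dt * beta_c * psi * e / (1.0 + psi**2)   # normalized gradient step
    z += dt * (a * z + b * u)
    if abs(z) < 0.2:                            # re-excite the state (PE-like)
        z = 1.5

print(Wc, p)  # Wc approaches the Riccati solution p
```

Driving the HJB residual to zero along a persistently exciting trajectory pulls the critic weight toward the true value-function parameter, which is the mechanism the lemma formalizes for the general nonlinear case.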

The following assumption is stated next.

Assumption 4 ([4]). It is assumed that the ideal weights exist and are bounded over a compact set by an unknown positive constant, such that $$ \left\|W_{cj}^{*}\right\| \leq \bar{W}_{M1j} $$$$ , \left\|W_{aj}^{*}\right\| \leq \bar{W}_{M2j} $$ where $$ \bar{W}_{M1j}, \bar{W}_{M2j} $$ are unknown constant values. The basis function $$ \sigma(\cdot) $$, function reconstruction error $$ \varepsilon_j(\cdot) $$, and their derivatives with respect to their arguments are assumed to be bounded over a compact set, with unknown bounds as $$ \|\sigma\|\leq\bar{\sigma}, \|\dot{\sigma}\|\leq \bar{\bar{\sigma}}, \varepsilon_j(\cdot)\leq \varepsilon_{m}, \dot{\varepsilon}_j(\cdot)\leq \bar{\varepsilon}_{m} $$.

Next, the following theorem is stated.

Theorem 1. Consider the nonlinear system in strict-feedback form defined by (1). By using the augmented system (3), consider the optimal virtual control (19) and actual control terms (34), along with the identifier and actor-critic updating laws (38), (39), and (40). Further, assume that the design parameters are selected as per the stated conditions and that Assumptions 1 through 4 hold. If the system input is PE and its initial value, $$ u_0 $$, is admissible, then the tracking errors, augmented state $$ z_j $$, actual state $$ x_{j} $$, and NN weight estimation errors, $$ \tilde{{Z}}_{j}=Z_{j}-\hat{Z}_{j} $$, $$ \tilde{{W}}_{c_j}=W_{cj}-\hat{W}_{cj} $$, and $$ \tilde{{W}}_{a_j}=W_{aj}-\hat{W}_{aj} $$, are guaranteed to be bounded. This, in turn, ensures that the actual system output tracks the desired trajectory and that the estimated/actual control inputs remain bounded close to their optimal values.

Proof: See Appendix.

Remark 8. In the proposed optimal backstepping technique, RL/ADP is employed at every step to obtain the optimal virtual and actual control inputs. The backstepping has been derived for a two-step process; however, it can be implemented up to $$ n $$ steps using a similar procedure.

Remark 9. The sigma modification term serves to alleviate the PE condition and assists in the process of forgetting; however, it is not effective in multitasking scenarios for minimizing forgetting. Subsequently, a novel online LL strategy is presented to address the issue of catastrophic forgetting.

Next, an online regularization-based approach to LL is introduced.

2.5. Continual lifelong learning

To mitigate the issue of catastrophic forgetting[13], a technique called WVA was proposed[15]. However, WVA has only been used in an offline manner and thus cannot be applied directly to online NN-based techniques.

In contrast, this study introduces a new online LL technique that can be integrated into an online NN-based trajectory tracking control scheme by identifying and safeguarding the most critical parameters during the optimization process. To achieve this, the proposed technique employs a performance function given by

$$ \begin{equation} \begin{split} L_{j}(\hat{W}_{cj}) \approx E_{bj}+\frac{\lambda_{j}}{2} \left\|\hat{W}_{cj}-\hat{W}_{cjp}^*\right\|_{ \bar{\Omega}_{kj}}^2, \end{split} \end{equation} $$

where $$ E_{bj} $$ is the loss function for the current task $$ B $$ in step $$ j $$ (41), and $$ \bar{\Omega}_{kj}=diag\{\frac{\Omega_{w1}}{ \Omega_{w1}+1} \ldots \frac{\Omega_{wn}}{ \Omega_{wn}+1}\} $$, where $$ \Omega_{wi} $$ represents the significance of the $$ i $$-th weight, $$ i = 1, \dots, n $$, of the NN in step $$ j $$ after learning from prior tasks. The diagonal elements of the FIM $$ \bar{\Omega}_{kj} $$, with $$ k $$ denoting the task, are estimated using the HJB error since targets are unavailable in online learning. The design parameter $$ \lambda_{j} $$ controls the strength of the regularization, $$ \left\|\hat{W}_{cj}-\hat{W}_{cjp}^*\right\|_{ \bar{\Omega}_{kj}}^2=(\hat{W}_{cj} -\hat{W}^{*}_{cjp})^{\top}\bar{\Omega}_{kj}(\hat{W}_{cj} -\hat{W}^{*}_{cjp}) $$, $$ \hat{W}^*_{cjp} $$ is the optimized bounded weight vector of the previous task, and $$ \hat{W}_{cj} $$ is the weight vector of the current task that is to be optimized. The FIM for each task is calculated by defining the log-likelihood function as:

$$ \begin{equation} \ell(\hat{W}_{cjp}, {z}_{jp}) = \log p({e}_{jp}|\hat{W}_{cjp}, {z}_{jp}), \end{equation} $$

where $$ {e}_{jp} $$ represents the HJB residual error, as defined by (20) and (35) from the previous task. The term $$ p({e}_{jp}|\hat{W}_{cjp}, {z}_{jp}) $$ denotes the probability density function of the HJB error, given the input $$ z_{jp} $$ from the previous task and the weights $$ \hat{W}_{cjp} $$ at step $$ j $$ from the previous task. Calculate the Jacobian matrix as

$$ \begin{equation} J(\hat{W}_{cjp}, {z}_{jp}) = \frac{\partial \ell(\hat{W}_{cjp}, {z}_{jp})}{\partial \hat{W}_{cjp}}, \end{equation} $$

where $$ \frac{\partial \ell(\hat{W}_{cjp}, {z}_{jp})}{\partial \hat{W}_{cjp}} $$ denotes the partial derivative of the log-likelihood function with respect to the weights from the previous task. Therefore, the estimation of FIM is obtained as

$$ \begin{equation} \bar{\Omega}_{k j} = \frac{1}{t_1-t_0} \int_{t_0}^{t_1} J\left(\hat{W}_{c jp}, z_{jp}\right)J\left(\hat{W}_{c jp}, z_{jp}\right)^{\top} dt, \end{equation} $$

where $$ t_0 $$ denotes the task start time and $$ t_{1} $$ denotes the task end time. The elements of the FIM indicate how much information about the HJB error is conveyed by each weight in the network. For the first task, the FIM is zero. When estimating the FIM for the second task using data from the first task, the estimate remains bounded because the closed-loop system associated with task 1 is itself bounded, as demonstrated in Theorem 1.
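The time-averaged FIM estimate above can be sketched as a Riemann sum of outer products of log-likelihood gradients. The Gaussian residual model, the feature map, the trajectory, and the residual variance below are all illustrative assumptions; only the averaging structure follows the equation.

```python
import numpy as np

# Sketch of Omega = (1/(t1 - t0)) * integral of J J^T dt, with
# J = d/dW log p(e | W, z) for a Gaussian model of the HJB residual.
rng = np.random.default_rng(1)
W_prev = np.array([0.8, -0.3])       # optimized weights from the previous task
s2 = 0.1                             # assumed residual variance

def residual_and_grad(W, z):
    psi = np.array([z, z**2])        # hypothetical critic features
    e = psi @ W - np.sin(z)          # hypothetical residual signal
    return e, psi                    # de/dW = psi

dt = 1e-2
ts = np.arange(0.0, 10.0, dt)        # task time window [t0, t1] = [0, 10]
Omega = np.zeros((2, 2))
for t in ts:
    z = 1.5 * np.sin(t)              # sample along the previous-task trajectory
    e, de_dW = residual_and_grad(W_prev, z)
    J = -(e / s2) * de_dW            # gradient of the Gaussian log-likelihood
    Omega += np.outer(J, J) * dt     # accumulate J J^T dt
Omega /= (ts[-1] + dt) - ts[0]       # divide by (t1 - t0)

# Normalized per-weight importance Omega_wi / (Omega_wi + 1)
importance = np.diag(Omega) / (np.diag(Omega) + 1.0)
print(Omega.shape, importance)
```

By construction the estimate is symmetric positive semidefinite, and the normalized diagonal entries stay in $$ [0, 1) $$, which is what prevents the regularization gradient from blowing up.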

Subsequently, leveraging normalized gradient descent, an additional term in the critic weight update law is derived as follows

$$ \begin{equation} -\frac{ \partial}{\partial \hat{W}_{cj}}(L_{j}(\hat{W}_{cj}))= -\frac{\partial E_{j}(t)}{\partial \hat{W}_{c j}}-\lambda_{j}\bar{\Omega}_{kj} (\hat{W}_{cj}-\hat{W}^{*}_{cjp}). \end{equation} $$

For LL, the terms from (49) are combined with the terms from the previously defined update law that is given in Theorem 1. Next, the following theorem is stated.

Theorem 2. Consider the hypotheses stated in Theorem 1, and let Assumptions 1 to 4 hold, with the LRL critic NN tuning law for the $$ j $$th step of optimal backstepping given by

$$ \begin{equation} \dot{\hat{W}}_{cj} = \gamma_{cj} \hat{\psi}_j(t) \left( e_{j}(t) - \frac{1}{4} \hat{W}_{cj}^{\top} \nabla \sigma_{cj} \hat{\wedge}_{j} \nabla \sigma_{cj}^{\top} \hat{W}_{cj} - \hat{W}_{cj}^{\top} \nabla \sigma_{cj} \hat{\mathcal{F}}_{sj} \right) - \sigma_{1j} \hat{W}_{cj} - \alpha_{j} \lambda_{j} \bar{\Omega}_{kj} (\hat{W}_{cj} - \hat{W}^{*}_{cjp}), \end{equation} $$

where $$ j $$ denotes the backstepping step, $$ \lambda_{j} $$ is the design parameter, and $$ \alpha_{j} $$ are the NN learning rates; then, $$ \tilde{W}_{cj} $$ and all closed-loop signals, including those in Theorem 1, are UUB.

Proof: See Appendix.

Remark 10. From (49), as the significance of a weight increases, $$ \Omega_{w_{i}} $$ can grow unboundedly; however, $$ \frac{\Omega_{wi}}{ \Omega_{w_{i}}+1} $$ then approaches $$ 1 $$, and hence gradient explosion is avoided.
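The effect of the normalized FIM on the added regularization term can be seen in a few lines; the weight values and importances below are arbitrary examples, and only the term $$ -\alpha_{j} \lambda_{j} \bar{\Omega}_{kj} (\hat{W}_{cj} - \hat{W}^{*}_{cjp}) $$ is taken from the tuning law.

```python
import numpy as np

# Lifelong-learning penalty term -alpha * lambda * Omega_bar * (W - W_prev),
# with Omega_bar = diag{Omega_wi / (Omega_wi + 1)} bounded in [0, 1).
Omega_w = np.array([0.0, 2.0, 1e6])             # raw per-weight importances (example)
Omega_bar = np.diag(Omega_w / (Omega_w + 1.0))  # normalized: no gradient blow-up

W_prev = np.array([0.5, -0.2, 1.0])             # consolidated task-1 weights
W = np.array([0.1, 0.4, 0.0])                   # current task-2 weights
alpha, lam = 0.85, 2.0

penalty_grad = -alpha * lam * Omega_bar @ (W - W_prev)
print(penalty_grad)
```

The third weight, with a huge raw importance, is pulled back toward its task-1 value at nearly the full rate $$ \alpha\lambda $$, while the first weight, with zero importance, is left free to adapt to the new task.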

Remark 11. The first part of the NN weight update law in Theorem 2 is the same as in Theorem 1, whereas the second part includes the regularization terms resulting from LL. Notice that the tracking and weight estimation error bounds increase under the LRL-based control scheme because the bounding constant, $$ \bar{C}=\bar{C}_{0j}+\gamma_{penj} $$, includes additional terms obtained from the LL penalty function, where $$ \bar{C}_{0j} $$ is the bound when LL is absent and $$ \gamma_{penj} $$ arises from LL in the $$ j $$th step. The overall stability is unaffected by LL.

Remark 12. The proposed LL method is scalable to $$ n $$ tasks. A third task, $$ C $$, would aim to keep the network weights aligned with the learned weights from the preceding two tasks, implementable via single or dual penalties, given the quadratic nature of the penalties.

Remark 13. The efficacy of the LL method is most pronounced when Tasks 1 and 2 share informational overlap reflected in the weights, facilitating knowledge transfer to Task 2. In the absence of shared weights or knowledge between non-overlapping tasks, the visible enhancement in Task 2 performance may be negligible with online LL, although it still mitigates the catastrophic forgetting of Task 1, offering long-term benefits when reverting to Task 1.

3. UAV TRACKING SIMULATION OUTCOMES

This section presents the outcomes of the LRL-based optimal tracking control applied to an underactuated UAV system.

3.1. Unmanned aerial vehicle (UAV) problem formulation and control design

Consider the UAV model depicted in Figure 1, which is characterized by two reference frames: the inertial frame $$ I=\{x_I, y_I, z_I\} $$ fixed to the earth and the body-fixed frame $$ A=\{x_A, y_A, z_A\} $$. The UAV is steered using four forces $$ F_{j}, j=1, 2, 3, 4 $$, each produced by a respective rotor. To negate the yaw drift due to the reactive torque, the rotors are arranged in two pairs, with $$ (1, 3) $$ spinning clockwise and $$ (2, 4) $$ spinning counterclockwise.


Figure 1. Quadrotor UAV.

The quadrotor dynamics can be modeled by two sets of equations: (1) translational and (2) rotational. These dynamic equations interrelate via the rotation matrix, rendering them two coupled subsystems. A holistic control strategy involves both outer and inner loop controls, corresponding to the two subsystems. The outer loop executes positional control by managing the state variables $$x, y, $$ and $$z$$ while concurrently generating command inputs for the roll and pitch states of the inner loop through a correction block, dependent on a predefined yaw reference signal. The inner loop achieves attitude control by managing the roll, pitch, and yaw state variables.

Define $$\zeta_1(t) = [x(t), y(t), z(t)]^\top \in \mathbb{R}^3$$ as the UAV's positional state and $$\zeta_2(t) = [\dot{x}(t), \dot{y}(t), \dot{z}(t)]^\top \in \mathbb{R}^3$$ as its velocity state within the inertial frame $$I$$. A transformation relation exists: $$\zeta_2(t) = R(\eta_1) V$$. Consequently, the translational dynamic equation can be represented as

$$ \begin{aligned} & \dot{\zeta}_1(t)=\zeta_2(t) \\ & \dot{\zeta}_2(t)=-\left[\begin{array}{l} 0 \\ 0 \\ g \end{array}\right]+R\left(\eta_1\right)\left[\begin{array}{c} 0 \\ 0 \\ 1 / m \end{array}\right] u_c . \end{aligned} $$

Given the underactuated nature of UAV translational dynamics, an intermediate control vector

$$ V=R\left(\eta_1\right)\left[\begin{array}{c} 0 \\ 0 \\ 1 / m \end{array}\right] u_c $$

is introduced for optimal position control derivation, and the translational dynamic can thus be reformulated as

$$ \begin{aligned} & \dot{\zeta}_1(t)=\zeta_2(t) \\ & \dot{\zeta}_2(t)=-\left[\begin{array}{l} 0 \\ 0 \\ g \end{array}\right]+V . \end{aligned} $$

Remark 14. The relation between $$ V=\left[V_1, V_2, V_3\right]^\top \in \mathbb{R}^3 $$ and $$ u_c $$ is given by

$$ \begin{aligned} V_1 & =(\cos (\phi) \sin (\theta) \cos (\psi)+\sin (\phi) \sin (\psi)) \frac{u_c}{m} \\ V_2 & =(\cos (\phi) \sin (\theta) \sin (\psi)-\sin (\phi) \cos (\psi)) \frac{u_c}{m} \\ V_3 & =\cos (\phi) \cos (\theta) \frac{u_c}{m} \end{aligned} $$

Solving yields the control $$ u_c $$ as

$$ u_c=m\left(V_1^2+V_2^2+V_3^2\right)^{\frac{1}{2}} $$
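This relation can be checked numerically: the bracketed direction vector in Remark 14 is the third column of a rotation matrix and therefore has unit norm, so $$ m\|V\| $$ recovers $$ u_c $$ exactly. A short sketch with arbitrary example angles and thrust (not simulation values):

```python
import numpy as np

# Build V from (phi, theta, psi, u_c) per Remark 14 and recover u_c = m * ||V||.
m, u_c = 5.0, 60.0
phi, theta, psi = 0.2, -0.3, np.pi / 4

V1 = (np.cos(phi) * np.sin(theta) * np.cos(psi) + np.sin(phi) * np.sin(psi)) * u_c / m
V2 = (np.cos(phi) * np.sin(theta) * np.sin(psi) - np.sin(phi) * np.cos(psi)) * u_c / m
V3 = np.cos(phi) * np.cos(theta) * u_c / m

u_c_rec = m * np.sqrt(V1**2 + V2**2 + V3**2)
print(u_c_rec)  # recovers 60.0 up to round-off
```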

Using the reference trajectory vector $$\zeta_{\text{ref}}(t) = \left[x_{\text{ref}}(t), y_{\text{ref}}(t), z_{\text{ref}}(t)\right]^\top \in \mathbb{R}^3$$, we define the tracking error variables as $$e _{\zeta_1}(t)=\zeta_1(t)-\zeta_{\text{ref}}(t)$$ and $$e _{\zeta_2}(t)=\zeta_2(t)-\alpha_{\zeta}$$, where $$\alpha_{\zeta}$$ is the virtual control.

For the coordinate transformation of rotational dynamic, the transformation relationship between rotational velocity $$\Omega$$ and the Euler angles rate of change $$\eta_2(t)=\dot{\eta}_1(t) = [\dot{\phi}(t), \dot{\theta}(t), \dot{\psi}(t)]^\top \in \mathbb{R}^3$$ is represented as:

$$ \eta_2(t) = \dot{\eta}_1(t) = \Phi\left(\eta_1\right) \Omega $$

with

$$ \Phi\left(\eta_1\right)=\left[\begin{array}{ccc} 1 & s(\phi) t(\theta) & c(\phi) t(\theta) \\ 0 & c(\phi) & -s(\phi) \\ 0 & s(\phi) \sec(\theta) & c(\phi) \sec(\theta) \end{array}\right] . $$
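As an aside before continuing the derivation, the transformation matrix can be coded directly. Note that $$ \Phi(\eta_1) $$ contains $$ \tan(\theta) $$ and $$ \sec(\theta) $$ terms and is therefore singular at $$ \theta=\pm\pi/2 $$, so attitude trajectories are implicitly assumed to stay away from that pitch; the numerical values below are arbitrary.

```python
import numpy as np

# Euler-rate transformation Phi(eta_1): eta_2 = Phi(phi, theta) @ Omega,
# singular at theta = +/- pi/2 (sec(theta) blows up).
def Phi(phi, theta):
    s, c, t = np.sin, np.cos, np.tan
    return np.array([
        [1.0, s(phi) * t(theta),  c(phi) * t(theta)],
        [0.0, c(phi),            -s(phi)],
        [0.0, s(phi) / c(theta),  c(phi) / c(theta)],
    ])

P = Phi(0.3, 0.4)
Omega = np.array([0.1, -0.2, 0.05])     # example body angular velocity
eta2 = P @ Omega                        # Euler-angle rates
Omega_back = np.linalg.solve(P, eta2)   # invertible away from the singularity
print(np.round(Omega_back, 6))
```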

Applying time derivation to both sides yields the attitude dynamic as:

$$ \begin{aligned} \ddot{\eta}_1(t)= & -\Phi\left(\eta_1\right) I^{-1}\left(\Phi^{-1}\left(\eta_1\right) \eta_2(t) \times I \Phi^{-1}\left(\eta_1\right) \eta_2(t)\right) +\dot{\Phi}\left(\eta_1\right) \Phi^{-1}\left(\eta_1\right) \eta_2(t) +\Phi\left(\eta_1\right) I^{-1} \tau . \end{aligned} $$

The function $$f\left(\eta_1, \eta_2\right)$$ is represented as:

$$ f\left(\eta_1, \eta_2\right)=-\Phi\left(\eta_1\right) I^{-1}\left(\Phi^{-1}\left(\eta_1\right) \eta_2(t) \times I \Phi^{-1}\left(\eta_1\right) \eta_2(t)\right) +\dot{\Phi}\left(\eta_1\right) \Phi^{-1}\left(\eta_1\right) \eta_2(t) $$

Thus, the attitude dynamics can be rewritten in strict-feedback form as

$$ \begin{aligned} & \dot{\eta}_1(t)=\eta_2(t) \\ & \dot{\eta}_2(t)=f\left(\eta_1, \eta_2\right)+\Phi\left(\eta_1\right) I^{-1} \tau . \end{aligned} $$

Reference signals are denoted as $$\eta_{\mathrm{re}}(t) = \left[\phi_{\mathrm{re}}(t), \theta_{\mathrm{re}}(t), \psi_{\mathrm{re}}(t)\right]^\top$$. The yaw command element $$\psi_{\mathrm{re}}(t)$$ is predefined, and the roll and pitch command angles are derived as

$$ \begin{aligned} & \phi_{\mathrm{re}}(t)=\arcsin \left(m \frac{V_1 \sin \left(\psi_{\mathrm{re}}\right)-V_2 \cos \left(\psi_{\mathrm{re}}\right)}{u_c}\right) \\ & \theta_{\mathrm{re}}(t)=\arctan \left(\frac{V_1 \cos \left(\psi_{\mathrm{re}}\right)+V_2 \sin \left(\psi_{\mathrm{re}}\right)}{V_3}\right) . \end{aligned} $$
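These command formulas invert the relation of Remark 14. A quick numerical round trip (arbitrary test angles, not the simulation values) builds $$ V $$ from known roll and pitch and the yaw reference, then recovers them:

```python
import numpy as np

# Round-trip check: V from (phi, theta, psi_re) via Remark 14, then phi_re,
# theta_re from the command formulas should match the originals.
m, u_c = 5.0, 60.0
phi, theta, psi_re = 0.25, -0.35, np.pi / 4

V1 = (np.cos(phi) * np.sin(theta) * np.cos(psi_re) + np.sin(phi) * np.sin(psi_re)) * u_c / m
V2 = (np.cos(phi) * np.sin(theta) * np.sin(psi_re) - np.sin(phi) * np.cos(psi_re)) * u_c / m
V3 = np.cos(phi) * np.cos(theta) * u_c / m

phi_re = np.arcsin(m * (V1 * np.sin(psi_re) - V2 * np.cos(psi_re)) / u_c)
theta_re = np.arctan((V1 * np.cos(psi_re) + V2 * np.sin(psi_re)) / V3)
print(phi_re, theta_re)  # recovers (0.25, -0.35) up to round-off
```

Algebraically, $$ m(V_1\sin\psi_{\mathrm{re}}-V_2\cos\psi_{\mathrm{re}})/u_c=\sin\phi $$ and $$ (V_1\cos\psi_{\mathrm{re}}+V_2\sin\psi_{\mathrm{re}})/V_3=\tan\theta $$, which is why the recovery is exact for $$ |\phi|, |\theta|<\pi/2 $$.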

Tracking error variables are designated as $$e_{\eta 1}(t)=\eta_1(t)-\eta_{\mathrm{re}}(t)$$ and $$e_{\eta 2}(t)=\eta_2(t)-\alpha_\eta$$, where $$\alpha_\eta \in \mathbb{R}^3$$ is the virtual control.

Therefore, using the control law (34) and the weight update laws shown in Theorems 1 and 2 for translational and attitude dynamics will drive the UAV system to track the reference trajectory, as shown in the simulations.

3.2. Simulation parameters and results

The desired position trajectory for task 1 is $$ \zeta_{\mathrm{ref}}(t)=[2t-20, 5, 0]^{\top} $$, and for task 2, it is $$ \zeta_{\mathrm{ref}}(t)=[5 \sin (t), 5 \cos (t), t]^{\top} $$. The yaw command angle is predefined as $$ \psi_{\mathrm{re}}=\pi / 4 $$, from which the roll and pitch command signals $$ \phi_{\mathrm{re}} $$ and $$ \theta_{\mathrm{re}} $$ are produced. The design parameters are $$ \beta_1=[6, 0] $$, $$ \beta_2=[4.2, 0] $$, $$ m=5 $$, $$ \sigma_{1j}=0.75 $$, $$ \sigma_{2j}=0.82 $$, $$ \beta_{ci}=0.6, \beta_{ai}=0.6 $$, $$ \lambda_{w}=2 $$, and $$ \alpha_{w}=0.85 $$. The initial values are set as $$ z_1(0)=[1.7, 3.8, 0.41]^{\top}, z_2(0)=[2.31, 3, 0.4]^{\top} $$. The NN is based on a Random Vector Functional Link (RVFL) architecture: a single-layer network in which the input layer is connected directly to the output layer, bypassing traditional hidden layers. The interlayer connections are given by a randomly initialized weight matrix $$ V $$, whose ten columns effectively represent the neurons of the network. The transformation from input to output is given by $$ W^{\top}\sigma(V^{\top}X) $$, where $$ \sigma $$ denotes the activation function applied element-wise, $$ X $$ is the input, and $$ W $$ is tuned using the weight update laws in Theorems 1 and 2. Similarly, seven neurons are used for the identifier, with a sigmoid activation function and $$ \epsilon_{0}=0.65 $$. For attitude control, the initial values are set as $$ z_3(0)=[\pi / 4, \pi / 4, \pi / 4]^{\top}, z_4(0)=[\pi / 3, \pi / 3, \pi / 3]^{\top} $$.
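The RVFL mapping described above can be sketched as follows; the input dimension and test vector are illustrative assumptions, and only the output matrix $$ W $$ would be tuned by the update laws.

```python
import numpy as np

# RVFL forward pass W^T sigma(V^T X): V is random and fixed, W is tuned.
rng = np.random.default_rng(42)
n_in, n_neurons, n_out = 3, 10, 1            # ten neurons as in the text
V = rng.standard_normal((n_in, n_neurons))   # random input weights, never tuned
W = np.zeros((n_neurons, n_out))             # tuned by the update laws

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rvfl(X, W):
    return W.T @ sigmoid(V.T @ X)            # shape (n_out,)

X = np.array([1.7, 3.8, 0.41])               # e.g. the initial state z1(0)
print(rvfl(X, W))                            # zero output before any tuning
```

Because $$ V $$ is fixed, tuning reduces to the linear-in-the-parameters problem over $$ W $$ assumed throughout the stability analysis.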

We consider two task scenarios in which the reference trajectory changes with each task, as if the UAV moves along a different path or in a different environment. In the simulations, task 1 is repeated to demonstrate that when the UAV returns to task 1, the LL-based control mitigates catastrophic forgetting. The proposed method drives the UAV to track the reference trajectory accurately even as the tasks change. Figure 2 shows the position and attitude tracking performance: with the proposed LRL method (blue), the UAV position states accurately follow the reference trajectory (red), and the UAV attitudes follow the reference attitudes more closely than with the method from recent literature[9] (green). Figure 3 illustrates the tracking performance of both subsystems, indicating that the proposed method is superior when compared with the recent literature method, referred to as 'r-lit'[9]. Figure 4 illustrates the position and attitude tracking errors.


Figure 2. Actual and reference trajectories using the proposed LL-based method.


Figure 3. Tracking performance of position and attitude subsystems using LRL and recent literature (r-lit)[9] methods.


Figure 4. Position and attitude tracking errors using proposed LRL and recent literature (r-lit)[9] methods.

Both the system state tracking plots in Figure 2 and the positional error plots in Figure 3 demonstrate the superior performance of the proposed LL method, represented by blue lines, whereas the r-lit method[9], shown in green, exhibits higher error, underscoring the need for LL. Notably, the total average error shown in Figure 4 is lower when the proposed LL method is employed, indicating a substantial enhancement in tracking accuracy.

Figure 5 depicts the torque inputs and cumulative costs. The cost of the proposed method is minimal, and all closed-loop signals are bounded. The control effort demanded by the r-lit method is higher than that of the proposed LL-based method, and the cumulative cost associated with r-lit (green) exceeds that of the proposed LL method (blue) both during the tasks and as the tasks change.


Figure 5. Torque inputs and cumulative cost using proposed LRL and recent literature (r-lit)[9] methods.

4. CONCLUSION AND DISCUSSION

This paper proposed an innovative LL tracking control technique for uncertain nonlinear CT systems in strict feedback form. The method combined the augmented system, trajectory generator, and optimal backstepping approach to design both the feedforward and feedback terms of the tracking scheme. By combining actor-critic NNs with an identifier NN, the method effectively approximated the solution to the HJB equations with unknown nonlinear functions. The use of RL at each step of the backstepping process allowed for the development of virtual and actual optimal controllers that can handle the challenges posed by uncertain strict-feedback systems. The work highlighted the significance of considering catastrophic forgetting in online controller design and developed a new method to address this issue. Simulation results on a UAV tracking a desired trajectory show acceptable performance. The proposed approach can be extended by using deep NNs for better approximation. In addition, an integral RL (IRL)-based approach can relax the requirement of known drift dynamics, and dynamic surface control can be included to minimize the number of NNs used.

DECLARATIONS

Authors' contributions

Made substantial contributions to the conception and design of the study: Ganie I, Jagannathan S

Made contributions in writing, reviewing, editing, and methodology: Ganie I, Jagannathan S

Availability of data and materials

Not applicable.

Financial support and sponsorship

The project or effort undertaken was or is sponsored by the Office of Naval Research Grant N00014-21-1-2232 and Army Research Office Cooperative Agreements W911NF-21-2-0260 and W911NF-22-2-0185.

Conflicts of interest

Both authors declared that there are no conflicts of interest.

Ethical approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Copyright

© The Author(s) 2024.

APPENDICES

Proof of Theorem 1

Step 1: Consider the following Lyapunov function

$$ \begin{equation} \begin{split} L_{1}=J_{1}(z_{1})+\frac{1}{2}e_{i1}^{\top}e_{i1} + \frac{1}{2}tr \left\{\tilde {Z}_{1}^{\top}\tilde {Z}_{1} \right\}+\frac{1}{2}tr\{\tilde{W}_{c1}\tilde{W}_{c1}^{\top}\}+\frac{1}{2}tr\{\tilde{W}_{a1}\tilde{W}_{a1}^{\top}\}. \end{split} \end{equation} $$

The time derivative of $$ L_{1} $$ is

$$ \begin{equation} \begin{split} \dot{L}_{1}=\dot{J}_{1}(z_{1})+tr\{e_{i1}^{\top}\dot{e}_{i1}\}+tr \{\tilde {Z}_{1}^{\top} \dot{\tilde {Z}}_{1}\}+tr\{\tilde{W}_{c1}^{\top}\dot{\tilde{W}}_{c1}\}+tr\{\tilde{W}_{a1}^{\top}\dot{\tilde{W}}_{a1}\}. \end{split} \end{equation} $$

where $$ tr $$ denotes the trace operator. Let $$ \dot{L}_{1}=\dot{L}_{11}+\dot{L}_{12}+\dot{L}_{13}+\dot{L}_{14}+\dot{L}_{15}, $$ where $$ {L}_{11}=J_{1}(z_{1}), {L}_{12}=\frac{1}{2}e_{i1}^{\top}e_{i1}, {L}_{13}= \frac{1}{2}tr \left\{\tilde {Z}_{1}^{\top}\tilde {Z}_{1} \right\}, {L}_{14}=\frac{1}{2}tr\{\tilde{W}_{c1}\tilde{W}_{c1}^{\top}\}, {L}_{15}=\frac{1}{2}tr\{\tilde{W}_{a1}\tilde{W}_{a1}^{\top}\}. $$

Considering the first term of (52), we can write it as $$ \dot{J}_{1}(z_{1})=\nabla J_{1}\dot{z}_{1} $$.

Substituting (3), (9) in (52) gives

$$ \begin{equation} \dot{J}_{1}(z_{1})=(W_{c1}^{\top}\nabla\sigma_{c1}+\nabla\varepsilon_{c1})(\mathcal{F}_{s1}(z_{1})+\mathcal{G}_{s1}(z_{1})\alpha_{1}). \end{equation} $$

Substituting the value of $$ \alpha $$ from (8) in (53) leads to

$$ \begin{equation} \dot{J}_{1}(z_{1})=(W_{c1}^{\top}\nabla \sigma_{c1}+\nabla\varepsilon_{c1})\big(\mathcal{F}_{s1}(z_{1})-\frac{1}{2} \mathcal{G}_{s1}(z_{1})r^{-1}\hat{\mathcal{G}}^{\top}_{s1}(z_{1})(\nabla\sigma_{c1}^{\top}\hat{W}_{c1})\big). \end{equation} $$

Using $$ \hat{W}_{c1}=W_{c1}-\tilde{W}_{c1} $$ in (54) and simplifying, one obtains

$$ \begin{equation} \begin{split} \dot{J}_{1}(z_{1})= (W_{c1}^{\top}\nabla\sigma_{c1}+\nabla\varepsilon_{c1})\bigg(\mathcal{F}_{s1}(z_{1})-\frac{1}{2}\mathcal{G}_{s1}(z_{1})r^{-1}\mathcal{G}^{\top}_{s1}(z_{1})\nabla\sigma_{c1}^{\top}(W_{c1}-\tilde{W}_{c1})\bigg) \end{split} \end{equation} $$

Separating the terms in (55) with respect to the actual NN weights and the weight estimation error gives

$$ \begin{equation} \begin{split} &\dot{J}_{1}(z_{1})=W_{c1}^{\top}\nabla\sigma_{c1} \mathcal{F}_{s1}(z_{1})-\frac{1}{2}W_{c1}^{\top}\nabla \sigma_{c1} \mathcal{G}_{s1}(z_{1})r^{-1}\hat{\mathcal{G}}^{\top}_{s1}(z_{1})\nabla \sigma_{c1}^{\top}W_{c1}\\&+\frac{1}{2}W_{c1}^{\top}\nabla\sigma_{c1} \mathcal{G}_{s1}(z_{1})r^{-1}\hat{\mathcal{G}}^{\top}_{s1}(z_{1})\nabla\sigma^{\top}_{c1}\tilde{W}_{c1}+ \nabla\varepsilon_{c1} \mathcal{F}_{s1}(z_{1})\\&-\frac{\nabla \varepsilon_{c1}}{2}\mathcal{G}_{s1}(z_{1})r^{-1}\hat{\mathcal{G}}^{\top}_{s1}(z_{1})\nabla\sigma^{\top}_{c1}W_{c1}+\frac{\nabla \varepsilon_{c1}}{2}\mathcal{G}_{s1}(z_{1})r^{-1}\hat{\mathcal{G}}^{\top}_{s1}(z_{1})\nabla\sigma_{c1}^{\top}\tilde{W}_{c1}. \end{split} \end{equation} $$

Substituting $$ \hat{\mathcal{G}}_{s1}=\mathcal{G}_{s1}-\tilde{\mathcal{G}}_{s1} $$ in (56) leads to

$$ \begin{equation} \begin{split} &\dot{J}_{1}(z_{1})=W_{c1}^{\top}\nabla\sigma_{c1} \mathcal{F}_{s1}(z_{1})-\frac{1}{2}W_{c1}^{\top}\nabla \sigma_{c1} \mathcal{G}_{s1}(z_{1})r^{-1}(\mathcal{G}_{s1}-\tilde{\mathcal{G}}_{s1})^{\top}\nabla \sigma^{\top}_{c1}W_{c1}+\frac{1}{2}W_{c1}^{\top}\nabla\sigma_{c1} \mathcal{G}_{s1}(z_{1})r^{-1}\\&(\mathcal{G}_{s1}-\tilde{\mathcal{G}}_{s1})^{\top}\nabla\sigma^{\top}_{c1}\tilde{W}_{c1}+ \nabla\varepsilon_{c1} \mathcal{F}_{s1}(z_{1})-\frac{\nabla \varepsilon_{c1}}{2}\mathcal{G}_{s1}(z_{1})r^{-1}(\mathcal{G}_{s1}-\tilde{\mathcal{G}}_{s1})^{\top}\nabla\sigma^{\top}_{c1}W_{c1}\\&+\frac{\nabla \varepsilon_{c1}}{2}\mathcal{G}_{s1}(z_{1})r^{-1}(\mathcal{G}_{s1}-\tilde{\mathcal{G}}_{s1})^{\top}\nabla\sigma^{\top}_{c1}\tilde{W}_{c1}. \end{split} \end{equation} $$

One can further simplify (57), as follows

$$ \begin{equation} \begin{split} &\dot{J}_{1}(z_{1})=tr\{W_{c1}^{\top}\nabla\sigma_{c1} \mathcal{F}_{s1}(z_{1})-\frac{1}{2}W_{c1}^{\top}\nabla \sigma_{c1} \mathcal{G}_{s1}(z_{1})r^{-1}\mathcal{G}^{\top}_{s1}\nabla \sigma_{c1}^{\top}W_{c1}\\&+\frac{1}{2}W_{c1}^{\top}\nabla\sigma_{c1} \mathcal{G}_{s1}(z_{1})r^{-1}\mathcal{G}^{\top}_{s1}\nabla\sigma_{c1}^{\top}\tilde{W}_{c1}+ \nabla\varepsilon_{c1} \mathcal{F}_{s1}(z_{1})\\&-\frac{\nabla \varepsilon_{c1}}{2}\mathcal{G}_{s1}(z_{1})r^{-1}\mathcal{G}^{\top}_{s1}\nabla\sigma_{c1}^{\top}W_{c1}+\frac{\nabla \varepsilon_{c1}}{2}\mathcal{G}_{s1}(z_{1})r^{-1}\mathcal{G}^{\top}_{s1}\nabla\sigma_{c1}^{\top}\tilde{W}_{c1}\\&+\frac{1}{2}W_{c1}^{\top}\nabla \sigma_{c1} \mathcal{G}_{s1}(z_{1})r^{-1}\tilde{\mathcal{G}}^{\top}_{s1}\nabla \sigma_{c1}^{\top}W_{c1}-\frac{1}{2}W_{c1}^{\top}\nabla\sigma_{c1} \mathcal{G}_{s1}(z_{1})r^{-1}\tilde{\mathcal{G}}^{\top}_{s1}\nabla\sigma_{c1}^{\top}\tilde{W}_{c1}\\& +\frac{\nabla \varepsilon_{c1}}{2}\mathcal{G}_{s1}(z_{1})r^{-1}\tilde{\mathcal{G}}^{\top}_{s1}\nabla\sigma_{c1}^{\top}W_{c1}-\frac{\nabla \varepsilon_{c1}}{2}\mathcal{G}_{s1}(z_{1})r^{-1}\tilde{\mathcal{G}}^{\top}_{s1}\nabla\sigma_{c1}^{\top}\tilde{W}_{c1}\}. \end{split} \end{equation} $$

Using (11), we have $$ W_{c1}^{\top}\nabla\sigma_{c1} \mathcal{F}_{s1}(z_{1})-\frac{1}{2}W_{c1}^{\top}\nabla \sigma_{c1} \mathcal{G}_{s1}(z_{1})r^{-1}\mathcal{G}_{s1}^{\top}(z_{1})\nabla\sigma_{c1}^{\top}W_{c1}\leq-k\|z_{1}\|^{2} $$. Additionally, the effect of the NN approximation error in the system dynamics is assumed to be bounded above, i.e., $$ \nabla \varepsilon_{c1} \mathcal{F}_{s1}(z_{1})-\frac{1}{2}\nabla \varepsilon_{c1} \mathcal{G}_{s1}(z_{1})r^{-1}\mathcal{G}_{s1}^{\top}(z_{1})\nabla\sigma_{c1}^{\top}W_{c1}\leq-\bar{\epsilon}\|z_{1}\|^{2} $$. One can then write (58) as follows

$$ \begin{equation} \begin{split} \dot{J}_{1}(z_{1})\leq -(k+\bar{\epsilon})\|z_{1}\|^{2}+k_{1}\|\tilde{W}_{c1}\|+k_{2}\|\tilde{W}_{c1}\|+k_{3}\|\tilde{\mathcal{G}}_{s1}\|+k_{4}\|\tilde{\mathcal{G}}_{s1}\|+k_{5}\|\tilde{W}_{c1}\|\|\tilde{\mathcal{G}}_{s1}\|+k_{6}\|\tilde{W}_{c1}\|\|\tilde{\mathcal{G}}_{s1}\|, \end{split} \end{equation} $$

which can be further written as

$$ \begin{equation} \begin{split} &\dot{J}_{1}(z_{1})\leq -\bar{k}_{1}\|z_{1}\|^{2}+\bar{k}_{2}\|\tilde{W}_{c1}\|+\bar{k}_{3}\|\tilde{\mathcal{G}}_{s1}\|+\bar{k}_{4}\|\tilde{W}_{c1}\|\|\tilde{\mathcal{G}}_{s1}\|, \end{split} \end{equation} $$

where $$ \frac{1}{2}W_{c1}^{\top}\nabla\sigma_{c1} \mathcal{G}_{s1}(z_{1})r^{-1}\mathcal{G}_{s1}^{\top}\nabla\sigma_{c1}^{\top} \leq k_{1} $$, $$ \frac{\nabla \varepsilon_{c1}}{2}\mathcal{G}_{s1}(z_{1})r^{-1}\mathcal{G}_{s1}^{\top}\nabla\sigma_{c1}^{\top} \leq k_{2} $$, $$ \frac{1}{2}W_{c1}^{\top}\nabla \sigma_{c1} \mathcal{G}_{s1}(z_{1})r^{-1}W_{c1}\nabla \sigma_{c1} \leq k_{3} $$, $$ \frac{\nabla \varepsilon_{c1}}{2}\mathcal{G}_{s1}(z_{1})r^{-1}W_{c1}\nabla\sigma_{c1} \leq k_{4} $$, $$ \frac{1}{2}W_{c1}^{\top}\nabla\sigma_{c1} \mathcal{G}_{s1}(z_{1})r^{-1}\nabla\sigma_{c1} \leq k_{5} $$, $$ \frac{\nabla \varepsilon_{c1}}{2}\mathcal{G}_{s1}(z_{1})r^{-1}\nabla\sigma_{c1} \leq k_{6} $$, with $$ \bar{k}_{1}=k+\bar{\epsilon} $$, $$ \bar{k}_{2}=k_{1}+k_{2} $$, $$ \bar{k}_{3}=k_{3}+k_{4} $$, and $$ \bar{k}_{4}=k_{5}+k_{6} $$. Consider the second and third terms of (52)

$$ \begin{equation} \dot{L}_{12}+\dot{L}_{13}=tr\{e_{i1}^{\top}\dot{e}_{i1}\}+tr\{\tilde{Z}_{1}^{\top}\dot{\tilde{Z}}_{1}\} \end{equation} $$

On substituting the values of $$ \dot{e}_{i1} $$ and $$ \dot{\tilde{Z}}_{1} $$ from Theorem 1, one can write the RHS of (61) as follows

$$ \begin{equation} \begin{split} tr\bigg\{e_{i1}^{\top}(-Ke_{i1}+\tilde{Z}_{1}\boldsymbol{\sigma(\xi)}\hat{\bar{u}}+\varepsilon_I(z_{1}))+\tilde{Z}_{1}^{\top} \left (-\alpha_v \tilde{Z}_{1}- \boldsymbol{\sigma(\xi)}\hat{\bar{u}}e_{i1}^{\top}+\alpha_v Z_{1} \right )\bigg\}. \end{split} \end{equation} $$

This can be further simplified by using the cyclic property of the trace as

$$ \begin{equation} \begin{split} tr\{e_{i1}^{\top}(-Ke_{i1}+\varepsilon_I(z_{1}))+\tilde{Z}_{1}^{\top} \left (-\alpha_v \tilde{Z}_{1}+\alpha_v Z_{1} \right )\}. \end{split} \end{equation} $$

To simplify, we have $$ \varepsilon^{2}_{I}(z_{1}) \leq \|z_{1}\|^{2} $$. Therefore, on using Young's inequality in $$ e_{i1}z_{1} $$, we have

$$ \begin{equation} \dot{L}_{12}+\dot{L}_{13}\leq -K\|e_{i1}\|^{2}-\alpha_{v}\|\tilde{Z}_{1}\|^{2}+\|e_{i1}\|^{2}+\|z_{1}\|^{2}+\alpha_{v}\|\tilde{Z}^{\top}_{1}Z_{1}\|. \end{equation} $$
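The cancellation that takes (62) to (63) rests on the cyclic property of the trace, $$ tr\{ABC\}=tr\{BCA\}=tr\{CAB\} $$. A quick numerical check of this invariance, a sketch with arbitrary hypothetical dimensions for the identifier signals, is:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 6                        # hypothetical identifier dimensions
e = rng.standard_normal((n, 1))    # stands in for the identification error e_i1
Z = rng.standard_normal((n, m))    # stands in for the weight error Z~_1
s = rng.standard_normal((m, 1))    # stands in for sigma(xi) * u_hat

# the same cross term written in three cyclically rotated forms
t1 = np.trace(e.T @ Z @ s)         # tr{e^T Z s}
t2 = np.trace(Z @ s @ e.T)         # tr{Z s e^T}
t3 = np.trace(s @ e.T @ Z)         # tr{s e^T Z}
assert np.isclose(t1, t2) and np.isclose(t1, t3)
```

Because the cross term contributed by the error dynamics and the one contributed by the weight-error dynamics are cyclic rotations of each other with opposite signs, they cancel inside the trace, which is what removes the $$ \tilde{Z}_{1}\boldsymbol{\sigma(\xi)}\hat{\bar{u}} $$ terms in (63).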

Consider the fourth term of (52),

$$ \begin{equation} \begin{split} \dot{L}_{14}=tr\bigg\{\tilde{W}^{\top}_{c1}\bigg(\gamma_{c} \hat{\psi}_1(t) (-e_{1}(t)+\frac{1}{4}\hat{W}_{c1}^{\top}\nabla\sigma_{c1}\wedge\nabla\sigma_{c1}^{\top}\hat{W}_{c1}+\hat{W}^{\top}_{c1}\nabla\sigma_{c1} \mathcal{F}_{s1})+\sigma_{11}\hat{W}_{c1}\bigg)\bigg\} \end{split} \end{equation} $$

Therefore, one can further simplify (65) by using Young's inequality in cross-product terms as follows

$$ \begin{equation} \dot{L}_{14}\leq c_{1}\|\tilde{W}_{c1}\|^{2}+c_{2}\|\tilde{W}_{c1}\|+c_{3}\|\tilde{Z}_{1}\|^{2}+c_{4}\|\tilde{W}_{c1}\|-c_{5}\|\tilde{W}_{c1}\|^{2} \end{equation} $$

Consider the fifth term of (52), $$ \dot{L}_{15}=tr\{\tilde{W}_{a1}^{\top}\dot{\tilde{W}}_{a1}\} $$. On using the weight update law from Theorem 2, one can write

$$ \begin{equation} \begin{split} \dot{L}_{15}=tr\bigg\{\tilde{W}_{a1}^{\top}(\beta_{a}S_{3}(\beta_{1}z_{1}+\frac{1}{2}\hat{W}^{\top}_{a1}\sigma_{a1}\left( \bar{z}_{1}\right)-\frac{1}{2}r^{-1}\hat{\mathcal{G}}_{s1}^{\top}\nabla\sigma^{\top}_{c1}\hat{W}_{c1})+\sigma_{22}\hat{W}_{a1})\bigg\} \end{split} \end{equation} $$

Using $$ \tilde{W}_{c1}=W_{c1}-\hat{W}_{c1} $$, $$ \tilde{W}_{a1}=W_{a1}-\hat{W}_{a1} $$, and $$ \tilde{\mathcal{G}}_{s1}=\mathcal{G}_{s1}-\hat{\mathcal{G}}_{s1} $$, we obtain

$$ \begin{equation} \begin{split} \dot{L}_{15}=tr\bigg\{\tilde{W}_{a1}^{\top}\beta_{a}S_{3}\Big(\beta_{1}z_{1}+\frac{1}{2}({W}_{a1}-\tilde{W}_{a1})^{\top}\sigma_{a1}\left( \bar{z}_{1}\right)-\frac{1}{2}r^{-1}(\mathcal{G}_{s1}-\tilde{\mathcal{G}}_{s1})^{\top}\nabla\sigma^{\top}_{c1}({W}_{c1}-\tilde{W}_{c1})\Big)+\sigma_{22}\tilde{W}_{a1}^{\top}(W_{a1}-\tilde{W}_{a1})\bigg\} \end{split} \end{equation} $$

Considering the last term, we can write

$$ tr\{\tilde{W}_{a1}^{\top}(W_{a1}-\tilde{W}_{a1})\}\leq\|\tilde{W}_{a1}\|\|W_{a1}\|-\|\tilde{W}_{a1}\|^{2}. $$

Using Young's inequality in the cross product terms, we can write

$$ \begin{equation} \begin{split} \dot{L}_{15}\leq c_{5}\|\tilde{W}_{a1}\|^{2}+c_{6}\|z_{1}\|^{2}+c_{7}\|\tilde{Z}_{1}\|^{2}+c_{8}\|W_{c1}\|^{2}+c_{9}\|\tilde{Z}_{1}\|+c_{10}\|\tilde{W}_{c1}\|+c_{11}\|\tilde{W}_{a1}\|-c_{12}\|\tilde{W}_{a1}\|^{2} \end{split} \end{equation} $$
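The cross-product terms above are separated with Young's inequality, $$ ab\leq\frac{a^{2}}{2}+\frac{b^{2}}{2} $$, together with the Cauchy-Schwarz bound $$ tr\{A^{\top}B\}\leq\|A\|\|B\| $$ in the Frobenius norm. A minimal numerical sanity check, with matrix sizes chosen as arbitrary placeholders, is:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))    # stands in for a weight error, e.g. W~_a1
B = rng.standard_normal((5, 3))    # stands in for the ideal weight W_a1

a = np.linalg.norm(A)              # Frobenius norm (numpy default for matrices)
b = np.linalg.norm(B)

# Cauchy-Schwarz: the trace cross term is dominated by the product of norms
assert np.trace(A.T @ B) <= a * b
# Young's inequality: the product of norms splits into squared terms
assert a * b <= 0.5 * a**2 + 0.5 * b**2
```

This is exactly the mechanism that converts each indefinite cross term into a sum of sign-definite squared-norm terms with constants $$ c_{i} $$.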

Combining (60), (64), (66) and (69) and simplifying, we have

$$ \begin{equation} \begin{split} \dot{L}_{1}\leq -\bar{c}_{1}\|z_{1}\|^{2}+\bar{k}_{2}\|\tilde{W}_{c1}\|+\bar{k}_{3}\|\tilde{Z}_{1}\|-\bar{c}_{2}\|\tilde{W}_{c1}\|^{2}-\bar{c}_{3}\|\tilde{Z}_{1}\|^{2}-\bar{c}_{5}\|e_{i1}\|^{2}+c_{0}\|\tilde{W}_{a1}\|-\bar{c}_{4}\|\tilde{W}_{a1}\|^{2}, \end{split} \end{equation} $$

where $$ \bar{c}_{1}=\bar{k}_{1}-c_{13} $$ with $$ \bar{k}_{1}>c_{13} $$, $$ \bar{c}_{2}=c_{4}-\bar{k}_{4} $$ with $$ c_{4}>\bar{k}_{4} $$, $$ \bar{c}_{3}=\alpha_{v}-\bar{k}_{4} $$ with $$ \alpha_{v}>\bar{k}_{4} $$, $$ \bar{c}_{4}=c_{12}-c_{5} $$ with $$ c_{12}>c_{5} $$, and $$ \bar{c}_{5}=K-1 $$ with $$ K>1 $$. We can simplify this further as follows

$$ \begin{equation} \begin{split} \dot{L}_{1} \leq -\bar{c}_{1}\|z_{1}\|^{2}-\bar{c}_{2}(\|\tilde{W}_{c1}\|-\frac{\bar{k}_{2}}{\bar{c}_{2}})^{2}-\bar{c}_{3}(\|\tilde{Z}_{1}\|-\frac{\bar{k}_{3}}{\bar{c}_{3}})^{2}-\bar{c}_{4}(\|\tilde{W}_{a1}\|-\frac{c_{0}}{\bar{c}_{4}})^{2}-\bar{c}_{5}\|{e}_{i1}\|^{2}+\bar{C}_{01}, \end{split} \end{equation} $$

where $$ \bar{C}_{01}=\frac{\bar{k}_{2}^{2}}{\bar{c}_{2}}+\frac{\bar{k}_{3}^{2}}{\bar{c}_{3}}+\frac{c_{0}^{2}}{\bar{c}_{4}} $$. Therefore, $$ \dot{L}_{1}<0 $$ whenever

$$ \begin{equation} \begin{cases} \begin{split} & \|z_{1}\|> {\sqrt{\frac{\bar{C}_{01}}{\bar{c}_{1}}}} \text{ or } \|\tilde{W}_{c1}\|>{\sqrt{\frac{\bar{C}_{01}}{\bar{c}_{2}}}+\frac{\bar{k}_{2}}{\bar{c}_{2}}} \text{ or}\\& \|\tilde{W}_{a1}\|> {\sqrt{\frac{\bar{C}_{01}}{\bar{c}_{4}}}+\frac{c_{0}}{\bar{c}_{4}}} \text{ or } \|\tilde{Z}_{1}\|>{\sqrt{\frac{\bar{C}_{01}}{\bar{c}_{3}}}+\frac{\bar{k}_{3}}{\bar{c}_{3}}} \text{ or } \|{e}_{i1}\|>{\sqrt{\frac{\bar{C}_{01}}{\bar{c}_{5}}}}. \end{split} \end{cases} \end{equation} $$

This demonstrates that the overall closed-loop system is bounded. Since $$ \tilde{\mathcal{G}}_{s1} $$ is a function of the weight estimation error of the NN identifier $$ \tilde{Z}_{1} $$, and $$ \tilde{Z}_{1} $$ is bounded from (72), $$ \hat{\mathcal{G}}_{s1} $$ is bounded as a consequence. From (72), $$ \tilde{W}_{c1} $$ and $$ \tilde{W}_{a1} $$, the identification error, and the system state are also bounded. As a result, the control input error becomes bounded.
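The residual-set radii in (72) are explicit functions of the design constants. The sketch below evaluates them for purely illustrative values of $$ \bar{C}_{01} $$, the $$ \bar{c}_{i} $$ gains, and the offsets, since the paper reports no numerical constants:

```python
import math

# hypothetical positive constants (illustrative only); in the proof these
# come from the Young's-inequality bounds k-bar and the Lyapunov gains c-bar
C01 = 0.8
gain = {"z1": 2.0, "Wc1": 1.5, "Z1": 1.0, "Wa1": 1.2, "ei1": 2.5}
off = {"z1": 0.0, "Wc1": 0.4, "Z1": 0.2, "Wa1": 0.3, "ei1": 0.0}  # k-bar/c-bar

# each signal is uniformly ultimately bounded: the Lyapunov derivative is
# negative outside a ball of radius sqrt(C01/gain) + offset, so trajectories
# converge to and remain inside that ball
radii = {name: math.sqrt(C01 / gain[name]) + off[name] for name in gain}
for name, r in radii.items():
    print(f"{name}: ultimate bound {r:.3f}")
```

Shrinking the residual constant $$ \bar{C}_{01} $$ or enlarging the gains tightens every radius, which matches the qualitative reading of (72).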

Step 2:

This is the final step. Consider the Lyapunov function as follows

$$ \begin{equation} \begin{split} L_{2}=L_{1}+J_{2}(z_{2})+\frac{1}{2}e_{i2}^{\top}e_{i2} + \frac{1}{2}tr \left\{\tilde {Z}_{2}^{\top}\tilde {Z}_{2} \right\}+\frac{1}{2}tr\{\tilde{W}_{c2}\tilde{W}_{c2}^{\top}\}+\frac{1}{2}tr\{\tilde{W}_{a2}\tilde{W}_{a2}^{\top}\}. \end{split} \end{equation} $$

The time derivative of $$ L_{2} $$ is

$$ \begin{equation} \begin{split} \dot{L}_{2}=\dot{L}_{1}+\dot{J}_{2}(z_{2})+tr\{e_{i2}^{\top}\dot{e}_{i2}\}+tr \{\tilde {Z}_{2}^{\top} \dot{\tilde {Z}}_{2} \}+tr\{\tilde{W}_{c2}^{\top}\dot{\tilde{W}}_{c2}\}+tr\{\tilde{W}_{a2}^{\top}\dot{\tilde{W}}_{a2}\}. \end{split} \end{equation} $$

Let $$ \dot{L}_{2}=\dot{L}_{1}+\dot{L}_{21}+\dot{L}_{22}+\dot{L}_{23}+\dot{L}_{24}+\dot{L}_{25}, $$ where $$ {L}_{21}=J_{2}(z_{2}), {L}_{22}=\frac{1}{2}e_{i2}^{\top}e_{i2}, {L}_{23}= \frac{1}{2}tr \left\{\tilde {Z}_{2}^{\top}\tilde {Z}_{2} \right\}, {L}_{24}=\frac{1}{2}tr\{\tilde{W}_{c2}\tilde{W}_{c2}^{\top}\}, {L}_{25}=\frac{1}{2}tr\{\tilde{W}_{a2}\tilde{W}_{a2}^{\top}\}. $$

Considering the second term of (74) and substituting (3) into $$ \dot{J}_{2}(z_{2}) $$ gives

$$ \begin{equation} \dot{J}_{2}(z_{2})=(W_{c2}^{\top}\nabla\sigma_{c2}+\nabla\varepsilon_{c2})(\mathcal{F}_{s2}(z_{2})+\mathcal{G}_{s2}(z_{2})u). \end{equation} $$

Substituting the estimated optimal control input then leads to

$$ \begin{equation} \dot{J}_{2}(z_{2})=(W_{c2}^{\top}\nabla \sigma_{c2}+\nabla\varepsilon_{c2})\big(\mathcal{F}_{s2}(z_{2})-\frac{1}{2} \mathcal{G}_{s2}(z_{2})r_{2}^{-1}\hat{\mathcal{G}}^{\top}_{s2}(z_{2})\nabla\sigma_{c2}^{\top}\hat{W}_{c2}\big). \end{equation} $$

Using $$ \hat{W}_{c2}=W_{c2}-\tilde{W}_{c2} $$ in (76), and simplifying, one will get

$$ \begin{equation} \begin{split} &\dot{J}_{2}(z_{2})= (W_{c2}^{\top}\nabla\sigma_{c2}+\nabla\varepsilon_{c2})(\mathcal{F}_{s2}(z_{2})-\frac{1}{2}\mathcal{G}_{s2}(z_{2})r_{2}^{-1}\hat{\mathcal{G}}^{\top}_{s2}(z_{2})\nabla\sigma_{c2}^{\top}(W_{c2}-\tilde{W}_{c2})). \end{split} \end{equation} $$

Separating the terms in (77) into those involving the actual NN weights and those involving the weight estimation error gives

$$ \begin{equation} \begin{split} &\dot{J}_{2}(z_{2})=W_{c2}^{\top}\nabla\sigma_{c2} \mathcal{F}_{s2}(z_{2})-\frac{1}{2}W_{c2}^{\top}\nabla \sigma_{c2} \mathcal{G}_{s2}(z_{2})r_{2}^{-1}\hat{\mathcal{G}}^{\top}_{s2}(z_{2})\nabla \sigma_{c2}^{\top}W_{c2}\\&+\frac{1}{2}W_{c2}^{\top}\nabla\sigma_{c2} \mathcal{G}_{s2}(z_{2})r_{2}^{-1}\hat{\mathcal{G}}^{\top}_{s2}(z_{2})\nabla\sigma^{\top}_{c2}\tilde{W}_{c2}+ \nabla\varepsilon_{c2} \mathcal{F}_{s2}(z_{2})\\&-\frac{\nabla \varepsilon_{c2}}{2}\mathcal{G}_{s2}(z_{2})r_{2}^{-1}\hat{\mathcal{G}}^{\top}_{s2}(z_{2})\nabla\sigma^{\top}_{c2}W_{c2}+\frac{\nabla \varepsilon_{c2}}{2}\mathcal{G}_{s2}(z_{2})r_{2}^{-1}\hat{\mathcal{G}}^{\top}_{s2}(z_{2})\nabla\sigma_{c2}^{\top}\tilde{W}_{c2}. \end{split} \end{equation} $$

Substituting $$ \hat{\mathcal{G}}_{s2}=\mathcal{G}_{s2}-\tilde{\mathcal{G}}_{s2} $$ in (78) leads to

$$ \begin{equation} \begin{split} &\dot{J}_{2}(z_{2})=W_{c2}^{\top}\nabla\sigma_{c2} \mathcal{F}_{s2}(z_{2})-\frac{1}{2}W_{c2}^{\top}\nabla \sigma_{c2} \mathcal{G}_{s2}(z_{2})r_{2}^{-1}(\mathcal{G}_{s2}-\tilde{\mathcal{G}}_{s2})^{\top}\nabla \sigma^{\top}_{c2}W_{c2}\\&+\frac{1}{2}W_{c2}^{\top}\nabla\sigma_{c2} \mathcal{G}_{s2}(z_{2})r_{2}^{-1}(\mathcal{G}_{s2}-\tilde{\mathcal{G}}_{s2})^{\top}\nabla\sigma^{\top}_{c2}\tilde{W}_{c2}+ \nabla\varepsilon_{c2}\mathcal{F}_{s2}(z_{2})-\frac{\nabla \varepsilon_{c2}}{2}\mathcal{G}_{s2}(z_{2})r_{2}^{-1}(\mathcal{G}_{s2}-\tilde{\mathcal{G}}_{s2})^{\top}\nabla\sigma^{\top}_{c2}W_{c2}\\&+\frac{\nabla \varepsilon_{c2}}{2}\mathcal{G}_{s2}(z_{2})r_{2}^{-1}(\mathcal{G}_{s2}-\tilde{\mathcal{G}}_{s2})^{\top}\nabla\sigma^{\top}_{c2}\tilde{W}_{c2}. \end{split} \end{equation} $$

One can further simplify (79) as follows

$$ \begin{equation} \begin{split} &\dot{J}_{2}(z_{2})=tr\{W_{c2}^{\top}\nabla\sigma_{c2} \mathcal{F}_{s2}(z_{2})-\frac{1}{2}W_{c2}^{\top}\nabla \sigma_{c2} \mathcal{G}_{s2}(z_{2})r_{2}^{-1}\mathcal{G}^{\top}_{s2}\nabla \sigma_{c2}^{\top}W_{c2}\\&+\frac{1}{2}W_{c2}^{\top}\nabla\sigma_{c2} \mathcal{G}_{s2}(z_{2})r_{2}^{-1}\mathcal{G}^{\top}_{s2}\nabla\sigma_{c2}^{\top}\tilde{W}_{c2}+ \nabla\varepsilon_{c2} \mathcal{F}_{s2}(z_{2})\\&-\frac{\nabla \varepsilon_{c2}}{2}\mathcal{G}_{s2}(z_{2})r_{2}^{-1}\mathcal{G}^{\top}_{s2}\nabla\sigma_{c2}^{\top}W_{c2}+\frac{\nabla \varepsilon_{c2}}{2}\mathcal{G}_{s2}(z_{2})r_{2}^{-1}\mathcal{G}^{\top}_{s2}\nabla\sigma_{c2}^{\top}\tilde{W}_{c2}\\&+\frac{1}{2}W_{c2}^{\top}\nabla \sigma_{c2} \mathcal{G}_{s2}(z_{2})r_{2}^{-1}\tilde{\mathcal{G}}^{\top}_{s2}\nabla \sigma_{c2}^{\top}W_{c2}-\frac{1}{2}W_{c2}^{\top}\nabla\sigma_{c2} \mathcal{G}_{s2}(z_{2})r_{2}^{-1}\tilde{\mathcal{G}}^{\top}_{s2}\nabla\sigma_{c2}^{\top}\tilde{W}_{c2}\\& +\frac{\nabla \varepsilon_{c2}}{2}\mathcal{G}_{s2}(z_{2})r_{2}^{-1}\tilde{\mathcal{G}}^{\top}_{s2}\nabla\sigma_{c2}^{\top}W_{c2}-\frac{\nabla \varepsilon_{c2}}{2}\mathcal{G}_{s2}(z_{2})r_{2}^{-1}\tilde{\mathcal{G}}^{\top}_{s2}\nabla\sigma_{c2}^{\top}\tilde{W}_{c2}\}. \end{split} \end{equation} $$

Using (11), we have $$ W_{c2}^{\top}\nabla\sigma_{c2} \mathcal{F}_{s2}(z_{2})-\frac{1}{2}W_{c2}^{\top}\nabla \sigma_{c2} \mathcal{G}_{s2}(z_{2})r_{2}^{-1}\mathcal{G}_{s2}^{\top}(z_{2})\nabla\sigma_{c2}^{\top}W_{c2}\leq-k\|z_{2}\|^{2} $$. Additionally, the effect of the NN approximation error in the system dynamics is assumed to be bounded above, i.e., $$ \nabla \varepsilon_{c2} \mathcal{F}_{s2}(z_{2})-\frac{1}{2}\nabla \varepsilon_{c2} \mathcal{G}_{s2}(z_{2})r_{2}^{-1}\mathcal{G}_{s2}^{\top}(z_{2})\nabla\sigma_{c2}^{\top}W_{c2}\leq-\bar{\epsilon}\|z_{2}\|^{2} $$. One can then write (80) as follows

$$ \begin{equation} \begin{split} \dot{J}_{2}(z_{2})\leq-(k+\bar{\epsilon})\|z_{2}\|^{2}+k_{1}\|\tilde{W}_{c2}\|+k_{2}\|\tilde{W}_{c2}\|+k_{3}\|\tilde{\mathcal{G}}_{s2}\|+k_{4}\|\tilde{\mathcal{G}}_{s2}\|+k_{5}\|\tilde{W}_{c2}\|\|\tilde{\mathcal{G}}_{s2}\|+k_{6}\|\tilde{W}_{c2}\|\|\tilde{\mathcal{G}}_{s2}\|, \end{split} \end{equation} $$

which can be further written as

$$ \begin{equation} \begin{split} &\dot{J}_{2}(z_{2})\leq -\bar{k}_{1}\|z_{2}\|^{2}+\bar{k}_{2}\|\tilde{W}_{c2}\|+\bar{k}_{3}\|\tilde{\mathcal{G}}_{s2}\|+\bar{k}_{4}\|\tilde{W}_{c2}\|\|\tilde{\mathcal{G}}_{s2}\|, \end{split} \end{equation} $$

where $$ \frac{1}{2}W_{c2}^{\top}\nabla\sigma_{c2} \mathcal{G}_{s2}(z_{2})r_{2}^{-1}\mathcal{G}_{s2}^{\top}\nabla\sigma_{c2}^{\top} \leq k_{1} $$, $$ \frac{\nabla \varepsilon_{c2}}{2}\mathcal{G}_{s2}(z_{2})r_{2}^{-1}\mathcal{G}_{s2}^{\top}\nabla\sigma_{c2}^{\top} \leq k_{2} $$, $$ \frac{1}{2}W_{c2}^{\top}\nabla \sigma_{c2} \mathcal{G}_{s2}(z_{2})r_{2}^{-1}W_{c2}\nabla \sigma_{c2} \leq k_{3} $$, $$ \frac{\nabla \varepsilon_{c2}}{2}\mathcal{G}_{s2}(z_{2})r_{2}^{-1}W_{c2}\nabla\sigma_{c2} \leq k_{4} $$, $$ \frac{1}{2}W_{c2}^{\top}\nabla\sigma_{c2} \mathcal{G}_{s2}(z_{2})r_{2}^{-1}\nabla\sigma_{c2} \leq k_{5} $$, $$ \frac{\nabla \varepsilon_{c2}}{2}\mathcal{G}_{s2}(z_{2})r_{2}^{-1}\nabla\sigma_{c2} \leq k_{6} $$, with $$ \bar{k}_{1}=k+\bar{\epsilon} $$, $$ \bar{k}_{2}=k_{1}+k_{2} $$, $$ \bar{k}_{3}=k_{3}+k_{4} $$, and $$ \bar{k}_{4}=k_{5}+k_{6} $$.

Consider the second and third terms of (74)

$$ \begin{equation} \dot{L}_{22}+\dot{L}_{23}=tr\{e_{i2}^{\top}\dot{e}_{i2}\}+tr\{\tilde{Z}_{2}^{\top}\dot{\tilde{Z}}_{2}\} \end{equation} $$

On substituting the value of $$ \dot{e}_{i2} $$ and $$ \dot{\tilde{Z}}_{2} $$ from Theorem 1, one can write (83) as follows

$$ \begin{equation} \begin{split} \dot{L}_{22}+\dot{L}_{23}=tr\bigg\{e_{i2}^{\top}(-Ke_{i2}+\tilde{Z}_{2}\boldsymbol{\sigma(\xi)}\hat{\bar{u}}+\varepsilon_I(z_{2}))+\tilde{Z}_{2}^{\top} \left (-\alpha_v \tilde{Z}_{2}- \boldsymbol{\sigma(\xi)}\hat{\bar{u}}e_{i2}^{\top}+\alpha_v Z_{2} \right )\bigg\}. \end{split} \end{equation} $$

This can be further simplified by using the cyclic property of the trace as

$$ \begin{equation} \begin{split} \dot{L}_{22}+\dot{L}_{23}=tr\{e_{i2}^{\top}(-Ke_{i2}+\varepsilon_I(z_{2}))+\tilde{Z}_{2}^{\top} \left (-\alpha_v \tilde{Z}_{2}+\alpha_v Z_{2} \right )\}. \end{split} \end{equation} $$

To simplify, we have $$ \varepsilon^{2}_{I}(z_{2}) \leq \|z_{2}\|^{2} $$. Therefore, on using Young's inequality in $$ e_{i2}z_{2} $$, we have

$$ \begin{equation} \dot{L}_{22}+\dot{L}_{23}\leq-K\|e_{i2}\|^{2}-\alpha_{v}\|\tilde{Z}_{2}\|^{2}+\|e_{i2}\|^{2}+\|z_{2}\|^{2}+\alpha_{v}\|\tilde{Z}^{\top}_{2}Z_{2}\| \end{equation} $$

Consider the fourth term of (74),

$$ \begin{equation} \begin{split} \dot{L}_{24}=tr\bigg\{\tilde{W}^{\top}_{c2}\bigg(\gamma_{c} \hat{\psi}_2(t) (-e_{2}(t)+\frac{1}{4}\hat{W}_{c2}^{\top}\nabla\sigma_{c2}\wedge\nabla\sigma_{c2}^{\top}\hat{W}_{c2}+\hat{W}^{\top}_{c2}\nabla\sigma_{c2} \mathcal{F}_{s2})+\sigma_{21}\hat{W}_{c2}\bigg)\bigg\} \end{split} \end{equation} $$

Therefore, one can further simplify (87) by using Young's inequality in cross-product terms as follows

$$ \begin{equation} \dot{L}_{24}\leq c_{1}\|\tilde{W}_{c2}\|^{2}+c_{2}\|\tilde{W}_{c2}\|+c_{3}\|\tilde{Z}_{2}\|^{2}+c_{4}\|\tilde{W}_{c2}\|-c_{5}\|\tilde{W}_{c2}\|^{2} \end{equation} $$

Consider the fifth term of (74), $$ \dot{L}_{25}=tr\{\tilde{W}_{a2}^{\top}\dot{\tilde{W}}_{a2}\} $$. On using the weight update law from Theorem 2, one can write

$$ \begin{equation} \begin{split} \dot{L}_{25}=tr\bigg\{\tilde{W}_{a2}^{\top}(\beta_{a}S_{3}(\beta_{1}z_{2}+\frac{1}{2}\hat{W}^{\top}_{a2}\sigma_{a2}\left( \bar{z}_{2}\right)-\frac{1}{2}r_{2}^{-1}\hat{\mathcal{G}}_{s2}^{\top}\nabla\sigma^{\top}_{c2}\hat{W}_{c2})+\sigma_{22}\hat{W}_{a2})\bigg\} \end{split} \end{equation} $$

Using $$ \tilde{W}_{c2}=W_{c2}-\hat{W}_{c2} $$, $$ \tilde{W}_{a2}=W_{a2}-\hat{W}_{a2} $$, and $$ \tilde{\mathcal{G}}_{s2}=\mathcal{G}_{s2}-\hat{\mathcal{G}}_{s2} $$, we obtain

$$ \begin{equation} \begin{split} \dot{L}_{25}=tr\bigg\{\tilde{W}_{a2}^{\top}\beta_{a}S_{3}\Big(\beta_{1}z_{2}+\frac{1}{2}({W}_{a2}-\tilde{W}_{a2})^{\top}\sigma_{a2}\left( \bar{z}_{2}\right)-\frac{1}{2}r_{2}^{-1}(\mathcal{G}_{s2}-\tilde{\mathcal{G}}_{s2})^{\top}\nabla\sigma^{\top}_{c2}({W}_{c2}-\tilde{W}_{c2})\Big)+\sigma_{22}\tilde{W}_{a2}^{\top}(W_{a2}-\tilde{W}_{a2})\bigg\} \end{split} \end{equation} $$

Considering the last term, we can write

$$ tr\{\tilde{W}_{a2}^{\top}(W_{a2}-\tilde{W}_{a2})\}\leq\|\tilde{W}_{a2}\|\|W_{a2}\|-\|\tilde{W}_{a2}\|^{2}. $$

Using Young's inequality in the cross product terms, we can write

$$ \begin{equation} \begin{split} \dot{L}_{25}\leq c_{5}\|\tilde{W}_{a2}\|^{2}+c_{6}\|z_{2}\|^{2}+c_{7}\|\tilde{Z}_{2}\|^{2}+c_{8}\|W_{c2}\|^{2}+c_{9}\|\tilde{Z}_{2}\|+c_{10}\|\tilde{W}_{c2}\|+c_{21}\|\tilde{W}_{a2}\|-c_{22}\|\tilde{W}_{a2}\|^{2} \end{split} \end{equation} $$

Combining (70), (82), (86), (88), and (91) and simplifying, we have

$$ \begin{equation} \begin{split} &\dot{L}_{2} \leq -c_{z1}(\|z_{1}\|-\frac{k_{11}}{c_{z1}})^{2}-c_{z2}(\|z_{2}\|-\frac{k_{21}}{c_{z2}})^{2}\\&-C_{c1}(\|\tilde{W}_{c1}\|-\frac{k_{12}}{C_{c1}})^{2}-C_{c2}(\|\tilde{W}_{c2}\|-\frac{k_{22}}{C_{c2}})^{2}\\&-C_{a1}(\|\tilde{W}_{a1}\|-\frac{k_{13}}{C_{a1}})^{2}-C_{a2}(\|\tilde{W}_{a2}\|-\frac{k_{23}}{C_{a2}})^{2}\\&-C_{f1}(\|\tilde{W}_{f1}\|-\frac{k_{f1}}{C_{f1}})^{2}-C_{f2}(\|\tilde{W}_{f2}\|-\frac{k_{f2}}{C_{f2}})^{2}+\bar{C}_{02}, \end{split} \end{equation} $$

where $$ \bar{C}_{02}=\frac{k_{11}^{2}}{c_{z1}}+\frac{k_{21}^{2}}{c_{z2}}+\frac{k_{12}^{2}}{C_{c1}}+\frac{k_{22}^{2}}{C_{c2}}+\frac{k_{13}^{2}}{C_{a1}}+\frac{k_{23}^{2}}{C_{a2}}+\frac{k_{f1}^{2}}{C_{f1}}+\frac{k_{f2}^{2}}{C_{f2}} $$. Therefore, from (92), the bounds for $$ \|z_{k}\|, \|\tilde{W}_{ck}\| $$, $$ \|\tilde{W}_{ak}\|, \|\tilde{W}_{fk}\| $$ can be obtained as

$$ \begin{equation} \begin{cases} \begin{split} & \|z_{1}\|> {\sqrt{\frac{\bar{C}_{02}}{c_{z1}}}+\frac{k_{11}}{{c}_{z1}}} \text{ or } \|z_{2}\|> {\sqrt{\frac{\bar{C}_{02}}{c_{z2}}}+\frac{k_{21}}{{c}_{z2}}} \text{ or}\\& \|\tilde{W}_{c1}\|>{\sqrt{\frac{\bar{C}_{02}}{C_{c1}}}+\frac{k_{12}}{{C}_{c1}}} \text{ or } \|\tilde{W}_{c2}\|> {\sqrt{\frac{\bar{C}_{02}}{C_{c2}}}+\frac{k_{22}}{C_{c2}}} \text{ or}\\& \|\tilde{W}_{a1}\|> {\sqrt{\frac{\bar{C}_{02}}{C_{a1}}}+\frac{k_{13}}{{C}_{a1}}} \text{ or } \|\tilde{W}_{a2}\|>{\sqrt{\frac{\bar{C}_{02}}{C_{a2}}}+\frac{k_{23}}{{C}_{a2}}} \\& \text{ or } \|\tilde{W}_{f1}\|>{\sqrt{\frac{\bar{C}_{02}}{C_{f1}}}+\frac{k_{f1}}{{C}_{f1}}} \text{ or } \|\tilde{W}_{f2}\|>{\sqrt{\frac{\bar{C}_{02}}{C_{f2}}}+\frac{k_{f2}}{{C}_{f2}}}. \end{split} \end{cases} \end{equation} $$

This demonstrates that the overall closed-loop system is bounded. Since $$ \tilde{\mathcal{G}}_{sj} $$ is a function of the weight estimation error of the NN identifier $$ \tilde{Z}_{j} $$, and $$ \tilde{Z}_{j} $$ is bounded from (93), $$ \hat{\mathcal{G}}_{sj} $$ is bounded as a consequence. From (93), $$ \tilde{W}_{aj} $$ and $$ \tilde{W}_{cj} $$, the identification error, and the system state are bounded. As a result, the control input error $$ u_{ej}=-\beta_{j}z_{j}-\frac{1}{2}\hat{W}_{aj}\sigma_{aj}\left( {z}_{j}\right)+\frac{1}{2}r_{j}^{-1}\mathcal{G}_{sj}\frac{d {J}_{j}}{d z_{j}} $$ is bounded with bound $$ e_{b} $$, which can be obtained by using the bounds from (93) and Assumption 4. Therefore, the actual control inputs remain bounded close to their optimal values.

Proof of Theorem 2

The convergence of weights for Task 1 remains in alignment with Theorem 1. For Task 2, an additional term emerges in the Lyapunov proof (92) due to the regularization penalty, denoted as

$$ \begin{equation} \gamma_{\text{pen}_{j}}= \tilde{W}_{j}(\hat{W}_{j}-{W_{A_{j}}}^{*}), \end{equation} $$

where $$ {W_{A_{j}}}^{*} $$ is indicative of the optimized weights for the primary task, as verified to be bounded in Theorem 1.

Substituting $$ \hat{W}_{j}=W^{*}_{j}-\tilde{W}_{j} $$, we can rearrange equation (94) to

$$ \begin{equation} \tilde{W}_{j}W^{*}_{j}-\tilde{W}_{j}^{2}-\tilde{W}_{j}{W_{A_{j}}}^{*} \leq \|\tilde{W}_{j}\|\|W^{*}_{j}\|-\|\tilde{W}_{j}\|^{2}+\|\tilde{W}_{j}W_{A_{j}}^{*}\|. \end{equation} $$

Employing Young's inequality to the first and third terms of (95), we get

$$ \begin{align*} \|\tilde{W}_{j}\|\|W^{*}_{j}\| &\leq \frac{\|\tilde{W}_{j}\|^{2}}{2}+\frac{\|W^{*}_{j}\|^{2}}{2}, \\ \|\tilde{W}_{j}W_{A_{j}}^{*}\| &\leq \frac{\|\tilde{W}_{j}\|^{2}}{2}+\frac{\|W_{A_{j}}^{*}\|^{2}}{2}. \end{align*} $$

Substituting $$ \|\tilde{W}_{j}\|\|W^{*}_{j}\|, \|\tilde{W}_{j}W_{A_{j}}^{*}\| $$ back into (95) provides

$$ \begin{equation} \tilde{W}_{j}W^{*}_{j}-\tilde{W}_{j}^{2}-\tilde{W}_{j}{W_{A_{j}}}^{*} \leq \frac{\|W_{A_{j}}^{*}\|^{2}}{2}+\frac{\|W^{*}_{j}\|^{2}}{2}. \end{equation} $$

Thus, the integration of this term into the proof solely modifies the error bound to $$ \gamma_{\text{pen}_{j}} \leq \frac{\|W^{*}_{j}\|^2}{2} + \frac{\|W_{A_{j}}^{*}\|^2}{2} $$, without impacting the overarching stability of the system.
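Inequality (96) holds for arbitrary weight matrices, which can be verified numerically in trace form with Frobenius norms; the dimensions below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
shape = (8, 4)                         # arbitrary weight-matrix dimensions
W_star = rng.standard_normal(shape)    # W*_j: ideal weights for the current task
W_anchor = rng.standard_normal(shape)  # W*_Aj: consolidated Task-1 weights
W_err = rng.standard_normal(shape)     # W~_j: weight estimation error

# left-hand side of (96): tr{W~^T W*} - ||W~||^2 - tr{W~^T W*_A}
lhs = (np.trace(W_err.T @ W_star)
       - np.linalg.norm(W_err) ** 2
       - np.trace(W_err.T @ W_anchor))
# right-hand side of (96): ||W*||^2/2 + ||W*_A||^2/2
rhs = 0.5 * np.linalg.norm(W_star) ** 2 + 0.5 * np.linalg.norm(W_anchor) ** 2
assert lhs <= rhs   # guaranteed by Young's inequality for any draw
```

The check passes for every random draw because the two applications of Young's inequality absorb the cross terms into the $$ -\|\tilde{W}_{j}\|^{2} $$ term.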

The aggregate contribution to the error bounds is calculated by adding $$ \gamma_{\text{pen}_{j}} $$ to (92), resulting in a comprehensive error bound $$ C_{j}= \bar{C}_{0j}+\gamma_{\text{pen}_{j}} $$. It is therefore clear from the derived equations that the error bounds increase as the weights deviate from their optimal points; however, the overall stability of the system remains intact.
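For intuition, the consolidation penalty analyzed above corresponds to an elastic-weight-consolidation-style term in the gradient update of each NN. The sketch below is a generic discrete-time illustration under assumed names (elementwise importance weights `fisher`, penalty gain `lam`), not the paper's exact continuous-time update law:

```python
import numpy as np

def consolidated_step(W_hat, grad_task, W_anchor, fisher, lr=0.05, lam=1.0):
    """One gradient step with a Fisher-weighted consolidation penalty.

    The penalty (lam/2) * sum_i fisher_i * (W_i - W_anchor_i)^2 pulls
    important weights back toward the Task-1 solution W_anchor, which is
    what limits catastrophic forgetting while Task 2 is being learned.
    """
    grad_pen = lam * fisher * (W_hat - W_anchor)   # gradient of the penalty
    return W_hat - lr * (grad_task + grad_pen)

# toy run: with a zero task gradient the iterate contracts toward W_anchor,
# mirroring how the penalty only shifts the error bound in (96)
rng = np.random.default_rng(3)
W_anchor = rng.standard_normal((6, 2))
fisher = np.abs(rng.standard_normal((6, 2)))       # nonnegative importances
W = W_anchor + rng.standard_normal((6, 2))         # perturbed starting weights
d0 = np.linalg.norm(W - W_anchor)
for _ in range(200):
    W = consolidated_step(W, np.zeros_like(W), W_anchor, fisher)
d1 = np.linalg.norm(W - W_anchor)
assert d1 < d0     # distance to the consolidated weights shrinks
```

In the paper's setting the task gradient comes from the HJB residual error, and the importance weights are obtained online from the Fisher Information Matrix; this sketch only illustrates the pull-back effect of the penalty term.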

REFERENCES

1. Abu-Khalaf M, Lewis FL. Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica 2005;41:779-91.

2. McLain TW, Beard RW. Successive Galerkin approximations to the nonlinear optimal control of an underwater robotic vehicle. In: Proceedings of the 1998 IEEE International Conference on Robotics and Automation (Cat. No. 98CH36146). Leuven, Belgium; May 1998.

3. Vrabie D, Pastravanu O, Abu-Khalaf M, Lewis FL. Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica 2009;45:477-84.

4. Modares H, Lewis FL. Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning. Automatica 2014;50:1780-92.

5. Gao W, Jiang ZP. Learning-Based adaptive optimal tracking control of strict-feedback nonlinear systems. IEEE Trans Neural Netw Learn Syst 2018;29:2614-24.

6. Zargarzadeh H, Dierks T, Jagannathan S. Optimal control of nonlinear continuous-time systems in strict-feedback form. IEEE Trans Neural Netw Learn Syst 2015;26:2535-49.

7. Huang Z, Bai W, Li T, et al. Adaptive reinforcement learning optimal tracking control for strict-feedback nonlinear systems with prescribed performance. Inf Sci 2023;621:407-23.

8. Wen G, Chen CLP, Ge SS. Simplified optimized backstepping control for a class of nonlinear strict-feedback systems with unknown dynamic functions. IEEE Trans Cybern 2021;51:4567-80.

9. Wen G, Hao W, Feng W, Gao K. Optimized backstepping tracking control using reinforcement learning for quadrotor unmanned aerial vehicle system. IEEE Trans Syst Man Cybern Syst 2022;52:5004-15.

10. Bryson AE. Applied optimal control: optimization, estimation and control. New York: Routledge; 1975. p. 496.

11. Wen G, Ge SS, Tu F. Optimized backstepping for tracking control of strict-feedback systems. IEEE Trans Neural Netw Learn Syst 2018;29:3850-62.

12. Wu J, Wang W, Ding S, Xie X, Yi Y. Adaptive neural optimized control for uncertain strict-feedback systems with unknown control directions and pre-set performance. Commun Nonlinear Sci Numer Simul 2023;126:107506.

13. Kirkpatrick J, Pascanu R, Rabinowitz NC, et al. Overcoming catastrophic forgetting in neural networks. Proc Natl Acad Sci USA 2017;114:3521-6.

14. Ganie I, Jagannathan S. Adaptive control of robotic manipulators using deep neural networks. IFAC-PapersOnLine 2022;55:148-53.

15. Kutalev A, Lapina A. Stabilizing elastic weight consolidation method in practical ML tasks and using weight importances for neural network pruning. ArXiv 2021. Available from: https://arxiv.org/abs/2109.10021 [Last accessed on 2 Feb 2024].

16. Liu Y, Zhu Q, Wen G. Adaptive tracking control for perturbed strict-feedback nonlinear systems based on optimized backstepping technique. IEEE Trans Neural Netw Learn Syst 2022;33:853-65.

17. Moghadam R, Jagannathan S. Optimal adaptive control of uncertain nonlinear continuous-time systems with input and state delays. IEEE Trans Neural Netw Learn Syst 2023;34:3195-204.

18. Mishra A, Ghosh S. Simultaneous identification and optimal tracking control of unknown continuous-time systems with actuator constraints. Int J Control 2022;95:2005-23.

About This Article

© The Author(s) 2024. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing, adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Complex Engineering Systems
ISSN 2770-6249 (Online)

Portico

All published articles are preserved here permanently:

https://www.portico.org/publishers/oae/
