(Set-valued) dynamical systems (part 1)

In the past few months of my study, I've had the chance to work on a research project related to dynamical systems. With a standard CS curriculum, most students would probably never have heard of this kind of system or have had to work with one during their studies. At first I also didn't care much about its direct relevance to my study, since my main goal at that point was simply to spend more time studying math. However, I can say at this point that it has turned out to be pretty fruitful and a great learning experience, thanks to the great help from my supervisor and the progress that we've made.

In Feb 2026, starting from what seemed like a toy study project, I had the chance to present our work to some professors and researchers from Imperial College London, which made a genuine impression; I thought it was a big deal and way above my expectations. Now it's even more exciting as the project begins its next phase, where we can work more closely with people from Imperial, collaborate, and take their input to build the very first useful version of a numerical tool for set-valued dynamical systems. We have so much to do and so many features to add, and for the first time I can imagine researchers one day using our tool for their actual research (of course, only if we can keep the momentum). So, yeah, this is great, and I think it's a good time for me to look back on what we have done and review this learning experience. By learning experience, I mean I will present the ideas and work as I understand them and try to make them as intuitive and understandable as possible, though it will probably read more easily for people who already have some background. This is nowhere near a replacement for the formal reading you could do elsewhere, and I could also make mistakes, so be aware; all the references I used while writing are listed at the end of the post.

In the first part, I will present some of the foundational theory and concepts needed when working with dynamical systems. In my particular case, they are not complete on their own, but rather what I have built up over time to get something done in our research project. In the second part, we will explore the numerical tool that we built, what features it has, and how it was built. Since it's an active project, we might add more later and learn something new along the way, so there could be follow-up posts in the future.

When hearing about a new theory without a good idea of what it could mean, we usually start by asking: "OK, what are its applications then?". It turns out that dynamical systems is a field with many real-world applications, including robotics, population dynamics, weather prediction, etc. A dynamical system is simply a rule that tells you how something changes over time: given the current state, the next state is completely determined by the fixed rule. You start the system with some initial state, apply the rule to get a new state, and then from this new state apply the rule again to get the next state, and so on. The mathematical object is just a function [math]f[/math] that you iterate:

x_0 \to x_1 = f(x_0) \to x_2 = f(x_1) \to x_3 = f(x_2) \to ...

Since there is a fixed rule transforming one state into another, let's say this rule is the function [math]f(x)=2x[/math] and our initial state [math]x_0[/math] is 1. Then the next states will be 2, 4, 8, and so on. You get 2 by applying the map once to the initial state, and 4 by applying it again to the result. So the state of the system after [math]n[/math] steps is the composition of our fixed rule with itself [math]n[/math] times, applied to the initial state:

x_n = f^n(x_0) = f \circ f \circ \cdots \circ f(x_0)\;\; \big(\text{$n$ times}\big)
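In code, iterating a map is just a loop. A minimal sketch (the helper name `iterate` is my own) using the doubling rule [math]f(x)=2x[/math]:

```python
def iterate(f, x0, n):
    """Return the orbit [x0, f(x0), f^2(x0), ..., f^n(x0)]."""
    orbit = [x0]
    for _ in range(n):
        orbit.append(f(orbit[-1]))
    return orbit

print(iterate(lambda x: 2 * x, 1, 3))  # [1, 2, 4, 8]
```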

For every dynamical system, we're ultimately interested in the long-run behavior. Where does it end up? Will it settle down somewhere, will it oscillate, will it wander chaotically forever?

For example, a thermostat is a dynamical system whose state is the temperature. The rule is simple: if the current temperature is above the target, the cooling kicks in; if it is below the target, the heating kicks in. The long-term behavior of this system is trivial: the room temperature will converge to the target temperature and stay there. This is called a fixed point attractor.

Another example is a pendulum. Its state is a little richer since it includes both an angle and an angular velocity. With friction, the pendulum eventually stops at some equilibrium, so again a fixed point. Without friction, it oscillates forever: instead of stopping at some point and staying there, it keeps moving from one state to another and eventually repeats the process. This is called a periodic orbit.

A third type of behavior worth an example is chaos, which is fundamentally different from fixed points and periodic orbits. A system exhibiting chaotic behavior is sensitive to initial conditions, meaning that a tiny change in the initial condition can lead to a very different outcome in the long-term behavior. For example, a weather state is made up of many different factors such as temperature, humidity, pressure, and wind speed, and just a tiny change in one of them today can lead to a very different outcome, say, in 2 weeks.

Formulating fixed points and periodic orbits

A fixed point is a point that maps to itself:

f(x^*) = x^*

A periodic orbit consists of at least 2 different states; the system returns to a state after some number of steps and keeps repeating this cycle. For example, a period-[math]p[/math] point [math]x^*[/math] satisfies:

f^p(x^*) = x^*

Start with the state [math]x^*[/math] and the system returns to [math]x^*[/math] after exactly [math]p[/math] steps.
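To make this concrete, here is a small check (my own toy example, not from our project) using the logistic map [math]f(x)=rx(1-x)[/math] at [math]r=3.2[/math], whose period-2 points are known in closed form:

```python
import math

r = 3.2
f = lambda x: r * x * (1 - x)

# Closed-form period-2 point of the logistic map (valid for r > 3)
x_star = (r + 1 + math.sqrt((r + 1) * (r - 3))) / (2 * r)

print(math.isclose(f(f(x_star)), x_star))  # True: returns after exactly 2 steps
print(math.isclose(f(x_star), x_star))     # False: it is not a fixed point
```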

Local stability

However, finding fixed points and periodic orbits and stopping there is pretty boring. A more interesting thing to do is to study the stability of the system near those fixed and periodic points: what happens if we slightly perturb the solution we just found? Will a neighborhood of the solution remain stable, or will it become unstable? It's often more realistic to consider the stability of a neighborhood rather than a single point, since in practice we're always slightly off due to measurement error, numerical precision, noise, etc.

So a fixed point is stable if trajectories starting sufficiently close to [math]x^*[/math] remain close to [math]x^*[/math] forever. A fixed point is asymptotically stable if trajectories starting sufficiently close to [math]x^*[/math] actually converge to [math]x^*[/math] as the number of iterations goes to [math]\infty[/math]. A fixed point is unstable if there exist trajectories starting arbitrarily close to [math]x^*[/math] that eventually move away.

Two classical real-world examples that demonstrate local stability are a ball in a bowl and a ball on a hill. In the first, you perturb the ball slightly and it returns, so it's locally stable. The second is locally unstable, since a slight perturbation makes the ball roll away forever, never returning to where it began.

This raises the question of what mathematical tool we can use to study this kind of stability near the fixed point [math]x^*[/math]. At least in low-dimensional systems, we can study a linear approximation on the neighborhood to figure out the local stability of the system. Near the fixed point [math]x^*[/math], the map [math]f[/math] looks approximately linear; it's just the idea of a tangent line from calculus: any smooth (differentiable) function, if you zoom in close enough near a point, looks like a straight line. In higher dimensions, where you have more than one variable, this linear approximation becomes what's called the Jacobian matrix [math]Df(x^*)[/math] (basically the matrix of first-order partial derivatives).

One question that might arise is whether linearization works for a non-linear system, i.e., whether we can just ignore the higher-order terms when evaluating local stability. It turns out it does, provided the perturbation around the fixed point is small enough. We know the value of [math]f(x^*)[/math]; what about the value at some nearby point, [math]f(x^* + \delta)[/math]? This is the perfect time to introduce the Taylor series expansion:

f(x^*+\delta)\approx f(x^*)+f'(x^*)\delta+\frac{1}{2}f''(x^*)\delta^2+\frac{1}{6}f'''(x^*)\delta^3+\cdots

Since we have [math]f(x^*) = x^*[/math], this then becomes:

f(x^*+\delta)\approx x^*+f'(x^*)\delta+\frac{1}{2}f''(x^*)\delta^2+\frac{1}{6}f'''(x^*)\delta^3+\cdots

The key insight for turning this into the linearization formula is that when [math]\delta[/math] is really small, for example [math]\delta=0.01[/math], then [math]\delta^2 = 0.0001[/math] and [math]\delta^3 = 0.000001[/math]. Each higher-order term is smaller than the previous one by a factor of [math]\delta[/math], so for small enough [math]\delta[/math], the first-order term [math]f'(x^*)\delta[/math] completely dominates all the others:

f(x^*+\delta)\approx x^*+f'(x^*)\delta+O(\delta^2)

where [math]O(\delta^2)[/math] means terms that shrink as fast as [math]\delta^2[/math] or faster. For small [math]\delta[/math] they are negligible compared to the linear term. So near [math]x^*[/math], the map [math]f[/math] is well approximated by the linear function:

f(x^* + \delta) \approx x^* + f'(x^*)\delta
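We can verify this numerically. For [math]f(x)=x^2[/math], which has a fixed point at [math]x^*=1[/math] with [math]f'(1)=2[/math], the linearization error is exactly [math]\delta^2[/math] (a toy example of mine, not from the post's research):

```python
f = lambda x: x * x     # fixed point x* = 1, since f(1) = 1; slope f'(1) = 2

x_star, slope = 1.0, 2.0
for delta in (0.1, 0.01, 0.001):
    linear = x_star + slope * delta          # linear approximation
    error = abs(f(x_star + delta) - linear)  # gap to the true value
    print(delta, error)                      # error shrinks like delta**2
```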

This works in any finite dimension as well: in [math]\mathbb{R}^d[/math] the same argument applies, but now [math]\delta[/math] is a vector in [math]\mathbb{R}^d[/math], and the linear approximation involves a matrix rather than a scalar. The Taylor series expansion in multiple dimensions then gives us:

f(x^* + \delta) \approx x^* + Df(x^*)\delta + O(\|\delta\|^2)

where [math]Df(x^*)[/math] is the Jacobian matrix, the matrix of first-order partial derivatives evaluated at [math]x^*[/math]. The term [math]Df(x^*)\delta[/math] is a matrix-vector product, again a linear operation on [math]\delta[/math], and the higher-order terms are negligible when [math]\delta[/math] is small (this is important; otherwise the linearization is not an appropriate approximation).

We're now close to evaluating the local stability of the system. The final step is to analyze the eigenvalues of the linearization at a particular point. If we have a function [math]f: \mathbb{R}^d \to \mathbb{R}^d[/math], where [math]d[/math] is the number of dimensions, then we always have [math]d[/math] eigenvalues (counted with multiplicity), and their magnitudes tell us whether the system will contract, expand, or rotate.

When we look at a specific eigenvalue [math]\lambda[/math] associated with an eigenvector (direction), it tells us how the distance from the fixed point changes in that direction after one step.

  • If [math]|\lambda| < 1 [/math], the small neighborhood around the fixed point approaches the fixed point; it's locally stable in this direction (contraction)
  • If [math]|\lambda| = 1 [/math], the distance stays the same; the small neighborhood around the fixed point is neither expanding nor contracting (neutral)
  • If [math]|\lambda| > 1 [/math], adjacent trajectories are stretched away from the fixed point; it's locally unstable in this direction (expansion)

To conclude about the local stability of the system, we need to combine the information from all the eigenvalues we have found. For that we compute the spectral radius [math]\rho[/math], the maximum absolute value among all eigenvalues:

\rho(J)=\max\{|\lambda_1|,|\lambda_2|,\ldots,|\lambda_d|\}

The fixed point is asymptotically stable if [math]\rho(J) < 1 [/math] and unstable if [math]\rho(J) > 1 [/math]; when [math]\rho(J) = 1[/math], the linearization alone is inconclusive.
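Computing the spectral radius is a one-liner with NumPy (assuming NumPy is available; the matrix below is an arbitrary example of mine):

```python
import numpy as np

def spectral_radius(J):
    """Maximum absolute value among the eigenvalues of J."""
    return max(abs(np.linalg.eigvals(J)))

J = np.array([[0.5, 0.1],
              [0.0, 0.8]])   # upper triangular: eigenvalues are 0.5 and 0.8

print(spectral_radius(J))    # ~0.8, so rho < 1: locally stable
```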

Local stability of the Hénon map

The Hénon map is a discrete-time dynamical system and a classical example of a system that can exhibit chaotic behavior. The Hénon map [math]f: \mathbb{R}^2 \to \mathbb{R}^2[/math] takes a point [math](x_n, y_n)[/math] and maps it to the next point [math](x_{n+1}, y_{n+1})[/math] by the rule:

\begin{cases}
x_{n+1}=1-a x_n^2 + y_n,\\
y_{n+1}=b x_n
\end{cases}

Here the map depends on 2 parameters, [math]a \text{ and } b[/math], and it exhibits chaotic behavior when [math]a = 1.4 \text{ and } b = 0.3 [/math]. Recall that local stability is determined by the spectral radius, and the map is locally unstable at a fixed point if at least one eigenvalue has magnitude larger than 1.

Also recall that the stability we are talking about here is stability near a fixed point, and to find the fixed points we simply need to solve the equation [math]f(x^*) - x^* = 0[/math]:

\begin{cases}
1 - ax^2 + y - x = 0\\
bx - y = 0
\end{cases}

So we have [math]y=bx[/math]; substituting this into the first equation gives a quadratic equation whose solutions are:

x=\frac{-(1-b)\pm\sqrt{(1-b)^2+4a}}{2a}

It has 2 real solutions at [math]a=1.4 \text{ and } b = 0.3[/math], and from the x-coordinates we can then find the y-coordinates. The 2 fixed points are:

x_1 \approx 0.631, y_1 \approx 0.189\\
x_2 \approx -1.131, y_2 \approx -0.339

We found the fixed points of the system; the next step is to find the linear approximation in a nearby neighborhood, which means computing the Jacobian matrix of the system:

J = Df(x,y)=
\begin{pmatrix}
\frac{\partial f_1}{\partial x} & \frac{\partial f_1}{\partial y}\\
\frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y}
\end{pmatrix}
=
\begin{pmatrix}
-2ax & 1\\
b & 0
\end{pmatrix}
=
\begin{pmatrix}
-2(1.4)x & 1 \\
0.3 & 0
\end{pmatrix}
=
\begin{pmatrix}
-2.8x & 1 \\
0.3 & 0
\end{pmatrix}

Then from the characteristic equation [math]\det(J - \lambda I) = 0[/math], evaluated at the first fixed point, we get 2 distinct real eigenvalues, [math]\lambda_1 \approx 0.156[/math] and [math]\lambda_2 \approx -1.923[/math]. Since [math]|\lambda_2| > 1[/math], the fixed point is indeed unstable, which is consistent with the chaotic behavior of the map.
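The whole calculation can be checked numerically; a sketch with NumPy, following the derivation above:

```python
import numpy as np

a, b = 1.4, 0.3

# Fixed point: solve a*x^2 + (1 - b)*x - 1 = 0, then y = b*x
x1 = (-(1 - b) + np.sqrt((1 - b) ** 2 + 4 * a)) / (2 * a)   # ~0.631

# Jacobian of the Henon map evaluated at that fixed point
J = np.array([[-2 * a * x1, 1.0],
              [b, 0.0]])

eigenvalues = np.linalg.eigvals(J)
print(sorted(abs(eigenvalues)))   # magnitudes ~[0.156, 1.924]: unstable
```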

Stochastic dynamical systems

So far we've been working with deterministic dynamical systems: when the current state and the mapping function are known, we can exactly determine the next state; there is no uncertainty. In many real-world applications it's OK to wipe out insignificant details such as noise and focus on the deterministic part. For example, when we model the trajectory of a cannonball, the spin of an individual air molecule is irrelevant. So instead of modeling everything, we choose to model only the right things.

However, there are systems where discarding details is no longer harmless: the noise is large enough, or the system is sensitive enough, that treating it deterministically could give a wrong prediction of the long-term behavior.

This is a primary motivation for developing stochastic models, which sound more realistic: rather than treating perturbations as if they don't exist, we explicitly model randomness as part of the system. The classical way to handle this is to add a random term at each step:

x_{k+1}=f(x_k)+\sigma\xi_k

where [math]\{\xi_k\}_{k \in \mathbb{N}}[/math] are independent random variables drawn from some distribution, say uniform on a ball, or Gaussian, and [math]\sigma > 0[/math] controls the noise amplitude. In this model, it's up to us to choose which probability distribution to use for a particular system. However, the Gaussian distribution is often used, with a justification coming from the central limit theorem: there can be many independent noise contributions, such as thermal vibrations, atmospheric fluctuations, or rounding errors accumulated over many floating-point operations. Since they are independent, each of them might have its own distinct probability distribution; we don't know for sure, and probably cannot measure it. But their sum, regardless of how the individual distributions look, converges to a Gaussian as the number of contributions grows.
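A sketch of this recipe, with my own hypothetical choices: a contracting map [math]f(x)=x/2[/math] and noise drawn uniformly from [math][-1,1][/math]:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: 0.5 * x      # deterministic part (hypothetical example)
sigma = 0.05               # noise amplitude

x = 1.0
trajectory = [x]
for _ in range(20):
    xi = rng.uniform(-1.0, 1.0)   # bounded noise from [-1, 1]
    x = f(x) + sigma * xi         # x_{k+1} = f(x_k) + sigma * xi_k
    trajectory.append(x)

# Each step deviates from the deterministic image f(x_k) by at most sigma.
```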

This brings us to the question of how to study the long-term statistical behavior of the system when noise is part of it. Here, a single trajectory is no longer the right object to study, since at each step there are many different paths the system can take next, creating different trajectories. The question now becomes: as [math]k \to \infty[/math], what is the probability of finding the system in each region of state space, regardless of which specific noise sequence occurred? The long-term behavior of the system is no longer a set, but rather a probability distribution: the stationary measure [math]\mu^*[/math].

The stationary measure [math]\mu^*[/math] is simply the distribution that does not change when you apply the transfer operator. After one step, it looks the same as before; after one million steps, still the same:

\mathcal{P}\mu^* = \mu^*

Here [math]\mathcal{P}[/math] is the transfer operator. Basically, what it does is: at time [math]k[/math] we have a probability distribution [math]\rho_k[/math] over the state space [math]X[/math], and the transfer operator combines the deterministic part of the system with the noise to produce [math]\rho_{k+1}[/math] over the same state space [math]X[/math].

However, Gaussian noise is unbounded by default. Consider additive Gaussian noise [math]\xi_k \sim \mathrm{N}(0, \sigma^2 I)[/math] living in [math]\mathbb{R}^d[/math]: the Gaussian distribution assigns a positive density to every point in [math]\mathbb{R}^d[/math], meaning that every noise realization has a non-zero probability of occurring. The noise is diffusive, meaning that over time probability mass spreads throughout the entire state space rather than remaining confined to a compact region. In practice, this is unrealistic in many cases.

Bounded noise and set-valued dynamical systems

We've already seen several issues when working with stochastic dynamical systems; two major ones are that we need to choose a probability distribution for the noise, and that the noise could be unbounded.

Bounded noise is a structurally different assumption: here we have [math]\|\xi_k\| \le \epsilon[/math], asserting a hard limit on how large a perturbation can be. This is appropriate in many engineering contexts; an actuator can only push so hard, and a sensor error cannot exceed the measurement range. In these situations, the Gaussian model is not appropriate and could be misleading, because it assigns non-zero probability to perturbations that are physically impossible.

In a stochastic dynamical system, points that start from the same initial condition can end up on very different paths. To make this crystal clear: say we have a point in [math]\mathbb{R}^2[/math], [math](x_0, y_0) = (0.1, 0.2)[/math]; we map it forward, add some noise, and the next point [math](x_1, y_1)[/math] becomes [math](0.4, 0.6)[/math]. Since the stationary measure is not constructed from a single trajectory, we come back to the point [math](x_0, y_0)[/math], do the mapping again, and add noise to get [math](x_1, y_1)[/math]; this time the noise is different, and it becomes, say, [math](0.44, 0.66)[/math]. We could repeat this process many times at each iteration, and essentially this becomes what is called a set-valued dynamical system, where instead of mapping a point to a point, we map a set of points to a set of points. We can define bounded-noise set-valued dynamical systems as follows:

Let [math]f: \mathbb{R}^d \to \mathbb{R}^d[/math] be a continuously differentiable map and let [math]\epsilon > 0[/math] be the noise bound. The [math]\epsilon\text{-inflation}[/math] of [math]f[/math] is the set-valued map [math]F_{\epsilon}: \mathbb{R}^d \Rightarrow \mathbb{R}^d[/math] defined by:

F_\varepsilon(x) = \overline{B}_\varepsilon\!\bigl(f(x)\bigr)

= \left\{\, f(x) + \xi \in \mathbb{R}^d : \|\xi\| \le \varepsilon \,\right\}

This is the formula for discrete dynamical systems with bounded noise. We start from a single point [math]x[/math], map it forward, take every possible noise realization [math]\xi[/math] within distance [math]\epsilon[/math], and add it to the deterministic part [math]f(x)[/math]; the result is a set of points. Since the maximum noise is [math]\epsilon[/math], in [math]\mathbb{R}^2[/math] this creates a disk of radius [math]\epsilon[/math], and in [math]\mathbb{R}^3[/math] a ball of the same radius. Every noise realization lands either inside that disk (in [math]\mathbb{R}^2[/math]) or on its boundary; that is the hard boundary that Gaussian noise does not have. [math]\overline{B}_\varepsilon(f(x))[/math] denotes the closed ball of radius [math]\epsilon[/math] centered at [math]f(x)[/math].
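The definition translates directly into a membership test: [math]y \in F_\epsilon(x)[/math] exactly when [math]\|y - f(x)\| \le \epsilon[/math]. A small sketch (the map and numbers are my own illustration):

```python
import numpy as np

def in_image(y, x, f, eps):
    """True iff y lies in F_eps(x), the closed eps-ball around f(x)."""
    return np.linalg.norm(np.asarray(y, float) - f(np.asarray(x, float))) <= eps

f = lambda p: 0.5 * p   # hypothetical planar map; f([1, 1]) = [0.5, 0.5]

print(in_image([0.55, 0.50], [1.0, 1.0], f, eps=0.1))   # True  (distance 0.05)
print(in_image([0.80, 0.50], [1.0, 1.0], f, eps=0.1))   # False (distance 0.30)
```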

OK, so a single point maps to a set of points according to the formula above; that's what one iteration gives us. The next step toward understanding the long-term behavior is to take the set of points we just found and map it forward to get another set of points. This is called a discrete set-valued dynamical system with bounded noise. The action of [math]F_{\epsilon}[/math] on a set [math]M \subseteq \mathbb{R}^d[/math] is defined as the union of its images over all points in [math]M[/math]:

F_\varepsilon(M)
= \bigcup_{x \in M} \overline{B}_\varepsilon\!\bigl(f(x)\bigr)
= \left\{\, f(x) + \xi : x \in M,\ \|\xi\| \le \varepsilon \,\right\}.

The input [math]M[/math] is a set of points. At each iteration, we map each point in the set forward to a set of points, just as we did with a single point. Then we take the union of the results, so after one step we have the set of all points reachable under every possible noise realization:

M_0, M_1= F_{\epsilon}(M_0), M_2=F_{\epsilon}(M_1), ...,M_k=F^{k}_{\epsilon}(M_0)

Invariant Set and Minimal Invariant Set

Now we have enough background to learn about the next important concept in set-valued dynamical systems: the invariant set. We want to answer the question: if we keep iterating the set-valued map many times, what kind of structure do we expect at the end that could reveal the system's behavior? If the structure remains the same as we keep iterating, then it is invariant. A compact set [math]M \subseteq \mathbb{R}^d[/math] is invariant under [math]F_{\epsilon}[/math] if:

F_{\epsilon}(M) = M

This says 2 things simultaneously. First, forward invariance: [math]F_{\epsilon}(M) \subseteq M[/math] means nothing escapes; every point reachable from [math]M[/math] in one step stays inside [math]M[/math]. Second, backward invariance: [math]M \subseteq F_{\epsilon}(M)[/math] means every point of [math]M[/math] is reachable from some point of [math]M[/math] in one step.

The problem with invariant sets is that they are not unique. There can be many of them, and they can be large and uninformative. For example, the union of two disjoint invariant sets is again invariant, since [math]F_{\epsilon}[/math] distributes over unions, so invariant sets can be stacked into ever larger, less informative ones. To make this more useful, we want an invariant set such that no smaller subset of it is invariant, and this is called a minimal invariant set (MIS). A minimal invariant set [math]M^*[/math] is an invariant set that contains no proper invariant subset; formally:

F_\varepsilon(M^*) = M^* \quad \text{and} \quad \not\exists\, A \subsetneq M^* \text{ compact with } F_\varepsilon(A)=A.

The minimality condition is what makes [math]M^*[/math] the right object to study: it cannot be shrunk, since we cannot throw away any part of it and still have invariance.
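A tiny example where the minimal invariant set can be found by hand (my own toy example): for the 1D contraction [math]f(x)=x/2[/math] with noise bound [math]\epsilon[/math], intervals map as [math]F_\epsilon([a,b]) = [a/2-\epsilon,\, b/2+\epsilon][/math], and iterating from any large interval converges to [math]M^* = [-2\epsilon, 2\epsilon][/math], which satisfies [math]F_\epsilon(M^*) = M^*[/math]:

```python
eps = 0.1

def F(interval):
    """Image of [a, b] under F_eps for f(x) = x/2 with noise bound eps."""
    a, b = interval
    return (a / 2 - eps, b / 2 + eps)

M = (-10.0, 10.0)
for _ in range(60):   # iterate until the interval stops shrinking
    M = F(M)

print(M)              # converges to (-0.2, 0.2), i.e. [-2*eps, 2*eps]
```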

Boundary mappings of minimal invariant sets

The problem with set-valued dynamical systems is that they are intractable to work with directly: the number of points needed to represent the sets grows exponentially with each iteration, without any upper bound if we keep mapping. Suppose we start at a single point [math](x, y) \in \mathbb{R}^2[/math], with a deterministic map [math]f: \mathbb{R}^2 \to \mathbb{R}^2[/math], and we represent the set [math]F_{\epsilon}(x)[/math] at each point [math]x[/math] by sampling [math]N[/math] points uniformly from the ball [math]\overline{B}_{\epsilon}(f(x))[/math]. Starting from an initial set [math]A_0[/math], after one iteration each point in [math]A_0[/math] produces [math]N[/math] new sample points, so the representation of [math]A_1 = F_{\epsilon}(A_0)[/math] contains at most [math]N \cdot |A_0|[/math] points. At the next iteration, each of those points again spawns [math]N[/math] new samples. After [math]k[/math] iterations, the number of sample points needed to represent [math]A_k = F^k_{\epsilon}(A_0)[/math] grows as:

N_k \le |A_0| \cdot N^k

With [math]N = 100[/math] samples per point and starting from a single point [math]|A_0|=1[/math], after just 3 iterations we already have [math]100^3 = 1{,}000{,}000[/math] sample points, which is completely unmanageable in both memory and computation time.

Boundary mappings of set-valued dynamical systems offer a more refined and manageable approach. Instead of tracking the whole set of all possible noise realizations at each step, we only track the boundary of the set, using a boundary map [math]E[/math]; in the planar case this turns the problem into a purely deterministic dynamical system on the extended space [math]\mathbb{R}^2 \times S^1[/math] (a boundary point together with its unit outward normal). Here is the boundary map as presented in Wei's thesis:

E(x,n)=\left(
f(x)+\varepsilon\cdot\frac{(Df(x)^{-1})^{T}n}{\left\|(Df(x)^{-1})^{T}n\right\|},
\;
\frac{(Df(x)^{-1})^{T}n}{\left\|(Df(x)^{-1})^{T}n\right\|}
\right)

Basically, it takes a boundary point with its outward normal and produces the next boundary point with its outward normal. There are many nice things about this:

  • There is no sampling anywhere in this computation.
  • There is no assumption about the probability distribution, and no random numbers are generated.
  • You take a point, map it forward, and get a new point deterministically.
  • The [math]\epsilon[/math] represents the worst case (bounded uncertainty) that the system has to deal with at each step.
  • The long-term behavior of the system becomes tractable; instead of an infinite-dimensional problem, the state space is reduced to a finite dimension.
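As a sketch, the boundary map can be implemented directly for the Hénon map. This is my straightforward reading of the formula above, not the project's actual tool; the noise bound [math]\epsilon = 0.05[/math] is an arbitrary choice:

```python
import numpy as np

a, b, eps = 1.4, 0.3, 0.05

def f(p):
    """Henon map."""
    x, y = p
    return np.array([1 - a * x**2 + y, b * x])

def Df(p):
    """Jacobian of the Henon map at p."""
    x, _ = p
    return np.array([[-2 * a * x, 1.0],
                     [b, 0.0]])

def E(p, n):
    """One step of the boundary map: (point, outward normal) -> next pair."""
    v = np.linalg.inv(Df(p)).T @ n   # push the normal through (Df^-1)^T
    n_new = v / np.linalg.norm(v)    # renormalize to a unit normal
    return f(p) + eps * n_new, n_new

p, n = np.array([0.0, 0.0]), np.array([1.0, 0.0])
p_new, n_new = E(p, n)
print(p_new, n_new)   # the new normal is again a unit vector
```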

For a detailed analysis of the boundary map, as well as how the minimal invariant sets can be obtained after a long evolution just by using the boundary map, please take a look at the reference paper.

I think I will end the post here; in the upcoming post, I will share about the numerical tool.

References:
