Structure
Congruence between a formal model and a natural system, as defined in chapter 11 and further elaborated in chapter 12, means that a modelling formalism must enable us to draw useful and robust inferences about relevant aspects of a physical process. In the best case, this not only reveals a set of rules responsible for generating the observed dynamics, but also renders the behaviour of the process predictable (at least to some degree). In other words, a congruent model not only tells us something about the actual dynamics of a system, but also allows us to predict what kinds of behaviours may be possible in a given situation.
We’ve discussed the problem of identifying relevant features of a natural system (and how this always depends on the circumstances and aims of the modeller) in chapters 8, 9, 10, and 11. There is another difficulty though: in practice, it is hard (if not outright impossible) to predefine the total range of possible behaviours of most natural systems (see chapter 12). In fact, in a large world, it is not even clear what that would mean, and we argue in chapter 13 that we can define a complex system exactly as a system whose totality of possible behaviours cannot be captured by a single formal model. Our large and complex world is not entirely formalisable. This is perhaps the central insight of this book.
The art of modelling, then, is to match formalism (in the symbolic domain of the self) and target system (in the natural domain of the ambience) such that the model is congruent for our given purposes and captures all the possible system behaviours that are relevant in our specific situation. This is a more modest and realistic undertaking than striving for some unattainable theory of everything. It is an actionable project suited for limited human beings. But how do we achieve it? If we cannot delimit the entire space of possible behaviours of a natural system, how can we ever know that we have captured all relevant ones in a given situation? Remember: there could always be unconsidered alternatives. We can never be sure that we are not missing anything. Here, recognising the limitations of your modelling framework (rather than its positive potential) is of crucial importance. Are there aspects of what we want to understand that lie outside the capabilities of the model we are using? While delimiting the entire space of possibilities is generally impossible, this question can often be answered quite pragmatically.
Our situation may be serious, but it is definitely not hopeless: there are many limitations of our modelling frameworks that we do clearly recognise. And, more often than not, we can judge whether those limitations are hampering our ability to understand a given phenomenon, or a specific chain of events. We will provide plenty of examples in what follows. But the basic principle is simple: if the built-in constraints of your formalism make congruence impossible (or, at least, implausible), then you are not going to get much insight out of your model (let alone robust insight). Or worse, you are going to get a misleading illusion of understanding: you become captured by your map to a degree that makes it difficult for you even to recognise the territory any more. This is what happens with the machine view of the world. We run the danger of bending the world to our ideas, instead of adapting our ideas to the world.
In practice, unfortunately, modellers often background or ignore this problem. Or they rationalise it away, out of convenience or (to be fair) practical necessity. Models are tools, after all. And when you have a hammer, everything suddenly starts looking like a nail. Some modelling frameworks are better developed, more widely applicable, and more tractable than others, but not always because they capture the phenomena better than their alternatives. Formalisms can become entrenched through habit for reasons that are accidental (or at least historically or socially contingent) rather than scientific. We use specific modelling tools simply because everyone around us uses them too. It takes a lot of time and effort to master these formalisms and their methods of analysis. This is why modellers often specialise in one particular approach, which they then apply to the widest possible variety of natural systems.
And so, it should come as no surprise that we often go to extraordinary lengths to justify the application of inadequate modelling formalisms to problems we know they do not capture properly. As a prominent example, think of the “dismal science” of economics, with its models assuming rational and purely self-interested human behaviour. What exactly such rational-choice models are modelling is not clear, but it surely is not how humans actually behave. Then again, it is really not legitimate to pick on any specific discipline: this kind of mismatch occurs across all areas of investigation, right down to the seemingly rigorous and principled domain of fundamental physics. In scientific investigation, idealisations are constantly made and then forgotten — or actively swept under the rug. Sadly, the art of modelling has a strong tendency to degenerate into the art of searching for your keys under the streetlight, rather than in the darkness, where you actually lost them.
How, then, can we judge whether a modelling framework supports the kind of explanations we are seeking? This is, of course, a philosophical problem at heart. First, we need to be clear about what kind of explanations we actually want: what constitutes a satisfactory answer to a given question depends heavily on our general circumstances, and on the motivation for our investigation. Only as a second step can we then determine whether a specific modelling framework provides such a satisfactory answer.
Once again, we use a formal approach here to examine these two steps, and push this approach to its limits. We start by defining a system mathematically, in the most general way possible (similar to what we’ve done in chapter 11). We then add additional constraints (in the form of additional mathematical structure) to that general framework, which allow us to identify the exact conditions under which some modelling formalism applies. If the natural system we are studying does not meet these conditions, the framework is inadequate and won’t yield any truly congruent models. In principle, any modelling study (including the “models” that are laboratory experiments or controlled interventions in the wild, see chapter 9) should go through such an initial assessment of methodological validity. But, as we’ve said before: we often prefer to do what we know is feasible and yields familiar explanations — even if these turn out to be deficient in obvious ways — rather than face the limitations of our preferred approach head-on and in an explicit manner.
To arrive at the most general definition of a system we can think of, we follow the general systems theory of Mesarovic and Takahara that we introduced in chapter 11. Here, the essence of “systemhood” is relational. The most basic formulation of a system S is as a set of variables V, which we can call the system objects — even though they are not necessarily static or determined, but can also represent dynamic or even stochastic processes — with abstract relations between them. These relations define the basic structure of a system. We can draw this as a graph of system objects Vᵢ, e.g.
which, in the formalism of set theory, can be written as S ⊂ V₁ × V₂ × V₃ × V₄ × V₅ or, more generally
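S ⊆ ×{Vᵢ : i ∈ I}, where I is an index set picking out an arbitrary collection of system objects (this is our rendering of the general form, in the spirit of Mesarovic and Takahara’s notation).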
This complicated-looking formula simply says that the graph can contain any finite combination of pairings (i.e., relations) between an arbitrary number of system objects. This way, it can capture any possible system structure. In the mathematical discipline of graph theory, the Vᵢ are called nodes or vertices, and their connections are the edges of the graph. Note that each object Vᵢ is itself a set, since it represents a variable, which can take on many different values: the elements (of the alphabet) of the set.
In its simplest formulation, a system graph is undirected: there is no specific polarity to any of the edges. They all go both ways. The only system property that we capture here is that system objects must be (cor)related to each other in some way. Often, especially when such a graph is extracted from empirical data, connections are supported with different degrees of statistical confidence, which we can indicate by adding a weight to each edge (indicated by multiple connectors above). But then, the resulting weighted graph is no longer a truly minimal model of a system, as we’ve already added further mathematical structure.
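To make this a little more tangible, here is a minimal sketch of such a weighted, undirected system graph in Python, using the networkx library; the object names and confidence weights are invented purely for illustration.

```python
import networkx as nx

# Nodes are system objects; undirected edges are (cor)relations between them.
G = nx.Graph()
G.add_nodes_from(["V1", "V2", "V3", "V4", "V5"])

# Weights express our statistical confidence that a relation exists at all,
# not the strength or rate of any causal interaction.
G.add_edge("V1", "V2", weight=0.9)
G.add_edge("V2", "V3", weight=0.6)
G.add_edge("V3", "V4", weight=0.8)
G.add_edge("V2", "V5", weight=0.4)

print(G.number_of_nodes(), "objects,", G.number_of_edges(), "relations")
```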
This general framework represents natural systems as static networks. It is simple, that’s for sure, and very powerful, as we shall see. But it is also maximally abstracted from the underlying natural system. This makes network models extremely widely applicable, from the design of electric and electronic circuits, power grids, or computer networks in engineering, to statistical and particle physics, to the study of metabolism, gene regulatory networks, brains, ecosystems, and epidemics in biology and medicine, to inquiries into linguistic structures, networks of social interactions and cultural exchanges, technology development, financial markets, supply chains, and whole economies in the human realm.
The flipside of this extraordinary degree of abstraction is that static network models only capture a very limited range of properties of their target systems. To think that natural systems are networks is to commit the fallacy of misplaced concreteness: to mistake the abstract for the concrete. Network models only capture those properties that exclusively depend on the structure of a system, which includes many statistical features, such as robustness towards perturbations (e.g., the knock-out of random nodes, or the rerouting or interruption of random connections). These and other statistical properties rely on the distribution of the number of connections per node, which is called the network’s degree distribution (the degree of a node simply being the number of connections it has).
It is relatively easy to extract degree distributions for network models from all kinds of empirical data. This is one of the main reasons why this formalism is so widely used. And it allows us to classify distinct network types. The nodes of a random network, for example, show connection degrees that follow a distribution with a well-defined range, mean, and variance — such as a Poisson or Gaussian (normal) distribution. A scale-free network, in contrast, exhibits a power-law degree distribution, which means that there are a small number of very highly connected nodes (called hubs) that form a “fat tail” to the distribution. This does not happen in random networks, and it means there is no clearly defined range or average connectivity, which is why these networks are called “scale-free”.
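As a rough sketch of how such a classification works in practice (the network sizes and parameters below are arbitrary), we can generate one network of each kind and compare their degree distributions:

```python
import networkx as nx
from collections import Counter

n = 10_000
random_net = nx.erdos_renyi_graph(n, p=8 / n)       # each pair of nodes connected with equal probability
scale_free_net = nx.barabasi_albert_graph(n, m=4)   # grown by preferential attachment

def degree_counts(G):
    """How many nodes have each connection degree k."""
    return Counter(d for _, d in G.degree())

# The random network's degrees stay within a narrow range around the mean;
# the scale-free network has rare hubs far out in the fat tail.
print(max(degree_counts(random_net)))
print(max(degree_counts(scale_free_net)))
```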
Instead of looking at global statistical properties, such as degree distributions, we can also zoom in and characterise specific kinds of subgraphs, called network motifs, that occur with statistical frequencies deviating from random expectations in a given class of natural systems. Again, such motifs are relatively easy to extract from a wide range of empirical data. The aim of this approach is to obtain insights about the behaviour of a network by dissecting it into sub-networks that are simple enough for their dynamics to be derivable from their structure. In addition, network motifs are also supposed to help us understand the function, and hence the evolution, of networks of, say, gene regulatory interactions or the exchange of ideas during technological innovation. We’ll come back to what we mean by “function” later in the book.
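As a toy illustration of what motif counting involves (a sketch only; a real analysis would also compare the counts against randomised versions of the same network), here is a brute-force count of feed-forward loops, the pattern A→B, A→C, B→C, in a small invented directed graph:

```python
import networkx as nx
from itertools import permutations

# A small directed graph, invented for illustration.
G = nx.DiGraph([("A", "B"), ("A", "C"), ("B", "C"),
                ("B", "D"), ("C", "D")])

def count_feed_forward_loops(G):
    """Count ordered triples (a, b, c) with edges a->b, a->c, and b->c."""
    return sum(
        1
        for a, b, c in permutations(G.nodes, 3)
        if G.has_edge(a, b) and G.has_edge(a, c) and G.has_edge(b, c)
    )

print(count_feed_forward_loops(G))  # 2: (A, B, C) and (B, C, D)
```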
There is only one problem: static network-based modelling approaches are fundamentally inadequate for the purpose of studying system behaviour or evolution. We know that, but since they are easy to apply to the kind of data set that is straightforward to obtain, we use those approaches anyway to draw all kinds of unsupported conclusions. In fact, over the past decades, we observe a veritable network-mania in many branches of science — especially systems biology, neuroscience, and the social sciences. There are so many ways of not finding our keys because we are only looking under the streetlight!
A quick assessment of methodological validity reveals two fundamental limitations. The first one is really obvious: a static network model cannot say anything about the dynamics or behaviour of a natural system since it has abstracted away its temporal dimension. Simply knowing that some variables are statistically (cor)related does not tell you how exactly they interact causally. What is missing, for one thing, is the directionality and strength of individual connections, since we are using undirected graphs and statistical weights that do not directly represent interaction rates. How many times have we been told that correlation is not causation? For another, dynamics depend on the wider context of a (sub)system — its interactions with the rest of the world. This applies to the smallest network motifs as well as to the global dynamics of a large and complicated system. Based on this, we must conclude that structure does not determine the dynamics, and hence also not the function, of a natural system. It’s as simple as that.
The second limitation is the following: natural systems rarely exhibit network structures that remain fixed over time. Connections can (and usually do) change as a system is altered through its interactions with the ambience (see chapter 12), meaning that statistical properties of a network can change (and often do so) as it evolves over time. Take the example of network robustness we introduced above: scale-free networks are known to be particularly robust against random perturbations, since most nodes that are taken out by chance will have a low degree of connectivity. However, if we systematically target the rare hubs of such a network, it not only loses its robustness quickly and catastrophically, but also transforms into something resembling a random network after just a few hubs have been knocked out. Robustness, after all, depends crucially on the circumstances under which network structure changes.
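A small simulation makes this concrete (a sketch with arbitrary parameters, not a rigorous robustness analysis): we remove ten percent of the nodes from a scale-free network, either at random or hubs first, and compare how much of the largest connected component survives.

```python
import random
import networkx as nx

def largest_component_fraction(G):
    """Fraction of remaining nodes that sit in the largest connected component."""
    return max(len(c) for c in nx.connected_components(G)) / G.number_of_nodes()

def remove_nodes(G, n_removed, targeted=False):
    H = G.copy()
    if targeted:
        victims = sorted(H.nodes, key=H.degree, reverse=True)[:n_removed]  # hubs first
    else:
        victims = random.sample(list(H.nodes), n_removed)                  # random knock-out
    H.remove_nodes_from(victims)
    return largest_component_fraction(H)

G = nx.barabasi_albert_graph(5_000, 2)
print("random removal:  ", remove_nodes(G, 500))        # giant component barely shrinks
print("targeted removal:", remove_nodes(G, 500, True))  # damage is far more severe
```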
As another example, take the fact that we can extract power laws from the degree distributions of a very wide range of networks: the world wide web, metabolic and gene networks, or networks of social interactions, for example. Insofar as such distributions can be unambiguously determined at all (a rather significant practical problem, given noisy and/or incomplete data, and the similar shape of exponential and power-law distributions), they are widely used as evidence to argue for self-organisation in network evolution and reconstitution after perturbation. Indeed, power laws are the hallmark of a dynamical regime (somewhat misleadingly) named the “edge of chaos”, which, in turn, is indicative of a phenomenon called self-organised criticality. In addition, power laws can result from a specific mode of network growth called preferential attachment (a.k.a. “the rich get richer”), where highly connected nodes are more likely to accrue additional connections than less connected ones. This has led to all kinds of claims about the growth and evolution of networks, derived from the mere asserted presence of power laws in the corresponding network structure. But this is not robust inference, and therefore not good science!
As a matter of fact, power laws can emerge in all sorts of (rather boring) ways. For example, they arise in any system with a set of states that shrinks over time. This includes any natural system with a time-variable structure whose behaviour becomes increasingly constrained as time goes on. And yet, such time-variability of network structure is something that our static modelling framework explicitly excludes! Therefore, our analysis gives us the illusion of a specific insight when, in fact, it does not even take into account the simplest and most plausible kind of explanation because it is not within the model’s remit. This is a good example of how we can know with high confidence that a modelling framework is inadequate, but keep on using it anyway.
What we need for a causal understanding of system behaviour and evolution, instead, are models that explicitly deal with the temporal dimension, the dynamics of the system — which requires us to consider the directionality of relations, plus the strength and rate of interactions — but also to take into account the context of a natural system, and the noisiness of our data. In the remainder of this appendix, we’ll take a look at each one of these important aspects in turn, providing examples of formalisms that extend the static network picture in all these relevant ways while, at the same time, revealing their own limitations. This gives us a powerful toolkit for modelling natural systems, and a solid basis for developing a formalism suitable for understanding living systems, including ourselves, in the latter parts of the book.
Time
In chapter 11, we define a formal system in a more specific way than the very general definition we’ve provided in the previous section. The system — represented by a set of states S, with each state s consisting of a number of variables Vᵢ — retains the internal structure of the undirected graph above. But we also add sets that represent external input X and system output Y, and relate them to each other (via set S) by a system-response mapping (or relation) of the form
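R: S × X → Y

(our rendering of one standard way to spell this out: given an internal state s ∈ S and an input x ∈ X, the relation determines which output y ∈ Y the system produces).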
Mesarovic and Takahara call this type of system an input-output system.
In chapter 12, we point out that a real-world system must also persist through time, at least long enough for us to pick it out of the ambience as a recognisable pattern. Therefore, the “timeless” systems we have defined so far are not quite adequate, since they lack any temporal dimension. Instead, a minimal model of a natural system that is actually realisable must be a general time system, with an added state-transition mapping, which describes how the system’s state is advanced from time t₁ to a later t₂:
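φ: S × T × T → S,  with φ(s(t₁), t₁, t₂) = s(t₂)

(again our rendering: the mapping takes the state at time t₁ and returns the state the system will be in at the later time t₂).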
To be realisable, system-response and state-transition maps must be consistent with each other. If they are also independent of each other, then we have a dynamical system with a well-defined state space S that remains fixed over time (see chapter 12). This conforms to Newton’s trick of separating rules of change from state (see chapter 6). But if they are not independent, the time system is not completely formalisable, since its set of states evolves in unpredictable (even unprestatable) ways (see chapter 12 for details). We’ll revisit this problem in the next section.
By introducing a state-transition mapping, we have taken an important first step towards representing dynamics. But it is not quite enough. We also need to burden the network graph of the system with additional mathematical structure, if we want to have a usable modelling formalism to capture the behaviour of a time system. First of all, the edges of the graph must now have a direction, because they no longer represent simple (cor)relations, but causal relationships, which only go one way: from cause to effect. And second, these edges must have weights which represent the strength or rate of an interaction. Here is an example of such a weighted directed graph (or digraph, for short):
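Here is a minimal sketch of such a digraph in code (Python with networkx again); the node names, signs, and rate values are invented for illustration.

```python
import networkx as nx

# Directed edges run from cause to effect; the "rate" attribute is a kinetic
# parameter (how strongly/quickly the interaction acts), not a statistical confidence.
D = nx.DiGraph()
D.add_edge("V1", "V2", rate=2.0)    # strong, fast activation of V2 by V1
D.add_edge("V2", "V3", rate=0.1)    # weak, slow activation of V3 by V2
D.add_edge("V3", "V1", rate=-1.5)   # negative sign: V3 represses V1

for cause, effect, data in D.edges(data=True):
    print(f"{cause} -> {effect}: rate {data['rate']}")
```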
Note that these weights are not at all the same as those of the undirected graph in the previous section, which represent our statistical confidence that a certain (cor)relation actually exists. The weights in the digraph, in contrast, are not statistical but kinetic parameters. They determine not the likelihood, but the time scale at which a causal interaction occurs. Take the example of a repressive interaction in a gene regulatory network: a transcription factor protein that binds to the regulatory sequence of a target gene and shuts down its expression. When we talk about the “strength” of this interaction, we really mean how fast it takes effect. Strong repression shuts off the target more or less immediately, while weak repression will only downregulate it slightly at first. Ultimately, the latter interaction may also shut down its target altogether, but in biological regulation, timing is of the essence. Fast or slow — it really matters, not least because rates of change influence the order in which events happen in a time system.
Static network models such as those introduced in the previous section are completely blind to such dynamic nuances. Thus, if the phenomenon you are studying depends on rates or temporal order (in any way), you cannot use such models to draw robust inferences. You see, it’s not exactly rocket science.
Fortunately, there are many formalisms available for modelling dynamical systems (even though it is usually harder and more laborious to obtain the kind of time-series data you need to validate them). Remember that we define this class of systems very broadly here, as time systems with independent system-response and state-transition maps. Not surprisingly, the most well-known, widespread, and sophisticated formalism for modelling such a system is called dynamical systems theory. But this name is a bit misleading, because dynamical systems theory only deals with a tiny subclass of all possible dynamical systems (in the broad sense): those with continuous variables (both in terms of their values and their change over time), and a state space S that is, accordingly, a smooth topological space (see chapter 12).
In mathematical terms, this means that a dynamical system in the narrow sense of dynamical systems theory can be expressed as a system of differential equations. In the simplest case (when the system isn’t spatially distributed, and has no stochastic component), we get an array of ordinary differential equations (ODEs) that look like this
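dxᵢ/dt = fᵢ(x₁, x₂, …, xₙ)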
for the case of a system with n state variables xᵢ. Each of these equations describes the instantaneous rate of change (dxᵢ/dt) of one of the state variables. Here we have to be careful with mathematical notation: we no longer call the state variables Vᵢ (as in the previous section) but xᵢ, to stick to the conventions of dynamical systems theory. Both refer to the same mathematical object: state variables are the components that constitute a state s. It’s a bit confusing: the xᵢ shown here have nothing to do with input X in the previous section. This is a decision we had to make: because we present an informal comparison between modelling formalisms, we’d rather stick to familiar mathematical conventions (which may differ between frameworks) than confuse the reader with unfamiliar notation.
But let’s get back to the system of differential equations depicted above: more precisely speaking, change dxᵢ occurs over an infinitesimal time interval dt. We’ve talked about such infinitely small time spans in chapters 7 and 12. What’s important here is that the resulting instantaneous rate of change dxᵢ/dt depends — in a way characterised by some mapping f — on a combination of all the state variables xᵢ of the system. This is what the ODE above represents. Note that f can take many shapes. In most cases of dynamical systems that accurately describe natural ones, this mapping is nonlinear. The response function of a network interaction, for instance, usually follows a sigmoid curve, and many growth processes in nature exhibit exponential dynamics. Linearity mostly shows up in idealised models or rare (and often human-made) examples of systems that are exceptionally predictable in their behaviour.
The above formula looks complicated, but a system of differential equations is really just an abstract way of encoding the kind of weighted digraph we’ve shown above, assuming that the value of its nodes changes smoothly over time, and that f is parametrized to represent the strengths of interactions. Plus: we can easily add additional goodies, such as memory terms or time delays (if f depends on past states), or decay rates for state variables (if they represent vanishing forces or perishable substances). This is one reason why dynamical systems theory is such a versatile and powerful tool for the modeller.
All we need now to solve or simulate such a system of ODEs is a set of initial conditions (the system’s “box” — a special kind of boundary condition), which we get (in the simplest case, when the system isn’t spatially heterogeneous) by measuring observables that correspond to the starting values of the state variables. We can then track the flow of cause and effect through time starting from this initial state by integrating the equations, either analytically or numerically. Once we can do this, we can freely explore the space of the system’s possibilities by varying the initial conditions (or other parameter values). This is precisely Newton’s universal mathematical oracle: we can not only predict what the system will do given our actual measurements, but also what it would do under all kinds of other, assumed, initial conditions at any time point in the future.
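As a concrete (and entirely invented) illustration, here is a two-variable system with sigmoid interactions and linear decay, integrated numerically with SciPy from a set of measured initial conditions; all equations and parameter values are made up for the sketch.

```python
import numpy as np
from scipy.integrate import solve_ivp

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def f(t, x):
    x1, x2 = x
    dx1 = 2.0 * sigmoid(x2 - 1.0) - 0.5 * x1   # x2 activates x1; x1 decays
    dx2 = 1.5 * sigmoid(1.0 - x1) - 0.3 * x2   # x1 represses x2; x2 decays
    return [dx1, dx2]

x0 = [0.1, 0.4]                                 # the measured initial state
sol = solve_ivp(f, t_span=(0.0, 50.0), y0=x0)

print(sol.y[:, -1])   # the state towards which this particular trajectory settles
```

Rerunning the integration with different values of x0 (or of the invented parameters) is exactly the kind of what-if exploration described above.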
On top of all this, dynamical systems theory provides us with a powerful toolkit for finding and characterising the attractors of the system, which can be stable points, limit cycles, or more complex geometrical objects (called strange attractors) to which the system will converge (or return, if we perturb it) as long as it remains in a local neighbourhood of the state space called a basin of attraction. Such basins are separated by boundaries called separatrices. Once we cross such a separatrix, the system will find itself in a different basin of attraction, and it will converge to a different attractor.
This brings us to two of the most significant limitations of this approach: most analytic tools of dynamical systems theory rely on the linearisation of a system around its attractors (particularly, fixed-point attractors). Therefore, these tools are only really accurate or useful if we assume that the system spends most of its time at or near a steady state, i.e., if it quickly settles into a dynamic regime where its state variables no longer change over time. This is the case for many natural systems, such as networks of chemical reactions that tend to settle quickly towards thermodynamic equilibrium if left unperturbed. However, it decidedly does not apply to any living or evolutionary process (see chapter 12), nor (in general) to ecological and social systems that are built on such processes. All of these natural systems involve unpredictable structural changes and/or interactions with other systems that prevent them from ever getting near a steady state. We’ll have more to say about this in the last two parts of the book.
And, as mentioned earlier, not all dynamical systems conform to the continuity conditions of dynamical systems theory. A dynamical system can also be discrete, either in the values that its state variables can take on, or in the way it implements (stepwise) change (see chapter 12). Logical formalisms can be used to model dynamical systems that are discrete in both ways. Particularly simple examples are Boolean networks, where each node represents an on/off switch, i.e., a variable that can only take on the values 0 and 1, and where interactions between nodes are mediated by Boolean functions (NOT, AND, OR, and so on; see also this appendix). A big advantage of logical models is that it is often possible to enumerate their state spaces explicitly, which means we can study the behaviour of the system away from steady states, if we like. Such models were famously used by Stuart Kauffman, for example, to show how the “edge of chaos” regime (and other forms of dynamic and structural order) can arise during network evolution.
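A minimal sketch of such a Boolean network (three invented nodes with invented update rules, updated synchronously) shows how easily the entire state space can be enumerated:

```python
from itertools import product

def step(state):
    """Synchronous update of a toy three-node Boolean network (rules invented for illustration)."""
    a, b, c = state
    return (
        int(not c),         # A is switched off by C
        int(a or c),        # B is switched on by A or C
        int(a and not b),   # C needs A on and B off
    )

# With only 2^3 = 8 possible states, we can follow every trajectory
# until it revisits a state, i.e. until it has reached its attractor.
for state in product([0, 1], repeat=3):
    trajectory = [state]
    while trajectory.count(trajectory[-1]) < 2:
        trajectory.append(step(trajectory[-1]))
    print(state, "->", trajectory[1:])
```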
This is another example where the simpler discrete modelling formalism seems easier to apply to a natural system than, let’s say, a continuous differential equation model. After all, a logical model only has to agree qualitatively with the behaviour of the target system to be congruent. Yet, as in the case of static network models, discrete formalisms are more highly abstracted, so we need to be cautious.
Let us return to the example of a gene regulatory network. In this case, the on/off states of our Boolean network model represent functional states of gene expression: the gene is either “active” or “inactive” in some biologically meaningful way. Evidently, this idea is semantically loaded. This is why functional states are generally not directly observable. In principle, we’d have to first establish that a threshold between discrete on/off states exists in the actual (measurable) concentration of a gene product. This is not a given, as biological processes generally involve gradual concentration changes. Again, the simplicity of the modelling framework deceives: a careful assessment of methodological validity is required to know whether we can actually apply such a formalism in any particular case.
There are many other formalisms to model dynamical systems (in the broad sense). We will not cover them in detail here, but you can look up cellular automata, dynamic Bayesian networks, and Petri nets (to name just a few). They all differ in the details of their implementation and the conditions that must apply to a natural system to be congruently modelled by them, but they all have one thing in common: they treat a dynamical system as a set of fixed (or regularly forced) rules that determine (or probabilistically constrain) the dynamics of a system from a given (measured) starting point within a given set of boundary conditions. This is quite a mouthful but, basically, what it means is that they are all just “physics in a box.” They require us to delimit a well-defined subset of the ambience as the state space of the dynamical system for the model to be applicable to its target natural system.
Boundaries
But what happens if the very rules that govern a process change, unpredictably, over time? In chapter 12, we introduced two classes of natural systems to which this applies in particular: (1) evolutionary processes, whose future spaces of possibilities cannot be prestated (formally defined as a specific subset of the ambience) due to the open-ended and indefinite nature of interactions between systems in a large world, and (2) living (autopoietic or self-manufacturing) systems, where the internal dynamics of the system become indeterminate because of their collectively impredicative nature, that is, the fact that their constituent (sub)systems all mutually construct each other. We’ll have a lot more to say about the latter in the last two parts of the book. Here, we will primarily focus on the former.
In chapters 10 and 12, we explained how the dynamics of such processes are governed by the temporal evolution of what are called nonholonomic, nonintegrable, or context-dependent constraints, i.e., by unpredictable changes in how (sub)systems interact with each other that then become the rules of change for the higher-level encompassing system. We’ll cover this topic in a lot more detail when we introduce the concept of organizational emergence in chapter 14. In terms of dynamical systems theory, we could say that changing boundary conditions (rather than the underlying rules of change) come to dominate system dynamics. The way this works is as follows: nonholonomic constraints (arising from interactions between processes) restrict the range of dynamical behaviours of the (sub)systems they act upon in a history- and situation-dependent way that is not predictable from the underlying general laws of motion. Or, to paraphrase Terrence Deacon: “a system built on context-dependent constraints is actually less than the sum of its parts!” In other words, the dynamic repertoire of the component processes in the context of the system is smaller than the range of behaviours they could exhibit if studied in isolation. But the way their behaviour becomes restricted is not captured by their underlying rules of change.
If we are interested in modelling such nonintegrable constraint-based systems, we need formalisms that can capture the evolution of nonholonomic constraints — despite their seemingly “lawless” nature. We learned in chapter 7, and discussed in more detail in chapter 12, that a lack of predictability does not necessarily preclude generative models that explain how such a system works (even though it does preclude prediction over large time scales). At the same time, such models cannot be based on traditional mathematical modelling approaches (such as those described in the previous two sections), which all assume a minimum degree of lawlikeness (or compressibility) in a system. In the absence of any such lawlikeness, our models will be of an irreducible computational nature (again, see chapter 7, and also chapter 4). This is what computer scientists call simulations. We use the term here in this traditional sense, not the more specific one of Rosen and Baudrillard, which denotes a formal system that focuses on prediction, without caring if it gets the underlying rules of change of the system right (see chapter 11).
In this section, we discuss how such computer simulations can and do simulate systems that cannot be modelled using the traditional mathematical formalisms we’ve discussed above. Yet, at the same time, even irreducible (i.e., computationally complex) simulations fail to resolve the most fundamental underlying problem: our limited ability to formalise evolutionary processes in the first place.
To be more precise: the problem we need to tackle here is whether or not we can formalise how the rules of change of the system themselves change over time. This problem has two complementary aspects: an internal one, where the rules of change (contra Newton) come to explicitly depend on the current state (and thus the history) of the system, and an external one, where the rules of change are affected by unpredictable interactions of the system with its environment in a large world, where the totality of all possible interactions cannot be formally captured and defined in advance.
Note that not all systems with time-variable structures are problematic in the way we’ve just described. We’ve already encountered forced dynamical systems in chapter 12. Their structure changes in a regular way that is independent of their internal dynamics. Therefore, they can be modelled in a traditional way, simply by including several hierarchical levels of dynamics within the boundaries of the model. But this is not where we are going here. Instead, we are interested in systems where the state-transition mapping gets modified by the current internal state of the system. To model this kind of dynamic, we need a formalism in which the rule change isn’t forced from outside, but is generated within the system.
In other words, we need a formalism with a set of rules that rewrites itself. It is difficult to imagine how to encode such a system with differential equations, say, or any other approach to modelling dynamical systems (with the potential exception of Bayesian approaches, which we revisit in the next section).
Computer scientists call this kind of formal system a rewrite system. And the most famous rewrite formalism is the λ calculus, developed in the 1930s by none other than Alonzo Church. We will outline some of its principles in more detail in a later chapter. All we need to know for the moment is that rewrite systems (such as those encoded in λ) do have fixed rules, but these rules don’t govern the dynamics of a computational system directly. Instead, they tell us how to rewrite its rules of change in a consistent manner. This introduces a powerful kind of flexibility that traditional mathematical formalisms lack. It bears repeating: to achieve something similar within dynamical systems theory, we would have to formulate a set of differential equations that rewrite their own terms as the dynamics of the system evolve over time. Rewrite formalisms (such as λ) make this kind of task easy: for instance, they allow us to write computer programs that modify their own instructions! This is called metaprogramming — a programming technique, closely associated with functional languages, in which the code of one program is treated as data for another (or, indeed, for itself). This opens new and exciting possibilities for modelling as well.
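The following Python fragment is a deliberately simple sketch of this idea (ordinary metaprogramming, not the λ calculus itself; every rule and threshold in it is made up): the object-level rule of change is stored as data, and a fixed meta-level rule rewrites it as the system’s own state evolves:

```python
# Object-level rule: how the state changes at each step.
rules = {"increment": 1}

def step(state, rules):
    return state + rules["increment"]

# Meta-level rule: how the object-level rule itself gets rewritten,
# depending on the current state of the system (condition is arbitrary).
def rewrite_rules(state, rules):
    if state % 3 == 0:
        return {**rules, "increment": rules["increment"] + 1}
    return rules

state = 0
for t in range(10):
    state = step(state, rules)            # apply the current rule of change
    rules = rewrite_rules(state, rules)   # let the dynamics rewrite the rule
    print(t, state, rules)
```

Note that the rule doing the rewriting is itself fixed — a point we return to shortly.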
In particular, it allows us to encode models capable of reconfiguring their structure based on their own internal dynamics. This is undoubtedly useful when simulating living processes and their evolution, where fixed-rule formalisms lack the adaptability required to capture the unpredictable structural change driven by the dynamics of nonholonomic constraints. We’ll provide a detailed example of how this can be applied to the simulation of living dynamics in a later chapter. For now, let us focus on the more abstract question of whether rewrite systems allow us to compute problems that are not computable by other means. Intuitively, we might expect so: shouldn’t the added flexibility of the formalism also lead to added computational power and reach?
And yet, somewhat surprisingly, this kind of self-reconfiguring system does not go beyond the traditional notion of computation. Remember: it was Turing himself — in 1936, the same year that Church published his λ-based account of computability — who proved that this formalism and his own universal Turing machine are equivalent in terms of the set of problems they can compute. If it is computable by Turing, it is computable by Church, and the other way around. λ doesn’t solve any problems that a conventional computer program couldn’t solve as well. This means that rewrite systems can capture computational, but not organizational emergence (cf. chapters 12 and 14). Just like any other form of computation, they can be used to simulate irreducible processes but, in the end, they still require some set of rules to be fixed, albeit at a level of abstraction that is once removed from the instructions that actually govern system dynamics.
This leads us to a second aspect of our problem: the nonholonomic (“lawless”) nature of evolutionary processes does not stem from a lack of rules or order (as, say, in our discussion of quantum indeterminacy in chapters 4 and 7). In biological evolution, after all, we do have general regularities, such as natural selection applying across many different scales and contexts. Evolution, despite being misunderstood as a random process by many, is not random at all! Quite the contrary, and we’ve said this before: the lack of predictability stems from a radical open-endedness of interactions between lawlike processes in a large world. The general principles that lead to adaptation remain the same across contexts, and the laws governing the underlying physical processes are, of course, never altered by higher-level evolutionary dynamics. Instead, the particular interactions through which adaptation occurs (and which determine how exactly it occurs) happen in a radically context-dependent manner. That’s why Darwin’s theory works by describing statistical regularities at the level of the population, but not particular behaviours at the level of the individual, even if that individual is the basic unit of evolution (see this later chapter). And this is exactly why the theory of evolution explains so much, but predicts so few specific outcomes.
But how could we even begin to formalise such open-ended interactions of organisms and their environments? Well, one of the more sophisticated ways in which this is done in evolutionary computer simulations is through a so-called agent-based model. Such models have their own built-in epistemic cut: they are compartmentalised into “agents” (which are not really agents, as we shall see), and their environment (corresponding to what we call the ambience). “Agents” are subroutines of the model that navigate their environment according to a specific set of internal rules. These rules are parameterised by a “genome,” a set of control parameters whose values can mutate, thereby altering the rules by which the “agent” engages its environment. This is how “agents” learn and adapt. If such simulations are used for modelling evolution (rather than for epic movie battles, which may be their most impressive application), then there is also some kind of selection criterion at work, which determines the probability with which an agent “survives” (persists in memory) and, ultimately, “reproduces,” by copying its genome into a new agent that will go on to engage its environment in its own way.
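To give a flavour of how this works, here is a stripped-down sketch in Python. Everything in it — the one-parameter “genome,” the biased random walk that serves as behaviour, the fitness function, the population size — is an arbitrary illustration, not a model of any real system:

```python
import random

TARGET = 10.0        # location of the "resource" in a 1D environment
POP_SIZE = 20
GENERATIONS = 30

def behave(genome):
    """The agent's fixed behavioural rule, parameterised by its genome:
    a biased random walk whose bias is the genome value."""
    position = 0.0
    for _ in range(20):
        position += genome + random.gauss(0, 0.1)
    return position

def fitness(position):
    """Closer to the resource means more likely to 'survive'."""
    return 1.0 / (1.0 + abs(position - TARGET))

population = [random.uniform(-1, 1) for _ in range(POP_SIZE)]

for gen in range(GENERATIONS):
    scored = sorted(((fitness(behave(g)), g) for g in population), reverse=True)
    survivors = [g for _, g in scored[: POP_SIZE // 2]]
    # "Reproduction": each survivor copies its genome with a small mutation.
    population = survivors + [g + random.gauss(0, 0.05) for g in survivors]

print("mean genome after selection:", sum(population) / len(population))
```

The “agents” here do adapt, in the sense that the genome converges towards values that carry them close to the resource. But everything they can possibly do is already written into behave(), fitness(), and the mutation step — which is exactly the point made next.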
While such agent-based simulations have generated a number of interesting insights, they never achieve anything even remotely resembling open-ended evolution. This limitation is so pervasive, in fact, that it is known as the “failure of strong artificial life,” a research program that set out to implement actual evolutionary processes within the memory of a computer. It just doesn’t work. And the reason should be obvious to anyone reading this book by now: even though the “genome” of an agent can mutate, the way this changes its behaviour still occurs according to some pretty narrowly defined and ultimately fixed rules. The “agents” of agent-based models are not real agents, because they do not solve the problem of relevance, but operate within strict margins that are given by the modeller. Simply put, they cannot frame their own problems and this is why there is no true open-endedness in such simulations.
In summary, we do have formalisms that allow the rewriting of rules based on internal dynamics and environmental interactions. We’ve only shown two widespread examples here, but there are many more. Yet, all of these formalisms, without exception, remain strictly within a computational frame. Their world is small. They still behave according to fixed rules, even if these are applied obliquely. And, on top of all this, they are extremely difficult to analyse with any rigour or clarity. As our colleague Andrew Oates once put it: “doing systems science often amounts to replacing a natural system we don’t understand with a computational system we don’t understand.” And since evolutionary processes are fundamentally unpredictable anyway (at least over longer time spans), we gain precious little by applying these formalisms. In the end, the real obstacle we keep bumping into remains the same: we simply cannot formalise evolutionary processes in advance. Unexpected interactions will happen, sooner or later. And they cannot be captured by any well-defined formal system, no matter what its precise nature.
If this argument holds up, then there simply isn’t any formalism that would help us overcome this limitation. This may be as close as we will ever get to a universal law in biology: the organism and its evolution are not completely formalisable. It may be time then to stop claiming that we can, indeed, come up with a predictive theory of everything, and to accept our situation as limited beings in a large world. Our formal tools are what they are. And only by recognising their limitations can we use them effectively and safely. Only by knowing their proper domain of application can we know where they will yield robust and coherent results, and where they won’t. The world constantly evolves beyond our grasp. We can participate in this process, even rationalise it retroactively. But we cannot control or predict it. And we probably never will. And this is a good thing: the world will never cease to amaze and to surprise us.
Stochasticity
So far, we have discussed limitations of modelling formalisms that concern the dynamics of a natural system, and the open-ended (or indefinite) nature of these dynamics. Such limitations are fundamental: a formalism will only give us robust or useful insights into the system if it is encoded in a way that captures these (open-ended) dynamics. The last limitation we will have to tackle in this appendix is equally fundamental: it stems from the fact that we almost never have complete data on any natural system and, in most cases, do not have the faintest clue what having “complete data” would even mean.
This generates two problems: the first is that we usually have to infer the state space of a system from partial evidence, and the second is that it becomes difficult to judge which aspects of a system’s behaviour are relevant in any given situation, and which aspects we classify and discard as “noise” or, more neutrally formulated, fluctuations that perturb and obscure the “typical” dynamics of the system.
Let’s begin with the first issue, and take a step back to the beginning of chapter 11. We can define the skillful art of modelling that is science in the following way: given our evidence, which consists of a limited sample of observations or measurements on a selected set of observables, we want to infer an appropriate underlying set of states that defines a dynamical system (in the broad sense of chapter 12).
We then use this dynamical system to characterise transitions between states in terms of state variables and their interrelations. This is how we come to better understand the system and, if we’re lucky, predict some of its behaviour. The problem is that, more often than not, this structure of the dynamical system is underdetermined: there are multiple variant models that are all equally compatible with the evidence, and our preference among those hypothesised models may change as additional observations or measurements are made. In addition, as we already mentioned above, there is the issue of unconsidered alternatives. In sum: the set of hypotheses we have available when modelling a natural system tends to be surprisingly hard to narrow down and, at the same time, it may be woefully incomplete to begin with.
Framed in more philosophical terms, science does not work through induction, as Francis Bacon famously thought it did: scientists do not simply generalise from limited empirical evidence to an unambiguous formal model, theory, or law. Instead, we proceed through what pragmatist philosopher Charles Sanders Peirce called abduction, or inference to the best explanation: we take a generally limited and biased selection of preconceived hypotheses and check which of these fits the current evidence best, given our particular situation. Given that our dinner is missing from the fridge, do we assume that mice ate it, or was it our flatmate? Depending on our living situation, one alternative may be more likely than the other. Much less plausible though is the idea that the food spontaneously quantum-teleported to someone else’s fridge. This almost never happens in our macroscale world of human experience. But there may be other, yet unconsidered, reasons. We may be missing an obvious point. This is, basically, how science progresses as well. It’s all rather improvised, but it works quite okay in most cases!
What this does imply, however, is that we seem to be caught in yet another paradox: in order to encode a formal model, we must already have some idea of what that model could be. Otherwise, we cannot connect our observables to states and their relations. And, again, we arrive at the central insight of our model of science as a skillful modelling practice: science is a never-ending process. It provides us with tools to evolve our knowledge in the most robustly adaptive way possible, in order to generate trustworthy and useful understanding for coherent action. Therefore, the kind of circularity we have just described is not necessarily vicious: we build our models on previous models. That’s the whole point. It’s models all the way back — right to the first living cell, and the common ancestor of all the life that we know of. We told you before that science is continuous with how all living beings come to know the world.
This kind of philosophy forms the basis of a probabilistic and evolutionary view of science that is Bayesian, i.e., grounded on Bayes’ theorem, which is a really interesting piece of statistical theory:
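\[
P(H \mid E) \;=\; \frac{P(E \mid H)\,P(H)}{P(E)}
\]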
Bayes’ rule, as it is also sometimes called, describes the conditional probability P(H|E) of some hypothesis H being true given certain evidence E. For instance, the probability of “it rained” (H) given that “the street is wet” (E). Now, the street could also be wet because last week’s snow melted, or because of a pipe leak in the neighborhood, or because the street cleaners just passed. Or some other explanation we have not even thought of. Be that as it may, the interesting aspect of Bayes’ rule is how it gets to this conditional probability, which is also called the posterior probability in Bayesian statistics, because it is calculated based on our prior beliefs about something, plus new evidence that may (or may not) affect our trust in that prior belief.
Interpreted in these terms, the formula above becomes quite intuitive: P(H) is the prior probability, denoting the probability of hypothesis H (“it rained”) before we observed the evidence E (“the street is wet”). Such a prior belief could be informed by general knowledge about the local climate (“does it rain often here?”), or, in the absence of any prior knowledge, we usually assume a uniform prior (all hypotheses equally probable, which is the simplest assumption as long as we know nothing about them). P(E|H) is the likelihood of the evidence E (“the street is wet”) given that the hypothesis H (“it rained”) is true (which is 100% in this case). And, last but not least, P(E) is the marginal probability, which describes how probable it is to observe E (“the street is wet”) across all possible hypotheses that may apply to the situation.
If we pull all of this together, then Bayes’ rule simply says that our confidence in a hypothesis should increase if the new evidence fits into the picture it conveys, and decrease if it favors alternative explanations. The main purpose of the rule is to make this intuitive notion quantitative and computable.
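As a toy illustration of that computability, here is the wet-street example in Python, restricted to three candidate hypotheses with entirely invented priors and likelihoods:

```python
# Toy Bayesian update for the evidence "the street is wet".
# All probability values below are invented for illustration.
hypotheses = {
    "it rained":       {"prior": 0.3, "likelihood": 1.0},
    "pipe leak":       {"prior": 0.1, "likelihood": 0.9},
    "street cleaners": {"prior": 0.6, "likelihood": 0.5},
}

# Marginal probability of the evidence, summed over the hypotheses
# we happened to consider (and only those).
p_evidence = sum(h["prior"] * h["likelihood"] for h in hypotheses.values())

for name, h in hypotheses.items():
    posterior = h["prior"] * h["likelihood"] / p_evidence
    print(f"P({name} | street is wet) = {posterior:.2f}")
```

Note that the marginal in the denominator ranges only over the hypotheses we happened to write down — which brings us straight to the limits discussed next.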
And yet, here again, there are limits to formalisation: note that posterior probabilities are always relative to the (biased and incomplete) set of hypotheses we are working with. Compatible with the views we’ve espoused earlier, Bayes never gives us any absolute or certain knowledge, and it does not deal at all with the problem of relevance. By the time we apply the theorem to the improvement of our knowledge, all our problems have already been well-defined. And there are other issues. In particular: where do the priors come from? This gets us into the problem of infinite regress once more. As we’ve said above: it’s models, and therefore priors, all the way back. And worse: how to calculate the marginal probability? This requires us to integrate the probability of our evidence across all possible hypotheses, when we explained above that it’s impossible to define such a space of possibilities for all but the most trivial systems. There are practical tricks around this, as we shall see. But still, it is a fundamental limitation.
It should be quite obvious how the Bayesian view can be (and has been) applied to model the practice of doing science by assessing (and revising) the probability of our models. But we can take it further — a lot further, in fact. This is what a Bayesian approach called the free-energy principle (FEP) does. Pioneered by neuroscientist Karl Friston, it claims to be many things: originally a theory of human perception and action formulated in terms of active inference, it has variously been advertised as a general theory of the brain, of life, and even of “thingness” (what it means to be an object in the first place). As we shall see, the FEP is actually none of these, which is reflected by the fact that many of its proponents have walked back most of these earlier claims and are now arguing that the FEP is simply a “neutral” modelling technique — a general organising principle for a peculiar type of (stochastic) modelling of natural systems.
It is that last claim that interests us here, because it provides an account of how limited living beings infer the structure of a dynamical system from an incomplete set of observations. In short, the central idea is that we modify our internal models through active exploration of our environment so as to minimise the amount of surprise (or surprisal, to use the technical term) that we produce through our (necessarily imperfect) model-based predictions. What we call the epistemic cut between natural and formal systems, the FEP calls a Markov blanket. It is the epistemic boundary across which we draw inferences (cf. chapter 11). It can be shown that this amounts to an approximate computation of Bayes’ rule, where the approximation concerns the problem of not being able to precisely calculate the marginal probability in the denominator for any realistically complex situation.
A bit more technical detail is needed to understand this properly: when FEP theorists talk about “minimising free energy,” they mean this as an analogy for an epistemic agent reducing surprisal by making its internal models congruent with perceived patterns in the ambience. This analogy is warranted by a correspondence between statistical mechanics and information theory that was first pointed out by physicist E. T. Jaynes in 1957. Roughly, it draws a parallel between a natural system that operates far from equilibrium maximising its local rate of entropy production (thereby minimising free energy through dissipation), and an epistemic agent maximising knowledge about a system (by minimising surprisal). This is possible because information can be treated as a form of (neg)entropy.
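In its standard variational formulation (the notation varies across the literature; here o stands for observations, s for hidden states, q(s) for the agent’s internal approximate model, and p for its generative model), the free energy F is an upper bound on surprisal:

\[
F \;=\; \mathbb{E}_{q(s)}\!\big[\ln q(s) - \ln p(o, s)\big]
\;=\; -\ln p(o) \;+\; D_{\mathrm{KL}}\!\big[q(s)\,\big\|\,p(s \mid o)\big]
\;\ge\; -\ln p(o).
\]

Minimising F therefore reduces surprisal, −ln p(o), while simultaneously pushing the internal model q(s) towards the exact (and in practice intractable) posterior p(s | o) — the approximate Bayesian computation mentioned above.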
In other words, the FEP provides a probabilistic (Bayesian) way to solve the problem of congruence in Rosen’s modelling relation. It helps us choose between underdetermined hypotheses, given the limited amount of evidence we have available, and explains how agents constantly update such choices through active exploration of their ambience. Importantly, this also gives us a probabilistic and dynamic definition of the state space S of a dynamical system: there is a certain collection of states that the system is likely to visit, over and over again, while it “avoids” others. That’s just the state-space definition of a system, basically, but in probabilistic form. Alternatively, the FEP can also be formulated in terms of trajectories a system is likely to traverse. In both cases, to minimise surprisal means to construct a joint probability distribution over possible states, which characterises the possibility space of the dynamical system that is most congruent with the observed patterns in the ambience (the natural system). And it is this probability distribution that we update as we learn more about the natural system under study.
What the FEP does not do, however, is characterise evolving or living systems. In fact, due to the requirement of calculating a probability distribution over some well-defined set of state variables or trajectories, it can only be applied to what we called dynamical systems (in the broad sense) in chapter 12. This excludes open-ended evolutionary processes and living organisms. We’ll revisit this issue in the last two parts of the book. The FEP has interesting things to say about how limited agents model their world in a probabilistic manner, but it doesn’t say anything about the nature of the living agent itself, and how it acquires the capacity to formalise anything in the first place. Despite many claims to the contrary, the FEP does not address the problem of relevance. It is a framework entirely based on inference, after all, and inference (as we have seen) is a form of computation.
It is important to repeat at this point that the FEP is not a modelling formalism, but rather a theoretical principle from which specific stochastic models of natural systems can be derived. In this way, it is analogous to the principle of stationary action (better known, colloquially, as the principle of “least action”), from which we can derive the Lagrangian formulation of classical mechanics. Such principles are not used to model a system directly. Instead, they justify the common assumptions underlying a modelling approach. In the case of the FEP, it justifies how we formalise certain ambient patterns as internal (stochastic) models.
These stochastic models fall into two general categories: purely stochastic processes and traditional deterministic modelling frameworks, such as differential equations, with added “noise” terms (that usually consist of some kind of stochastic process).
Stochastic processes consist of a set of random variables with associated probability spaces, i.e., a sample space plus a probability measure over all the states in that space. What this means is that each variable can take on specific values from the sample space, with a probability determined by the measure. Two basic examples are the Wiener process, which represents the physical phenomenon of Brownian motion and is also used to simulate other kinds of diffusion(-like) processes, various phenomena in quantum mechanics and cosmology, and the dynamics of financial markets; and the Poisson process, which represents random events occurring with a given probability over time, such as radioactive decay, earthquakes, or incoming phone calls in a call centre. There are many broader categories of stochastic processes, such as random walks (of which the Wiener process is an example) and Markov processes (memoryless stochastic processes), but we won’t go into more detail about their properties and classification here.
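A minimal sketch in Python of these two textbook processes (time step, rate, and horizon are arbitrary choices):

```python
import random

def wiener_path(n_steps, dt=0.01):
    """Discretised Wiener process: independent Gaussian increments
    with variance dt, accumulated over time."""
    w, path = 0.0, [0.0]
    for _ in range(n_steps):
        w += random.gauss(0.0, dt ** 0.5)
        path.append(w)
    return path

def poisson_event_times(rate, t_max):
    """Poisson process: event times with exponentially distributed
    waiting times between consecutive events."""
    t, events = 0.0, []
    while True:
        t += random.expovariate(rate)
        if t > t_max:
            return events
        events.append(t)

print(wiener_path(5))
print(poisson_event_times(rate=2.0, t_max=3.0))
```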
Note that the randomness of a stochastic process can have two very distinct physical sources: in the case of radioactive decay, for example, it is true randomness in the form of quantum indeterminacy. In the case of Brownian motion, however, the stochasticity arises from the huge number of molecules present in the system, which all behave deterministically on their own, but interact in ways that can be efficiently summarised by the statistical properties and behaviour of the Wiener process.
Now: what if we want to describe a system that behaves in a certain way, rather predictably, on average, but exhibits seemingly random fluctuations around this kind of typical behaviour? This brings us to the second problem we have introduced at the beginning of this section, and the class of stochastic models that are based on traditional modelling formalisms with added “noise” terms. The main issue here is: how do we distinguish the noise from the typical behaviour of the system?
For some systems, like the kinetics of a simple monomolecular chemical reaction in a homogeneous medium, there are principled ways of doing this. Here, “noise” is caused by random fluctuations in the concentration of the reactant. Again, as you have probably come to expect, the principled approach requires a system with a well-defined (and, in this case, also tractably small) state space. We can then trace each state of the system with a master equation, which describes how the probability of finding the system in that particular state changes over time, given the number of molecules present at any given moment. The problem with this approach is that many systems (including all the continuous ones) have an infinite number of possible states, and we end up with a formal system based on an infinite number of equations! This is obviously not very practical. To say the least.
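To see what such a master equation looks like, take as an illustrative assumption the simplest possible case: a monomolecular decay reaction A → ∅ with rate constant k, and let P_n(t) denote the probability of having exactly n molecules of A at time t. Then:

\[
\frac{\mathrm{d}P_n(t)}{\mathrm{d}t} \;=\; k\,(n+1)\,P_{n+1}(t) \;-\; k\,n\,P_n(t),
\]

with one such equation for every possible molecule number n — which is exactly how the impractically large (or infinite) system of equations mentioned above comes about.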
Luckily, we can abstract simpler and more manageable formulations from our initial set of master equations. When studying chemical kinetics, for example, this yields the Fokker-Planck equation, which no longer applies to individual system states, but to their probability distribution as a whole, and its transformation over time (a bit like the Schrödinger wave equation of quantum mechanics). From there, we can go one step further and encode the system into stochastic differential equations (SDEs) such as the Langevin equation (or the Itô calculus in other contexts). These can be interpreted more intuitively, as they describe the changing values of state variables over time like a deterministic dynamical system. In contrast to the deterministic case, however, SDEs consist of a classical deterministic part (called the drift of the system), plus an added stochastic process (the “noise” term). And the good news is: for simple chemical reactions, we can derive such stochastic rate equations from the master equation, step by step, based on first principles. As an added bonus, we can generate exact sample trajectories of the underlying master equation on a computer using the Gillespie algorithm. Talk about robust theory!
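As a sketch of how such a stochastic simulation works in practice — assuming, purely for illustration, a toy system with constant production (rate b) and monomolecular decay (rate k per molecule), and arbitrary parameter values — the core of the Gillespie algorithm fits into a few lines of Python:

```python
import random

def gillespie(n0=0, b=5.0, k=0.5, t_max=20.0):
    """Exact stochastic simulation of production/decay of a single species."""
    t, n = 0.0, n0
    trajectory = [(t, n)]
    while t < t_max:
        propensities = [b, k * n]          # production, decay
        total = sum(propensities)
        if total == 0:
            break
        t += random.expovariate(total)     # waiting time until the next event
        # Choose which reaction fires, proportionally to its propensity.
        if random.uniform(0, total) < propensities[0]:
            n += 1                         # production: one molecule added
        else:
            n -= 1                         # decay: one molecule removed
        trajectory.append((t, n))
    return trajectory

print(gillespie()[-5:])
```

Each run yields one exact sample trajectory of the underlying master equation; averaging over many runs recovers its probability distribution.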
But the bad news is: this breaks down pretty quickly, long before we are simulating complex biochemical reactions in a crowded and inhomogeneous cellular milieu. Yet again, modellers will often ignore this limitation, and apply SDE models and stochastic simulation to biochemical systems in living cells without first checking whether the conditions are actually given for robust and useful insight in their particular context. This is still very much physics in a box! The simple fact is: there is no small-world formalism that can capture all the potential interactions that influence the drift and noise term of an open system that is in constant exchange with other systems. And what is drift, and what noise, may suddenly change in open-ended evolutionary processes. Evolution, and cells, are ruthlessly opportunistic that way: they will take advantage of any kind of behaviour of a chemical system if the situation requires it, no matter whether it is deterministic or (considered by an observer to be) random.
In the end, we must conclude that each of the modelling formalisms introduced in this appendix remains inside its own box! Each one of them has some advantages and some disadvantages compared to the others. In fact, there is an inevitable trade-off between the generality, accuracy, and tractability of any formal model. And each formalism has its own peculiar domain of application — a specific class of systems to which it can be usefully applied.
On the one hand, it is useless to use a fancy agent-based simulation, or a sophisticated stochastic formalism, which are both difficult to handle and analyse, on a simple phenomenon that does not depend on adaptive rules of change, and whose random fluctuations neatly cancel out when averaging its behaviour. On the other hand, it is also not conducive to robust insight to use a static network model to explain the behaviour of a dynamical system. All of this, we emphasise once more, is not difficult to understand. And yet, we still see the wrong kinds of formalisms applied to the wrong kind of system far too often, without much thought given to the validity of the approach that is used.
But this is not the main insight we gain from our survey of methods. Its central message, instead, is this: we do not have any modelling formalism able to capture the full behavioural and evolutionary potential of any living system. Life always goes beyond formalisation. This is not to say that we cannot model aspects of living systems and their evolution. In fact, the above modelling formalisms give us a powerful toolkit to do just that. Rather, we are saying that none of these formalisms are able to capture the future behaviour and evolution of a living system completely. We said it before: life, and the large world it exists in, will always surprise us! Every once in a while, at least. And this fact really matters: a machine world can be formalised in its entirety. There is no surprise, but also no relevance or meaning in such a world.
The machine kills life. That is why we must move beyond it. This does not mean that we should no longer be using the tools we’ve introduced above. Quite the contrary: we use them more wisely, and more effectively, if we are aware of their nature and limitations as tools. Our models are means to a better understanding. But they are not our world. The world, after all, is not a set.
Interpreted in these terms, the formula above becomes quite intuitive: P(H) is the prior probability, denoting the probability of hypothesis H (“it rained”) before we observed the evidence E (“the street is wet”). Such prior belief could be informed by general knowledge about the local climate (“does it rain often here?”), or it could be completely random (in the absence of any prior knowledge) in which case we usually assume the different priors to be uniformly distributed (all explanations equally probable, which is the simplest assumption as long as we know nothing about them). P(E|H) is the likelihood of the evidence E (“the street is wet”) given that the hypothesis H (“it rained”) is true (which is 100% in this case). And, last but not least, P(E) is the marginal probability which describes how probable it is to get E (“the street is wet”) across all possible hypotheses that may apply to the situation.
If we pull all of this together, then Bayes’ rule simply says that our confidence in a hypothesis should increase if the new evidence fits into the picture it conveys, and decrease if it favors alternative explanations. The main purpose of the rule is to make this intuitive notion quantitative and computable.
And yet, here again, there are limits to formalisation: note that posterior probabilities are always relative to the (biased and incomplete) set of hypotheses we are working with. Compatible with the views we’ve espoused earlier, Bayes never gives us any absolute or certain knowledge, and it does not deal at all with the problem of relevance. By the time we apply the theorem to the improvement of our knowledge, all our problems have already been well-defined. And there are other issues. In particular: where do the priors come from? This gets us into the problem of infinite regress once more. As we’ve said above: it’s models, and therefore priors, all the way back. And worse: how to calculate the marginal probability? This requires us to integrate the probability of our evidence across all possible hypotheses, when we explained above that it’s impossible to define such a space of possibilities for all but the most trivial systems. There are practical tricks around this, as we shall see. But still, it is a fundamental limitation.
It should be quite obvious how the Bayesian view can be (and has been) applied to model the practice of doing science by assessing (and revising) the probability of our models. But we can take it further — a lot further, in fact. This is what a Bayesian approach called the free-energy principle (FEP) does. Pioneered by neuroscientist Karl Friston, it claims to be many things: originally a theory of human perception and action formulated in terms of active inference, it has variously been advertised as a general theory of the brain, of life, and even of “thingness” (what it means to be an object in the first place). As we shall see, the FEP is actually none of these, which is reflected by the fact that many of its proponents have abdicated most of these earlier claims and are now arguing that the FEP is simply a “neutral” modelling technique — a general organising principle for a peculiar type of (stochastic) modelling of natural systems.
It is that last claim that is interesting to us here, because it provides an interesting account of how limited living beings infer the structure of a dynamical system from an incomplete set of observations. In short, the central idea is that we modify our internal models through active exploration of our environment so as to minimise the amount of surprise (or surprisal, to use the technical term) that we produce through our (necessarily imperfect) model-based predictions. What we call the epistemic cut between natural and formal systems, the FEP calls a Markov blanket. It is the epistemic boundary across which we draw inferences (cf. chapter 11). It can be shown that this amounts to an approximate computation of Bayes’ rule, where the approximation concerns the problem of not being able to precisely calculate the marginal probability of the denominator for any realistically complex situation.
A bit more technical detail is needed to understand this properly: when FEP theorists talk about “minimising free energy,” they mean this as an analogy for an epistemic agent reducing surprisal by making its internal models congruent with perceived patterns in the ambience. This analogy is warranted by a correspondence between statistical mechanics and information theory that was first pointed out by physicist E. T. Jaynes in 1957. Roughly, it draws a parallel between a natural system that operates far from equilibrium maximising its local rate of entropy production (thereby minimising free energy through dissipation), and an epistemic agent maximising knowledge about a system (by minimising surprisal). This is possible because information can be treated as a form of (neg)entropy.
In other words, the FEP provides a probabilistic (Bayesian) way to solve the problem of congruence in Rosen’s modelling relation. It helps us choose between underdetermined hypotheses, given the limited amount of evidence we have available, and explains how agents constantly update such choices through active exploration of their ambience. Importantly, this also gives us a probabilistic and dynamic definition of the state space S of a dynamical system: there is a certain collection of states that the system is likely to visit, over and over again, while it “avoids” others. That’s just the state-space definition of a system, basically, but in probabilistic form. Alternatively, the FEP can also be formulated in terms of trajectories a system is likely to traverse. In both cases, to minimise surprisal means to construct a joint probability distribution over possible states, which characterises the possibility space of the dynamical system that is most congruent with the observed patterns in the ambience (the natural system). And it is this probability distribution that we update as we learn more about the natural system under study.
What the FEP does not do, however, is characterise evolving or living systems. In fact, due to the requirement of calculating a probability distribution over some well-defined set of state variables or trajectories, it can only be applied to what we called dynamical systems (in the broad sense) in chapter 12. This excludes open-ended evolutionary processes and living organisms. We’ll revisit this issue in the last two parts of the book. The FEP has interesting things to say about how limited agents model their world in a probabilistic manner, but it doesn’t say anything about the nature of the living agent itself, and how it acquires the capacity to formalise anything in the first place. Despite many claims to the contrary, the FEP does not address the problem of relevance. It is a framework entirely based on inference, after all, and inference (as we have seen) is a form of computation.
It is important to repeat at this point that the FEP is not a modelling formalism, but rather a theoretical principle from which specific stochastic models of natural systems can be derived. In this way, it is analogous to the principle of stationary action (better known in its older form as the principle of least action), from which we can derive the Lagrangian formulation of classical mechanics. Such principles are not used to model a system directly. Instead, they justify the common assumptions underlying a modelling approach. In the case of the FEP, it justifies how we formalise certain ambient patterns as internal (stochastic) models.
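For comparison, this is what the principle of stationary action looks like in its standard textbook form: it does not model any particular system, but demanding that the action be stationary yields the Euler-Lagrange equations, i.e., the Lagrangian formulation of mechanics:

```latex
\[
  S[q] \;=\; \int_{t_1}^{t_2} L\bigl(q(t), \dot{q}(t), t\bigr)\, \mathrm{d}t,
  \qquad
  \delta S = 0
  \;\;\Longrightarrow\;\;
  \frac{\mathrm{d}}{\mathrm{d}t}\,\frac{\partial L}{\partial \dot{q}}
  \;-\; \frac{\partial L}{\partial q}
  \;=\; 0 .
\]
```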
These stochastic models fall into two general categories: pure stochastic processes, and traditional deterministic modelling frameworks (such as differential equations) with added “noise” terms, which usually consist of some kind of stochastic process themselves.
Stochastic processes consist of a set of random variables with associated probability spaces, i.e., a sample space plus a probability measure over all the states in that space. What this means is that each variable can take on specific values from the sample space, with a probability determined by the measure. Two basic examples are the Wiener process and the Poisson process. The Wiener process represents the physical phenomenon of Brownian motion, and is also used to simulate other kinds of diffusion(-like) processes, various phenomena in quantum mechanics and cosmology, and the dynamics of financial markets. The Poisson process represents random events occurring at a given rate over time, such as radioactive decay, earthquakes, or incoming phone calls in a call centre. There are broader categories of stochastic processes, such as random walks (of which the Wiener process is an example) and Markov processes (memoryless stochastic processes), but we won’t go into more detail about their properties and classification here.
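As a small illustration of how simple these two building blocks are to work with, here is a sketch that simulates one path of each; the time step, rate, and horizon are arbitrary illustration values:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Wiener process (Brownian motion) ---
# Increments over a small time step dt are independent Gaussians
# with mean 0 and variance dt; the path is their cumulative sum.
dt, n_steps = 0.01, 1000
increments = rng.normal(loc=0.0, scale=np.sqrt(dt), size=n_steps)
wiener_path = np.concatenate([[0.0], np.cumsum(increments)])

# --- Poisson process ---
# Events occur at rate lam per unit time; waiting times between
# events are independent exponential random variables.
lam, horizon = 2.0, 10.0
waits = rng.exponential(scale=1.0 / lam, size=100)
event_times = np.cumsum(waits)
event_times = event_times[event_times <= horizon]

print(f"Wiener path ends at {wiener_path[-1]:.3f}")
print(f"{len(event_times)} Poisson events in [0, {horizon}]")
```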
Note that the randomness of a stochastic process can have two very distinct physical sources: in the case of radioactive decay, for example, it is true randomness in the form of quantum indeterminacy. In the case of Brownian motion, however, the stochasticity arises from the huge number of molecules present in the system, which all behave deterministically on their own, but interact in ways that can be efficiently summarised by the statistical properties and behaviour of the Wiener process.
Now: what if we want to describe a system that behaves in a certain way, rather predictably, on average, but exhibits seemingly random fluctuations around this typical behaviour? This brings us to the second problem we introduced at the beginning of this section, and to the class of stochastic models that are based on traditional modelling formalisms with added “noise” terms. The main issue here is: how do we distinguish the noise from the typical behaviour of the system?
For some systems, like the kinetics of a simple monomolecular chemical reaction in a homogeneous medium, there are principled ways of doing this. Here, “noise” is caused by random fluctuations in the number of reactant molecules. Again, as you have probably come to expect, the principled approach requires a system with a well-defined (and, in this case, also tractably small) state space. We can then track each possible state of the system with a master equation, which describes how the probability of finding the system in that particular state changes over time, given the numbers of molecules present at any given moment. The problem with this approach is that many systems (including all the continuous ones) have an infinite number of possible states, and we end up with a formal system based on an infinite number of equations! This is obviously not very practical. To say the least.
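As a concrete textbook example, take the simplest possible case: a monomolecular degradation reaction (a single species A decaying with rate constant k). If P(n, t) is the probability of finding exactly n molecules of A at time t, the master equation reads:

```latex
\[
  \frac{\mathrm{d}P(n,t)}{\mathrm{d}t}
  \;=\;
  k\,(n+1)\,P(n+1,t) \;-\; k\,n\,P(n,t),
\]
```

one such equation for every possible molecule count n, which is exactly where the problem of too many equations comes from as soon as the state space is unbounded or continuous.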
Luckily, we can abstract simpler and more manageable formulations from our initial set of master equations. When studying chemical kinetics, for example, this yields the Fokker-Planck equation, which no longer applies to individual system states, but to their probability distribution as a whole and its transformation over time (a bit like the Schrödinger wave equation of quantum mechanics). From there, we can go one step further and encode the system into stochastic differential equations (SDEs) such as the Langevin equation (usually interpreted by means of Itô calculus). These can be read more intuitively, as they describe the changing values of state variables over time, just like a deterministic dynamical system. In contrast to the deterministic case, however, SDEs consist of a classical deterministic part (called the drift of the system) plus an added stochastic process (the “noise” term). And the good news is: for simple chemical reactions, we can derive such stochastic rate equations from the master equation, step by step, based on first principles. As an added bonus, we can generate exact sample trajectories of the underlying master equation on a computer using the Gillespie algorithm. Talk about robust theory!
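Schematically, an SDE has the form dX = f(X) dt + g(X) dW, with a drift term f and a noise amplitude g driven by a Wiener process W. And to make the last step tangible, here is a minimal sketch of the Gillespie algorithm for the degradation reaction from above; the rate constant, initial molecule count, and time horizon are arbitrary illustration values:

```python
import numpy as np

rng = np.random.default_rng(42)

def gillespie_degradation(n0=100, k=0.1, t_end=50.0):
    """Exact stochastic simulation of the degradation reaction A -> (nothing).

    With a single reaction channel, the Gillespie algorithm reduces to:
    draw an exponential waiting time with rate k*n, then remove one molecule.
    """
    t, n = 0.0, n0
    times, counts = [t], [n]
    while n > 0 and t < t_end:
        t += rng.exponential(1.0 / (k * n))   # waiting time to the next decay event
        n -= 1                                # one molecule of A decays
        times.append(t)
        counts.append(n)
    return np.array(times), np.array(counts)

times, counts = gillespie_degradation()
half_idx = np.argmax(counts <= 50)            # first moment the count drops to half
print(f"Half of the molecules are gone after roughly {times[half_idx]:.1f} time units")
```

Averaging many such trajectories recovers the familiar exponential decay of the deterministic rate equation, while any single trajectory shows the fluctuations that the deterministic description averages away.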
But the bad news is: this breaks down pretty quickly, long before we are simulating complex biochemical reactions in a crowded and inhomogeneous cellular milieu. Yet again, modellers will often ignore this limitation, and apply SDE models and stochastic simulation to biochemical systems in living cells without first checking whether the conditions for robust and useful insight actually hold in their particular context. This is still very much physics in a box! The simple fact is: there is no small-world formalism that can capture all the potential interactions that influence the drift and noise terms of an open system that is in constant exchange with other systems. And what counts as drift and what as noise may suddenly change in open-ended evolutionary processes. Evolution, and cells, are ruthlessly opportunistic that way: they will take advantage of any kind of behaviour of a chemical system if the situation requires it, no matter whether it is deterministic or (considered by an observer to be) random.
In the end, we must conclude that each of the modelling formalisms introduced in this appendix remains inside its own box! Each one of them has some advantages and some disadvantages compared to the others. In fact, there is an inevitable trade-off between the generality, accuracy, and tractability of any formal model. And each formalism has its own peculiar domain of application — a specific class of systems to which it can be usefully applied.
On the one hand, it is pointless to apply a fancy agent-based simulation or a sophisticated stochastic formalism (both difficult to handle and analyse) to a simple phenomenon that does not depend on adaptive rules of change, and whose random fluctuations neatly cancel out when its behaviour is averaged. On the other hand, it is also not conducive to robust insight to use a static network model to explain the behaviour of a dynamical system. All of this, we emphasise once more, is not difficult to understand. And yet, we still see the wrong kinds of formalisms applied to the wrong kind of system far too often, without much thought given to the validity of the approach that is used.
But this is not the main insight we gain from our survey of methods. Its central message, instead, is this: we do not have any modelling formalism able to capture the full behavioural and evolutionary potential of any living system. Life always goes beyond formalisation. This is not to say that we cannot model aspects of living systems and their evolution. In fact, the above modelling formalisms give us a powerful toolkit to do just that. Rather, we are saying that none of these formalisms are able to capture the future behaviour and evolution of a living system completely. We said it before: life, and the large world it exists in, will always surprise us! Every once in a while, at least. And this fact really matters: a machine world can be formalised in its entirety. There is no surprise, but also no relevance or meaning in such a world.
The machine kills life. That is why we must move beyond it. This does not mean that we should no longer be using the tools we’ve introduced above. Quite the contrary: we use them more wisely, and more effectively, if we are aware of their nature and limitations as tools. Our models are means to a better understanding. But they are not our world. The world, after all, is not a set.
The authors acknowledge funding from the John Templeton Foundation (Project ID: 62581), and would like to thank the co-leader of the project, Prof. Tarja Knuuttila, and the Department of Philosophy at the University of Vienna for hosting the project of which this book is a central part.
Disclaimer: everything we write and present here is our own responsibility. All mistakes are ours, and not the funders’ or our hosts’ and collaborators’.