Computer
As we have just seen, it was Newton who first came up with a way, a method, to calculate or compute the future based on information about the present. He achieved this by separating the rules that govern change from the current state of the world, as if those rules existed independently from what is actually going on. This simple methodological trick produced the tool that we called his universal mathematical oracle. We also discussed how this powerful ability to predict (and thus control) relies on a number of philosophical assumptions that are not at all empirically motivated or scientific in nature, and can only be understood in the particular historical context in which they were originally proposed.
Almost 300 years have passed since Newton’s death. During this time, the machine view has shifted from imagining the world as a sophisticated mechanical clockwork towards the idea that reality resembles a modern-day electronic computer. We have already discussed how this resurrects Laplace’s demon and brings it forward into the present. In this chapter, we will flesh out a bit more how this happened, and why it increasingly hampers our ability to distinguish our maps from the territory.
But what is computation exactly? When we first encountered the concept, we said it had to do with the rule-based manipulation of symbols, such as when you solve a mathematical problem with pen and paper. This was exactly the starting point for Alonzo Church and his student Alan Turing in the 1930s, when they came up with their respective attempts at a formal theory of computation. Notice that when these two talk about a “computer” they are not thinking of a machine. Instead, they mean a person — often an underpaid and underappreciated woman — working in the back of the accounting or engineering department, performing rote (and often very tedious) calculations that required a high level of intelligence, diligence, and an enormous amount of concentration for hours on end.
So let us get this straight right from the beginning: the modern theory of computation is a theory of a very specific human activity — calculating by rote. Nothing more, nothing less. No surprise then, when people these days rediscover that a human brain can compute! That was the whole point of the theory to begin with. We discuss this point in more detail in our appendix on computation.
But this oft-ignored origin of the theory of computation is not our main bone of contention. What interests us more is how this theory led to the technological evolution of the modern electronic computer which, in turn, is now seen by many as a good metaphor for how the physical world works. At first glance, all of this seems to be a little topsy-turvy, philosophically speaking. And it is, indeed!
On top of this, we should also examine how our modern notion of computation relates to Newton’s initial approach to computing the world. The two are very different, in terms of their methods and assumptions, but they share some of the same problems and implications. In particular, they both massively restrict how scientists allow themselves to see the world. And this is not a good thing.
Let’s go one step at a time though, and first take a closer look at Alan Turing’s version of the theory, which is based on an abstract model that not only gives us precise definitions of “computation,” “algorithm,” and “machine” (coming in handy for the rest of our argument further down), but also underlies the common hardware architecture of pretty much all digital computing devices in use today.
To honour its inventor, we’ve come to call this model the Turing machine. Its importance for our digital age cannot be overstated. But keep in mind, and this is crucial: it is an abstract model, a map, not a real-world machine, not the territory. In fact, it is a model of a universal machine. We know, it’s a tall claim, and it has led to plenty of confusion. So let us try to explain what this means.
Remember the problem-solving framework of Newell and Simon? When you solve a mathematical problem, what you do is you start with a bunch of symbols (some mathematical expressions, let’s say, or shapes, if you do geometry) and you apply a number of predefined rules to those symbols to transform or recombine them towards a desired result. The sequence of steps from the initial string of symbolic objects to the eventual outcome (if it exists) is called a solution to the problem. If you manage to get there eventually (within a finite amount of time), you have solved the problem. A mathematician or computer scientist would say that you have found an effective method to compute the solution.
The problem is that the idea of “effective computation” is difficult to pin down precisely. We need a formalisation, in the sense we’ve discussed before, to turn this ill-defined concept into a well-defined one. Church, Turing, and another one of Church’s students, Stephen Kleene, all came up with their own version of how to do this. And the fascinating thing is: Turing managed to prove mathematically that they all yield exactly the same set of computable (i.e., solvable) problems in the end. Despite looking very different at first glance, they are all equivalent! But we are getting way ahead of ourselves.
Turing’s machine is his particular way of formalising the concept of “effective computation.” This abstract construct consists of an imagined reading head which moves along an endless string of symbols (called its tape). The head can be in a finite number of distinct internal states. Depending on its current state, and the symbol it is reading from the tape, the head will execute one of a finite number of operations it has at its disposal, write the output back onto the tape (replacing the previous symbol), and then either move left or right (or stay put). Symbols, states, and operations together define a finite table of rules which determine what kind of problem the Turing machine can solve.
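For readers who like to see such things concretely, here is a minimal sketch of this setup in Python. It is our own illustration, not Turing’s original formalism: the rule table, the tape encoding, and the helper name `run` are all invented for this example, and the tape is finite rather than endless.

```python
# A minimal Turing machine simulator: a rule table maps
# (state, symbol read) -> (symbol to write, head move, next state).
def run(rules, tape, state="start", blank="_", max_steps=10_000):
    cells = dict(enumerate(tape))              # sparse tape: position -> symbol
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells.get(head, blank)
        write, move, state = rules[(state, symbol)]
        cells[head] = write
        head += {"L": -1, "R": +1, "S": 0}[move]
    return "".join(cells[i] for i in sorted(cells)).strip(blank)

# One concrete rule table: walk right, flipping every bit, halt at the first blank.
flip = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "S", "halt"),
}

print(run(flip, "0101100"))   # -> 1010011
```

The whole machine really is nothing more than a finite table of rules plus a tape of symbols, exactly as described above.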
It should be fairly obvious that we can encode any problem solution in the sense of Newell and Simon as a Turing machine. After all, computing a solution to a problem is just applying a finite sequence of operations to a given string of symbols. The machine formalises this activity: it can be fully automated, i.e., is completely rule-based. No ill-defined intuition, improvisation, creativity, or agency are involved.
This yields the definition of an algorithm: a defined sequence of operations or instructions, chosen from a finite set, applied to a string of symbols called the input data. To be a proper algorithm, in a strict sense, this sequence must eventually terminate to produce some output, and it must be guaranteed to produce the correct output for the problem at hand (no matter what input). Conceptually, this definition is comprehensive and crystal clear. But we’ve mentioned before that it can be difficult, even impossible, in practice to establish whether the latter two restrictions apply, because of the decision problem.
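To make the definition tangible, here is a textbook example that meets all three criteria: Euclid’s algorithm for the greatest common divisor. The little Python sketch below is our own illustration; the point is simply that it is a finite list of instructions, it always terminates on positive integers, and it always returns the right answer.

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm: repeatedly replace the pair (a, b) by (b, a mod b).
    Each step strictly shrinks b, so the loop must terminate, and the result
    is provably the greatest common divisor of the original inputs."""
    while b != 0:
        a, b = b, a % b
    return a

print(gcd(1071, 462))   # -> 21
```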
This is all pretty cool. Powerful stuff. But we can do better still! As the next step in his argument, Turing makes us imagine that the algorithm or program available to the machine can itself be read from its tape. That is, the computing machine can be programmed. This requires only that the head can distinguish somehow between symbols on the tape that belong to the program and those that serve as its input data. This is straightforward: it is just another well-defined task for the machine to perform.
This simple addition to the machine yields an astonishing result: a universal Turing machine that can run any algorithm, carry out any effective computation we can come up with, because any possible program (any finite sequence of operations) can be written to and read from its infinite tape. In other words, a universal Turing machine can solve any problem that is well-defined in this way! At least in principle — given an infinite amount of time and resources. We’ll get back to that later.
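Our little simulator from above (again: our own illustrative code, not Turing’s actual construction) hints at what this means in practice. The rule table is just ordinary data. Hand the same `run` function a different table and it computes a different function; a true universal machine goes one step further and reads such a table from its own tape.

```python
# A second rule table for the very same simulator: binary increment (add 1).
# The interpreter `run` is untouched; only the "program" (the table) changes.
increment = {
    ("seek_end", "0"): ("0", "R", "seek_end"),
    ("seek_end", "1"): ("1", "R", "seek_end"),
    ("seek_end", "_"): ("_", "L", "carry"),
    ("carry", "1"):    ("0", "L", "carry"),
    ("carry", "0"):    ("1", "S", "halt"),
    ("carry", "_"):    ("1", "S", "halt"),
}

print(run(increment, "1011", state="seek_end"))   # -> 1100 (11 + 1 = 12)
```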
Amazing, isn’t it? And extremely tempting: if a universal Turing machine can solve any well-defined problem, is it not so that we can describe and reproduce the behaviour of any possible mechanical device with it? What such a device does (its function) appears to be precisely definable in terms of the operations it is carrying out, and we should be able to localise these operations to specific parts of the device and their interactions. Take a bicycle, for example: its pedals transduce the force of the pedaller through its chain to the rear wheel, which propels the cycle forward. Gears add efficiency by changing the ratio between pedalling force and forward propulsion. The function of the handlebars is to change direction by rotating the heading of the front wheel. And so on.
It seems evident from this example (and others that are easy to come by) that the functions of mechanical devices can be perfectly captured through a finite set of operations: an algorithm. And algorithms, by definition, can be run on a universal Turing machine, which is therefore able to predict the functional behaviour of the device in every little bit of relevant detail. Of course it should go without saying that the simulation of the device is never the device itself! Our simulation lacks the physical embeddedness to actually propel us forward — we can’t ride it to work. Still, it can capture and reproduce the functional behaviour of the mechanical device (the bicycle) accurately and completely.
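To underline how little is needed to capture such functional behaviour, here is a deliberately crude sketch. Everything in it, from the function name to the parameter values, is our own invention for illustration; it is not a serious bicycle model, but it does reproduce the functional relationships just described.

```python
import math

def bicycle_speed(cadence_rpm: float, chainring_teeth: int, cog_teeth: int,
                  wheel_diameter_m: float = 0.7) -> float:
    """Functional model of a drivetrain: pedalling rate and gear ratio
    determine how fast the rear wheel turns, and hence the speed in m/s."""
    wheel_rpm = cadence_rpm * chainring_teeth / cog_teeth     # the gear ratio
    return wheel_rpm * math.pi * wheel_diameter_m / 60        # wheel circumference per second

# Pedalling at 80 rpm on a 50-tooth chainring driving a 14-tooth cog:
print(round(bicycle_speed(80, 50, 14), 1), "m/s")             # about 10.5 m/s
```

We can compute how fast we would be going and which gear to choose, but the simulation will never actually carry us anywhere.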
This, in fact, is a very reasonable way to define a (real-world) machine, and it is the definition of the term we shall use throughout the rest of the book. In this sense, a machine is a physical device whose functional behaviour can be accurately and completely mimicked by a universal Turing machine.
This kind of universality is undoubtedly the superpower of Turing’s model. It did not take very long before engineers realised they could build actual machines that get pretty close to its idealised blueprint. This led to the canonical von Neumann architecture for stored-program computers: the endless tape replaced by (finite) random-access memory and supplemented by external mass storage, connected by input-output channels (called “buses”) to one or more central processing and control units. In a funny twist of history, modern computing devices are real-world approximations to an abstract model of a human being doing calculations by rote! Turing’s model spawned the digital age. Our current electronic computers are designed to be machines that can emulate all other machines.
But the awesome power of Turing’s model does not stop there either. Think about it for a minute or two. The universal Turing machine is a model not just of any machine, but of any small world. Everything that could ever exist in such a world can be captured completely by the model by definition!
From there, it is a small step to the following train of thought: if the natural world were a small world (a machine world, working like a digital computer) then the demon would have full control over its future. You have to admit, this sounds even better than Newton’s original oracle! The demon could implement any process that can possibly occur in such a world as an algorithm, and it is able to calculate its outcome, as it has unlimited time and resources. And with technology powerful enough, we too could get close to this ultimate aspiration and become true masters of the universe! All it takes is to understand properly how the world computes. Tempting indeed — all too tempting, as we shall see.
But there is more! We’re still not quite done yet. After creating his universal machine, Turing compared the set of problems that can be solved using his model with those that came out of Church’s very different approach to computation (which we will describe in more detail later because it is interesting in its own right). And, as we’ve already mentioned, he found that the two sets are identical!
This wasn’t entirely surprising though, because Church had already shown that his approach yielded the set of problems that was known as Kleene’s “general recursive functions.” And so on ever since: from the 1930s onward, there have been numerous attempts at precisely defining “effective computation,” and each one of them ends up producing the same set of problems, without fail. It seems like there is a canonical and perfectly formalisable concept of what it means to be computable.
This is called the Church-Turing thesis. It remains a thesis (or conjecture) to this day: it cannot be proven mathematically since we just cannot define precisely enough what our informal notion of “effective computation” actually means. Mathematicians are finicky that way.
More importantly, the Church-Turing conjecture is one of the most misunderstood propositions in the history of mathematics and computer science. All it does is to argue that, no matter how you try to formalise the human activity of calculating something by rote, you always end up with the same set of problems that can actually be computed. And in Turing’s specific case: his universal machine can emulate any other machine that implements a computation that a human being could perform by rote.
Although not proven (and probably not provable) in the strict mathematical sense, the Church-Turing conjecture is robustly supported by evidence and very useful in practice. But, in spite of what we said above, it does not demonstrate that every formalisable process is computable. Nor does it actually show that “machine” as defined in an abstract algorithmic sense above is the same as “mechanism” in the sense of Newton, or “machine” in the colloquial sense denoting some real-world mechanical device. This difference is crucial. And it is not controversial. We will shortly introduce several counterexamples of logically consistent formal problems and systems, and of physical mechanisms we can actually build, that are not Turing computable. We know these exist: they are plentiful, and not mysterious in any way.
You probably guessed it already: the reason for this is that the natural world we live in is not small. And small-world models will not do in our large world. We’ll keep on repeating this over and over again: it is foolish to mistake your map for the territory, no matter how well the map seems to fit your purposes.
Woah. We’d better take a deep breath at this point and look back. We’ve moved, at breakneck speed, from a simple model of a human being solving mathematical problems with pen and paper to a universal machine which spawned the dominant real-world technology of the digital age. No wonder we are bedazzled by the power of Turing’s model, the Church-Turing thesis, and our own technological ingenuity! And no wonder it has gone to our heads.
Computation
We’ve already hinted at the fact that there are two very large elephants in the room. The first concerns the question why and how we have convinced ourselves that the real world is computable. The second revolves around the relationship of Turing’s universal machine to Newton’s universal oracle, and how they guide our understanding and capacity to predict. Let’s look at this second elephant first.
The first question we must tackle is: how is computation enabled and constrained by physics? This is what originally drove inquiries into the relationship between physics and computation, beginning in the early 1960s. We already know from Church and Turing what can be computed in principle. But now we want to know what kind of computations can actually be carried out within the limits of the laws of physics, and within the boundaries of the particular universe we live in.
Whether or not you can calculate a solution to a problem in practice depends much less on whether it is computable in the first place, and more on how many resources you need to compute it. The thing is: computation as a physical activity can be limited in many ways — by energy, time, and/or space.
There is a very long-running discussion about the energy consumption of computation. So let’s tackle this one first. The discussion goes back to the year 1867 when James Clerk Maxwell formulated a puzzling and controversial thought experiment about the second law of thermodynamics.
Maxwell’s set-up is intriguing, and not only because it introduces yet another demon, completing an unholy trinity of mischievous philosophical imps. Unlike the god-like fiends that bothered Laplace and Descartes, Maxwell’s demon is a mere mortal: it sits on top of a box filled with gas. This box is divided into two chambers with a trap door between them. The gas consists of a mixture of molecules, each with its specific kinetic energy (which depends on its mass and velocity). These energies are distributed in a way that is dictated by the temperature of the gas: some molecules move faster, some slower.
The demon controls the trap door, opening and shutting it quickly enough to only let one single molecule pass through at a time. In this way, she selects, let’s say, fast molecules to pass into the left chamber and slow-moving ones to cross over to the right. Note that the demon does not alter the course or velocity of any molecule, which means the average kinetic energy of the gas is completely conserved and remains constant. Also: the trap door is frictionless, so opening and closing it does not use up any energy. Still, the left chamber increasingly heats up, and the right chamber cools down over time, based exclusively on the demon’s ability to distinguish between fast and slow moving molecules.
This scenario does not obviously break any known laws of physics, but it does seem to fly in the face of the second law of thermodynamics, which states that the temperature difference between the two chambers should decrease and equalise (not increase) as long as no extra energy is added to the system.
Maxwell’s scenario was refined in 1929 by physicist Leó Szilárd. His imaginary engine consists of a slightly modified box, with the same demon on top and a movable partition between the two chambers. But Szilárd’s box contains only one lonely molecule of gas. The demon knows which chamber it is in (e.g., the left one). Such either/or situations (left or right) epitomise what it means for the demon to possess exactly one bit of information. Knowing this, she lets the molecule push the partition towards the empty chamber on the right: through its collisions with the wall, the single molecule acts like an expanding one-particle gas. (Yes, one molecule can exert pressure!) This expansion delivers a tiny bit of physical work, which could, for instance, lift a small weight attached to the partition.
The crucial advantage of Szilárd’s version of the thought experiment is that you can calculate exactly how much work the demon extracts. For those interested in such details: it is equal to the natural logarithm of two, multiplied by Boltzmann’s constant and the temperature of the system (k_B T ln 2). This is really not much work! But it is also not nothing, especially considering that the second law of thermodynamics seems to prohibit its very existence. The question is: where did the energy for this tiny bit of work come from?
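For the numerically curious, here is that amount worked out in a few lines of Python, assuming a room temperature of 300 kelvin (the temperature is our own assumption; the rest is just the formula above):

```python
import math

k_B = 1.380649e-23          # Boltzmann's constant in joules per kelvin
T   = 300.0                 # assumed room temperature in kelvin

work_per_bit = k_B * T * math.log(2)
print(f"{work_per_bit:.2e} J")   # roughly 2.87e-21 joules for one bit of information
```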
Both demonic scenarios have in common that energy appears to be extracted from the system based on information only: the demon acts solely on what she knows about the state of the molecules in the box, and does not actively alter their energy in any way. In the case of the Szilárd engine, we can quantify this: one bit of information delivers a tiny but very specific amount of physical work. From the principle that energy must be conserved in the system, we must conclude that information and physical energy are somehow interconvertible.
We could draw very big conclusions from this, if we were so inclined. For instance, we could claim that information must be as fundamental as energy — a basic ingredient of the universe! But again, we are getting ahead of ourselves. So let us be more modest for now and just consider that the measurement made by the demon must be where the missing energy is coming from: it takes work to know where the molecules are and how fast they are moving. So, if we reframe and include the demon as part of the system, its actions are what is balancing the energy sheets, preventing us from breaking the second law.
But demons are complex living systems (we presume), and their thermodynamic properties are not well understood (that’s for sure). So we’ll leave this aspect of the problem aside for now. Then the simple take-home message is this: measurement requires energy. Maxwell and Szilárd came up with their thought experiments before the theory of computation was developed. But what applies to measurement, also applies to computation, because both are forms of information processing. And we can now ask: how much energy, at minimum, is required to perform some computation?
Physicist Rolf Landauer did just that in 1961, formulating what we now know as Landauer’s principle. To properly understand his principle, let us quickly revise the difference between reversible and irreversible processes in physics: a thermodynamically reversible process is one that can easily be turned around through small changes in its surroundings. If you change the concentrations of two substances in a chemical reactor, for instance, their reaction flow will quickly “relax” them back to an equilibrium state. The demon opening and shutting the frictionless trap door is another good example: at the end of the process, the door is in the same shut position we started with, and no net energy had to be expended to perform the operation. Opening and closing the door is thus reversible.
An irreversible process, on the other hand, is one that cannot easily be turned around, without investing a significant amount of work and energy. Think of an explosion: once you blow up the dynamite (and maybe your house with it), it is difficult to reassemble the mess you’ve made and impossible to simply reuse the explosive for another blast.
There is an equivalent distinction to be made for computational processes. For simplicity, we assume that we are dealing with a binary computer, like most digital computers today. Its programs and data consist of bit strings containing only 0s and 1s. Looking at the computer’s available set of operations, we can classify them into “reversible” and “irreversible” analogous to thermodynamic processes.
Take the logical operator “bit-flip” or NOT for instance. This negation turns a 0 into a 1, and vice versa. This operation is obviously reversible. Most others, however, are not. Think about erasure, which is what Landauer did. Let’s say we start with a random bit string, e.g., 0101100, and erase it by setting all its positions to zero: 0000000. There is no way we can reconstruct the string we started with from the result. Any random starting sequence will end up as 0000000. This is called a many-to-one map. When we erase data, information is lost, irrevocably. This is the computational equivalent of irreversibility.
Not just erasure, but also standard logical operators like AND and OR, are irreversible (many-to-one) mappings. Take AND, for instance: it maps 00, 01, and 10 to 0, while only 11 is mapped to 1. In this case, we can reverse the operation without loss, but only if the result is 1, not 0. In the latter case, again, information is lost and the process is irreversible. We cannot turn it around.
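Here is the difference in a handful of lines of Python (our own toy illustration): NOT can always be undone, while AND lumps several inputs onto the same output, and erasure is the extreme case of such a many-to-one map.

```python
# Reversible: NOT is its own inverse, so no information is lost.
bit_not = {0: 1, 1: 0}
assert all(bit_not[bit_not[b]] == b for b in (0, 1))

# Irreversible: AND is many-to-one. Seeing the output 0 alone, we cannot
# tell whether the input was (0,0), (0,1) or (1,0).
preimages = {}
for a in (0, 1):
    for b in (0, 1):
        preimages.setdefault(a & b, []).append((a, b))
print(preimages)          # {0: [(0, 0), (0, 1), (1, 0)], 1: [(1, 1)]}

# Erasure maps every string of a given length onto the same output.
erase = lambda bits: "0" * len(bits)
print(erase("0101100"))   # -> 0000000
```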
Landauer’s principle states that any computation that involves logically irreversible functions must also be thermodynamically irreversible. In fact, he demonstrated that each irreversible operation on a bit dissipates, at minimum, the exact same amount of energy (k_B T ln 2) that Szilárd’s demon produced out of its one-molecule machine. What an unlikely coincidence! It allows physicists to test the Landauer energy limit by building experimental approximations of Szilárd’s engine. All such efforts so far have been in tight accordance with the theoretical predictions. The kind of computer you use every day, however, is very far removed from such maximal energy efficiency. A laptop or smartphone uses orders of magnitude more energy for each processed bit (reversible or not) than these experimental molecular computers.
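How many orders of magnitude? A rough sketch, where the energy figure for a real chip is an assumed ballpark of ours (somewhere around a femtojoule per logic operation) rather than a measured specification:

```python
landauer_limit = 2.87e-21   # joules per irreversible bit operation at 300 K (see the sketch above)
assumed_chip   = 1e-15      # joules per logic operation in a modern chip: an illustrative guess only

print(f"factor above the Landauer limit: {assumed_chip / landauer_limit:,.0f}")   # roughly 350,000
```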
When all is said and done, it turns out that Landauer’s principle simply is a computational restatement of the second law of thermodynamics. Computers cannot cheat physics (or fate) either. Or so it seemed until Charles Bennett, in 1973, came up with a way to beat the odds. Just like the demon of Maxwell and Szilárd can “cheat” the second law (by investing work into measurement), so can computers. All you have to do to make logically irreversible operations thermodynamically reversible, it turns out, is to store all the intermediate results of your computation. This way, you can retrace your computation step by step. But while storage does not use up any energy in theory, it does so in practice.
And not only that. This accumulation of intermediate storage will get you in trouble with another physical limitation: storage needs space. And space is not necessarily unlimited either, at least not within the observable part of our universe. So now that we know how much energy we need to compute, we can also ask how much computation can be done in a limited volume of space.
This is what Jacob Bekenstein did in 1981. He is the guy whose work on black-hole entropy, together with Stephen Hawking’s, led to the realisation that black holes evaporate. He also figured out how much entropy can be squeezed into a given amount of space (assuming there is no black hole in that space). You can interpret entropy, in this context, as a measure for either the free energy or the information available for computation. This limit is now called the Bekenstein bound. Taking the Szilárd/Landauer conversion of information and energy above, we can then calculate how much computation can possibly be performed within the entire bounded volume of spacetime occupied by the observable universe — all the time and space that ever existed, as far as we know (again, excluding whatever lies beyond the event horizon of black holes).
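For the curious, the bound itself is simple enough to evaluate. The sketch below is our own illustration: the function name and the example (one kilogram of mass-energy confined to a sphere of ten centimetres radius) are invented for the occasion.

```python
import math

hbar = 1.054571817e-34    # reduced Planck constant, J*s
c    = 2.99792458e8       # speed of light, m/s

def bekenstein_bound_bits(radius_m: float, energy_J: float) -> float:
    """Bekenstein's upper limit on the information content of a sphere:
    I <= 2 * pi * R * E / (hbar * c * ln 2), expressed in bits."""
    return 2 * math.pi * radius_m * energy_J / (hbar * c * math.log(2))

# One kilogram of mass-energy (E = m c^2) inside a 10 cm sphere:
E = 1.0 * c**2
print(f"{bekenstein_bound_bits(0.10, E):.1e} bits")   # roughly 2.6e42 bits
```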
But first things first: since Bekenstein established that only a limited amount of information can be stored in any finite amount of space, we know now that building an actual Turing machine is impossible because it requires an infinite tape. Also, only a limited amount of energy will ever be available for computation in practice, which in turn caps the maximal number of operations that we can perform within the observable universe. And as if this were not enough, there is also an absolute speed limit for the serial processing of instructions, which is connected to the relaxation times of the thermodynamically irreversible processes driving computational devices. All in all, it should be clear: a true universal computer — a real-world Turing machine — will forever remain an abstract idealisation.
But do not fret. We can still do lots and lots and lots of calculations within the bounds set by physics. Physicist Seth Lloyd, for example, imagined in the year 2000 what it would mean to build what he called the “ultimate laptop.” This device, weighing one kilogram and occupying a volume of one litre, would be able to store about 10^20 times as much information as your laptop does right now. It would also be 10^35 times as fast. On the downside, it would burn your thighs pretty badly, as it operates at a temperature of about one billion kelvin. That’s more than 50 times as hot as the core of our sun.
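A quick sanity check on the speed claim, using the Margolus-Levitin bound on which Lloyd’s estimate rests (the figure for a present-day laptop is an assumed ballpark of ours, chosen purely for illustration):

```python
import math

hbar = 1.054571817e-34            # reduced Planck constant, J*s
c    = 2.99792458e8               # speed of light, m/s

E = 1.0 * c**2                    # total energy of a one-kilogram computer, E = m c^2
max_ops = 2 * E / (math.pi * hbar)        # Margolus-Levitin limit on operations per second
print(f"{max_ops:.1e} ops/s")             # about 5.4e50 operations per second

assumed_laptop = 1e15             # bit operations per second for today's laptop: a rough guess
print(f"speed-up: {max_ops / assumed_laptop:.0e}x")   # on the order of 10^35, as claimed above
```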
If you extrapolate all this to the whole history and size of the observable universe, you get an estimated total information content of about 10^124 bits. This number is not just astronomical — it’s absolutely ridiculous! A one with 124 zeros behind it. Remember that a billion has only nine zeros, and that order of magnitude is already hard to imagine for our puny human brains. In comparison, there are an estimated 10^78 to 10^82 atoms of ordinary (baryonic) matter in the entire observable universe. These atoms are spread across 200 billion (2x10^11) to 2 trillion (2x10^12) galaxies believed to exist, each containing billions of stars and planets. All of this is simply mind-boggling, far beyond our grasp.
But this may not even be all there is. We will discuss later why the actual upper bound for information may be much larger, possibly even infinite, which seems absurd but really isn’t (for reasons we will explain). Anyway, the bottom line is this: the practical limits imposed on computation by physics are not really something you need to lose any sleep about. We won’t hit those limits any time soon.
And yet, it is interesting to know that there are limits, because it allows us to think about specific physical phenomena in terms of how much time and effort they would need to come about. Let’s ask, for example: how likely is it that complex human beings have evolved on planet earth at exactly this point in the cosmological evolution of the universe? This can be framed as a problem of computational complexity. The ideas we’ve just introduced are powerful tools for examining this kind of question.
While the theory of computation asks whether something is calculable by a human being by rote in principle, computational complexity theory asks how difficult it would be in practice. How much time and energy (and sometimes also space, in terms of memory storage) is required to compute a solution to a certain problem? It turns out this is much harder to establish than Turing-computability. Algorithms are tricky beasts, because they are so flexible: you can run them on all kinds of problems and on all kinds of input data. In addition, there are many ways to go about measuring the complexity of an algorithmic process.
In traditional computational complexity theory, we partition the set of computable functions (the one that Turing found and showed to be equivalent to those of Church and Kleene) into sub-classes: those whose time to termination grows slowly (polynomially, as the mathematician would say) with the size of the problem they are solving, versus those for which it grows fast (i.e., exponentially). This is massively important for computer science and its many practical applications, but it is not our main focus here, because we are more interested in the large world of physics than in the small world of algorithms.
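The difference is easy to feel with a toy step count (numbers invented for illustration): a procedure that needs n^2 steps stays manageable as the problem grows, while one that needs 2^n steps very quickly does not.

```python
# Compare the growth of a polynomial (n^2) and an exponential (2^n) step count.
for n in (10, 20, 40, 80):
    print(f"n={n:3d}   n^2={n**2:>6,d}   2^n={2**n:.1e}")
# At n=80 the polynomial procedure needs 6,400 steps, the exponential one
# about 1.2e24, which no real computer will ever finish.
```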
Our problem is related though. We ask: given some object or phenomenon that we can observe, how much computation would be required to simulate it? How much effort to generate it? These are delicate questions and we’ve said already that there are many ways to answer them. Without digressing into a lengthy treatise on the topic, let us just pick out two measures that will be useful later on. What they quantify are slightly different flavours of computational complexity.
One way to go about this is to figure out what the shortest computer program would be that produces the observable object as its output. This measure is called Solomonoff-Kolmogorov-Chaitin complexity, or algorithmic complexity for short. It is notoriously difficult to pin down in practice — an issue we are sadly familiar with by now — because you can rarely rigorously prove that you have indeed found the shortest possible algorithm to produce a given output. The answer depends, among other things, on the formalism or programming language you use to implement your algorithm.
Zooming away from such details, however, it becomes clear that computational complexity has something to do with how much you can compress an algorithm: what we are looking for, after all, is an algorithm that produces a long calculation with as few instructions as possible. This can be achieved, for instance, with loop structures in imperative programming languages, or by recursive function calls in functional programming. If you don’t know what this means, don’t worry! The upshot is: the more regular, redundant, and repetitive your algorithm is, the easier it is to compress it.
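A common hands-on proxy, though emphatically not a way to compute true algorithmic complexity (which cannot be computed in general), is to feed a string to an off-the-shelf compressor and see how much it shrinks. The little experiment below is our own illustration:

```python
import random
import zlib

regular = b"01" * 5000                                         # highly repetitive, 10,000 bytes
random.seed(0)
noisy   = bytes(random.getrandbits(8) for _ in range(10_000))  # pseudo-random, 10,000 bytes

for name, data in [("regular", regular), ("random ", noisy)]:
    packed = zlib.compress(data, level=9)
    print(f"{name}: {len(data):,} bytes -> {len(packed):,} bytes")
# The repetitive string collapses to a few dozen bytes; the random-looking
# one barely shrinks at all.
```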
The most compressible algorithms consist of just one statement. These are what we call physical laws, expressed by mathematical equations, and we will talk about them some more in the next section. They deal with the most compressible (hence regular) processes in the world. But mostly, we only manage to compress an algorithm by so much, and in some cases, we cannot reduce it at all. The processes such incompressible algorithms simulate lack regularity completely. In other words, they are irreducible. For this reason, computational complexity is often equated with compressibility or reducibility.
There is another, related way to look at the reducibility of an algorithmic process, to assess the computational complexity of its output object. It looks at how much you can shorten the time of execution of an algorithm that produces an observed object as output, i.e., the number of calculating steps needed to arrive at a solution. This is called time complexity. The idea behind it is that complex objects take more effort (and thus more steps) to generate. Charles Bennett, whom we’ve already encountered above, calls this the logical depth of a physical object. Others speak of an assembly index, a closely related idea. It captures what Stephen Wolfram calls computational irreducibility. We will use these kinds of concepts heavily later on, when we discuss ideas about emergence.
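To give a flavour of what such irreducibility looks like, here is Wolfram’s famous Rule 30 cellular automaton: a rule that fits in one line, yet whose output, as far as anyone knows, can only be obtained by actually running it step by step. The code is our own sketch, meant purely as an illustration, not as a proof of irreducibility.

```python
def rule30_step(cells: str) -> str:
    """One step of Wolfram's Rule 30: each cell looks at its left neighbour,
    itself, and its right neighbour; the new value is left XOR (centre OR right)."""
    padded = "0" + cells + "0"
    return "".join(
        str(int(padded[i - 1]) ^ (int(padded[i]) | int(padded[i + 1])))
        for i in range(1, len(padded) - 1)
    )

row = "0" * 15 + "1" + "0" * 15        # start from a single "on" cell in the middle
for _ in range(8):
    print(row.replace("0", ".").replace("1", "#"))
    row = rule30_step(row)
```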
But first, let’s take a little breather again! By now, we hope to have given you some intuitions about the limitations of computation in the physical world, and how we can use ideas about computational complexity to examine the effort (in terms of energy, time, and space) that has to go into the generation of certain objects, phenomena, or processes. While this is all very interesting, and will be useful later on, we have only addressed half of the relationship between computation and physics so far. Therefore, it is high time to ask the more difficult question: are the laws of physics, as stated by Newton and his successors, computable? It is to this can of worms that we will turn our attention next.
The crucial advantage of Szilárd’s version of the thought experiment is that you can calculate exactly how much. For those interested in such details: it is equal to the natural logarithm of two, multiplied by Boltzmann’s constant and the temperature of the system (ln 2 kB T). This is really not much work! But it is also not nothing, especially considering that the second law of thermodynamics seems to prohibit its very existence. The question is: where did the energy for this tiny bit of work come from?
Both demonic scenarios have in common that energy appears to be extracted from the system based on information only: the demon acts solely on what she knows about the state of the molecules in the box, and does not actively alter their energy in any way. In the case of the Szilárd engine, we can quantify this: one bit of information delivers a tiny but very specific amount of physical work. From the principle that energy must be conserved in the system, we must conclude that information and physical energy are somehow interconvertible.
We could draw very big conclusions from this, if we were so inclined. For instance, we could claim that information must be as fundamental as energy — a basic ingredient of the universe! But again, we are getting ahead of ourselves. So let us be more modest for now and just consider that the measurement made by the demon must be where the missing energy is coming from: it takes work to know where the molecules are and how fast they are moving. So, if we reframe and include the demon as part of the system, its actions are what is balancing the energy sheets, preventing us from breaking the second law.
But demons are complex living systems (we presume), and their thermodynamic properties are not well understood (that’s for sure). So we’ll leave this aspect of the problem aside for now. Then the simple take-home message is this: measurement requires energy. Maxwell and Szilárd came up with their thought experiments before the theory of computation was developed. But what applies to measurement, also applies to computation, because both are forms of information processing. And we can now ask: how much energy, at minimum, is required to perform some computation?
Physicist Rolf Landauer did just that in 1961, formulating what we now know as Landauer’s principle. To properly understand his principle, let us quickly revise the difference between reversible and irreversible processes in physics: a thermodynamically reversible process is one that can easily be turned around through small changes in its surroundings. If you change the concentrations of two substances in a chemical reactor, for instance, their reaction flow will quickly “relax” them back to an equilibrium state. The demon opening and shutting the frictionless trap door is another good example: at the end of the process, the door is in the same shut position we started with, and no net energy had to be expended to perform the operation. Opening and closing the door is thus reversible.
An irreversible process, on the other hand, is one that cannot easily be turned around, without investing a significant amount of work and energy. Think of an explosion: once you blow up the dynamite (and maybe your house with it), it is difficult to reassemble the mess you’ve made and impossible to simply reuse the explosive for another blast.
There is an equivalent distinction to be made for computational processes. For simplicity, we assume that we are dealing with a binary computer, like most digital computers today. Its programs and data consist of bit strings containing only 0s and 1s. Looking at the computer’s available set of operations, we can classify them into “reversible” and “irreversible” analogous to thermodynamic processes.
Take the logical operator “bit-flip” or NOT for instance. This negation turns a 0 into a 1, and vice versa. This operation is obviously reversible. Most others, however, are not. Think about erasure, which is what Landauer did. Let’s say we start with a random bit string, e.g., 0101100, setting all its positions to zero: 0000000. There is no way we can reconstruct the string we started with from the result. Any random starting sequence will end up as 0000000. This is called a many-to-one map. When we erase data, information is lost, irrevocably. This is the computational equivalent of irreversibility.
Not just erasure, but also standard logical operators like AND and OR, are irreversible (many-to-one) mappings. Take AND, for instance: it maps 00, 01, and 10 to 0, while only 11 is mapped to 1. In this case, we can reverse the operation without loss, but only if the result is 1, not 0. In the latter case, again, information is lost and the process is irreversible. We cannot turn it around.
Landauer’s principle states that any computation that involves logically irreversible functions must also be thermodynamically irreversible. In fact, he demonstrates that each irreversible operation on a bit incurs at minimum the exact same amount of energy (ln 2 kB T) that Szilárd’s demon produced out of its one-molecule machine. What an unlikely coincidence! It allows physicists to test the Landauer energy limit by building experimental approximations of Szilárd’s engine. All such efforts so far have been in tight accordance with the theoretical predictions. The kind of computer you use everyday, however, is very far removed from such maximal energy efficiency. A laptop or smartphone uses orders of magnitude more energy for each processed bit (reversible or not) than these experimental molecular computers.
When all is said and done, it turns out that Landauer’s principle simply is a computational restatement of the second law of thermodynamics. Computers cannot cheat physics (or fate) either. Or so it seemed until Charles Bennett, in 1973, came up with a way to beat the odds. Just like the demon of Maxwell and Szilárd can “cheat” the second law (by investing work into measurement), so can computers. All you have to do to make logically irreversible operations thermodynamically reversible, it turns out, is to store all the intermediate results of your computation. This way, you can retrace your computation step by step. But while storage does not use up any energy in theory, it does so in practice.
And not only that. This accumulation of intermediate storage will get you in trouble with another physical limitation: storage needs space. And space is not necessarily unlimited either, at least not within the observable part of our universe. So now that we know how much energy we need to compute, we can also ask how much computation can be done in a limited volume of space.
This is what Jacob Bekenstein did in 1981. He is the guy who — together with Stephen Hawking — found out that black holes evaporate. He also figured out how much entropy can be squeezed into a given amount of space (assuming there is no black hole in that space). You can interpret entropy, in this context, as a measure for either the free energy or the information available for computation. This limit is now called the Bekenstein bound. Taking the Szilárd/Landauer conversion of information and energy above, we can then calculate how much computation can possibly be performed within the entire bounded volume of spacetime occupied by the observable universe — all the time and space that ever existed, as far as we know (again, excluding whatever lies beyond the event horizon of black holes).
But first things first: since Bekenstein established that only a limited amount of information can be stored in any finite amount of space, we know now that building an actual Turing machine is impossible because it requires an infinite tape. Also, only a limited amount of energy will ever be available for computation in practice, which in turn caps the maximal number of operations that we can perform within the observable universe. And as if this were not enough, there is also an absolute speed limit for the serial processing of instructions, which is connected to the relaxation times of the thermodynamically irreversible processes driving computational devices. All in all, it should be clear: a true universal computer — a real-world Turing machine — will forever remain an abstract idealisation.
But do not fret. We can still do lots and lots and lots of calculations within the bounds set by physics. Physicist Seth Lloyd, for example, imagined in 2010 what it would mean to build what he called the “ultimate laptop.” This device, weighing one kilogram and occupying a volume of one litre, would be able to store about 1020 times as much information as your laptop does right now. It would also be 1035 times as fast. On the downside, it would burn your thighs pretty badly, as it operates at a temperature of one billion degrees Kelvin. That’s more than 50 times as hot as the core of our sun.
If you extrapolate all this to the whole history and size of the observable universe, you get an estimated total information content of about 10^124 bits. This number is not just astronomical — it’s absolutely ridiculous! A one with 124 zeros behind it. Remember that a billion has only nine zeros, and that order of magnitude is already hard to imagine for our puny human brains. In comparison, there are an estimated 10^78 to 10^82 atoms of ordinary (baryonic) matter in the entire observable universe. These atoms are spread across 200 billion (2x10^11) to 2 trillion (2x10^12) galaxies believed to exist, each containing billions of stars and planets. All of this is simply mind-boggling, far beyond our grasp.
But this may not even be all there is. We will discuss later why the actual upper bound for information may be much larger, possibly even infinite, which seems absurd but really isn’t (for reasons we will explain). Anyway, the bottom line is this: the practical limits imposed on computation by physics are not really something you need to lose any sleep about. We won’t hit those limits any time soon.
And yet, it is interesting to know that there are limits, because it allows us to think about specific physical phenomena in terms of how much time and effort they would need to come about. Let’s ask, for example: how likely is it that complex human beings have evolved on planet earth at exactly this point in the cosmological evolution of the universe? This can be framed as a problem of computational complexity. The ideas we’ve just introduced are powerful tools for examining this kind of question.
While the theory of computation asks whether something is calculable by a human being by rote in principle, computational complexity theory asks how difficult it would be in practice. How much time and energy (and sometimes also space, in terms of memory storage) is required to compute a solution to a certain problem? It turns out this is much harder to establish than Turing-computability. Algorithms are tricky beasts, because they are so flexible: you can run them on all kinds of problems and on all kinds of input data. In addition, there are many ways to go about measuring the complexity of an algorithmic process.
In traditional computational complexity theory, we partition the set of computable functions (the one that Turing found and showed to be equivalent to those of Church and Kleene) into sub-classes whose time to termination either grows slowly (or polynomially, as the mathematician would say) with the size of the problem they are solving, versus those for which it grows fast (i.e., exponentially). This is massively important for computer science and its many practical applications, but it is not our main focus here, because we are more interested in the large world of physics rather than the small world of algorithms.
Our problem is related though. We ask: given some object or phenomenon that we can observe, how much computation would be required to simulate it? How much effort to generate it? These are delicate questions and we’ve said already that there are many ways to answer them. Without digressing into a lengthy treatise on the topic, let us just pick out two measures that will be useful later on. What they quantify are slightly different flavours of computational complexity.
One way to go about this is to figure out what the shortest computer program would be that produces the observable object as its output. This measure is called Solomonoff-Kolmogorov-Chaitin complexity, or algorithmic complexity for short. It is notoriously difficult to pin down in practice — an issue we are sadly familiar with by now — because you can rarely rigidly prove that you have indeed found the shortest possible algorithm to produce a given output. The answer depends, among other things, on the formalism or programming language you use to implement your algorithm.
Zooming away from such details, however, it becomes clear that computational complexity has something to do with how much you can compress an algorithm: what we are looking for, after all, is an algorithm that produces a long calculation with as few instructions as possible. This can be achieved, for instance, with loop structures in imperative programming languages, or by recursive function calls in functional programming. If you don’t know what this means, don’t worry! The upshot is: the more regular, redundant, and repetitive your algorithm is, the easier it is to compress it.
The most compressible algorithms consist of just one statement. These are what we call physical laws, expressed by mathematical equations, and we will talk about them some more in the next section. They deal with the most compressible (hence regular) processes in the world. But mostly, we only manage to compress an algorithm by so much, and in some cases, we cannot reduce it at all. The processes such incompressible algorithms simulate lack regularity completely. In other words, they are irreducible. For this reason, computational complexity is often equated with compressibility or reducibility.
There is another, related way to look at the reducibility of an algorithmic process, and thus to assess the computational complexity of its output object. It asks how much you can shorten the time of execution of an algorithm that produces the observed object as its output, i.e., the number of calculating steps needed to arrive at a solution. This is called time complexity. The idea behind it is that complex objects take more effort (and thus more steps) to generate. Charles Bennett, whom we’ve already encountered above, calls this the logical depth of a physical object. Others speak of an object’s assembly index, a closely related measure. It also captures what Stephen Wolfram calls computational irreducibility. We will use these kinds of concepts heavily later on, when we discuss ideas about emergence.
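To see how the two measures can come apart, here is a deliberately crude sketch of our own (in Python): the program below is very short, so its output has low algorithmic complexity in the sense discussed above, yet, as far as we are aware, there is no way to obtain that output other than grinding through every single step. In Bennett’s terms, the output is logically deep.

```python
# A very short program (low algorithmic complexity) whose output, as far as we
# know, can only be reached by actually performing every iteration
# (considerable logical depth, or computational irreducibility).
x = 1
for _ in range(10_000_000):           # ten million applications of a trivial rule
    x = (x * x + 1) % 2_147_483_647   # square, add one, wrap around a large prime
print(x)                              # looks arbitrary; no obvious shortcut to it
```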
But first, let’s take a little breather again! By now, we hope to have given you some intuitions about the limitations of computation in the physical world, and how we can use ideas about computational complexity to examine the effort (in terms of energy, time, and space) that has to go into the generation of certain objects, phenomena, or processes. While this is all very interesting, and will be useful later on, we have only addressed half of the relationship between computation and physics so far. Therefore, it is high time to ask the more difficult question: are the laws of physics, as stated by Newton and his successors, computable? It is to this can of worms that we will turn our attention next.
Computability
Think about it: the laws of physics must be computable, if we want to seriously claim that the universe actually is like some sort of computer. Newton’s oracle must, in some way, be equivalent to Turing’s machine. And remember that we do not take the laws of physics as foundational, because they are only a map and not the territory, but we are still trying our best to stay compatible with the very robust knowledge of the world they provide. So, is physics computable then? And what does that even mean?
Let us recall that Newton already used the verb “to compute” when talking about predicting future states from his famous laws. So, “computation” surely has something to do with prediction. But is this the same as Turing’s idea of “effective computation”? Well, almost, but not quite. And also: where do contemporary theories of physics stand in that regard? After all, it’s been more than 300 years since Newton published his Philosophiæ Naturalis Principia Mathematica.
This question is relatively easy to answer for one of the two foundational theories of modern physics: Einstein’s general relativity. Even though it offers a completely different (meta)physical view of the world compared to Newton’s, with gravity no longer a force but the curvature of spacetime by matter and energy, the two theories do not differ much in terms of their computability. There are two main problems we need to deal with here, and both of them have to do with the concept of infinity.
To be more precise, the underlying issue is actual infinity: the question whether infinite quantities are actually realisable in a physical universe that may or may not be infinite. In fact, this not only concerns physics, but also mathematics itself: there is a school of thought, called the intuitionists, that says if you cannot construct a mathematical proof, step by step and in practice, then you cannot be sure it is actually valid. And since we cannot construct infinities, say, by adding the number one to itself forever and ever in order to build the infinite set of natural numbers, we cannot use infinities in intuitionist proofs. Sounds reasonable, you say! But the snag is: infinities are very common in mathematics.
And in physics too! The first thing that probably comes to your mind is the question whether the universe itself is infinite or not. The honest answer is: quite possibly, but we cannot be sure.
To start with, the observable part of the universe today seems finite in extension. It forms a bounded four-dimensional bubble of spacetime that has been expanding ever since the postulated singularity of the Big Bang, about 13.8 billion years ago. If you look up into the night sky, you will not only see objects like stars and galaxies that are further and further away in space but, due to the universal limit on the speed of light, you will also see objects that are further and further in the past. And the background of the sky isn’t really black: it faintly “glows” at a temperature of 2.725 kelvin above absolute zero. This ice-cold “glow” emits radiation in the microwave range, the kind you use to warm up food. That’s why it’s called the cosmic microwave background. And the amazing thing is: it is a snapshot of the moment the universe became transparent to electromagnetic radiation (including visible light), roughly 379,000 years after the Big Bang, more than 13 billion years ago.
It is difficult to “see” anything beyond this point in time, but there is another signal, the cosmic neutrino background, that comes from an even earlier moment (a mere second after the Big Bang). This, as far as we know today, sets a pretty strong limit on how far out in space and how far back in time we can detect anything. There is nothing to be seen beyond these two background curtains. Therefore, we think today that the observable part of the universe is pretty darn big, but probably not infinite.
Yet, as always, the story is more complicated than that, because the universe did not always expand at a constant rate. Many physicists believe that it did something really weird in the first 10^-36 to 10^-32 seconds of its history: they speculate that spacetime inflated at a mind-boggling rate during that period, far beyond the volume that we can see today. This hypothesis, if verified some day, would explain why the large-scale geometry of space is flat, that is, without curvature (unless there is matter to distort it locally).
On top of that, the best-fitting model of Big Bang cosmology we have today suggests that the expansion of the universe has been accelerating in its recent history. This phenomenon is actually measurable. All of this together means that there is a lot of spacetime out there that is no longer detectable for us, because it has receded beyond the horizon from which light can still reach the Earth within the age of the universe. And that unknowable part of the cosmos may well be infinite. We may never know for sure.
But even if the whole universe is infinite in this way, anything that will ever matter to us human beings — anything that is capable of causally interacting with us — must lie within the limits of the observable universe. What’s outside is out of reach, forever and always. Does that mean we live in a finite world?
Not at all! Infinities also lurk in our immediate vicinity, all around us, all the time. And those are the infinities we really need to worry about if we want to understand whether the world is computable or not. They do not concern the infinitely large, but the infinitely small. They revolve around the question whether space and time are smooth and continuous, or whether they are made out of separable chunks.
These are issues that are very much still debated among physicists today. There are those who claim the world is fundamentally subdivided into discrete units. John Wheeler, for instance, suggested that at the Planck scale of about 10^-35 metres, space dissolves into a “quantum foam” in which the geometry of spacetime itself constantly fluctuates, popping in and out of different configurations. Others have suggested that time is quantized in a similar way, at the very short time scale of about 10^-43 seconds. But these ideas remain speculative.
The theory of general relativity, in stark contrast, treats space and time as a smooth continuum (apart from the singularities inside black holes and at the Big Bang itself, that is). And so does Newton’s classical theory of gravitation. This poses a problem for computability: these are theories of continuous, i.e., smooth, change from the past to the present to the future. They subdivide time and space into the infinitesimal intervals of Newton’s (and Leibniz’s) calculus. The infinitesimal is the exact opposite of a large infinity: a numerical increment that is infinitely small, while still being not quite zero. Yes, the idea makes our head hurt too.
But it’s actually not so hard to get a feeling for it: in order for anything to move smoothly through time and space, it has to be moving by some distance at every given moment. If it isn’t, we end up with Zeno’s arrow paradox. About 2,500 years ago, Zeno of Elea argued that an arrow shot by an archer towards its target must take up a well-defined volume of space at any instant. But if it does, then it cannot be moving at that instant (otherwise the volume it occupies would not be well-defined). And because it is not moving at any instant, it can never reach its target ― in fact, it cannot really be moving at all!
This is what philosophers call a veridical paradox: there is no logical contradiction here, just an apparent absurdity that is evidently true nonetheless. The arrow does reach its target. We can observe it doing so. So there must be something wrong with the way we are thinking about the whole thing.
The abstract concept of an infinitesimal solves this problem, at least theoretically. Think of the distance the arrow moves across a given time interval: you can measure its position at the beginning and at the end of the interval. The difference is the distance travelled. Simple! If you divide this distance by the length of the time interval, you get the average velocity of the arrow over the interval you are considering.
Now imagine you shrink this time interval. Let’s say you halve it. Again you measure the position of the arrow at the beginning and end, and you divide by the length to get the average velocity. You then repeat this procedure over and over again. After an infinite number of halvings, you have shrunk the interval down to an instantaneous moment (a time interval with length zero). You can no longer calculate the velocity of the arrow at this point, since division by zero is not allowed.
But you can follow the measured velocities, as you keep halving the interval, and you will realise that they converge towards a certain number as you approach the instantaneous moment, the interval of length zero. Mathematicians call this the limit of the sequence of ever-shorter intervals. It represents the instantaneous velocity of the arrow, defined over an interval of infinitesimal length. Nifty! The arrow does move, while still occupying a well-defined volume of space at any given instant. Problem solved! It seems you can have your cake and eat it.
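In the notation of the calculus, this limiting procedure reads as follows (standard textbook material, restated here only for convenience): if x(t) denotes the position of the arrow at time t, its instantaneous velocity at that moment is defined as

\[ v(t) \;=\; \lim_{\Delta t \to 0} \frac{x(t + \Delta t) - x(t)}{\Delta t} \;=\; \frac{dx}{dt}, \]

where the time interval \Delta t is made ever shorter (for instance by repeated halving) without ever being set exactly to zero.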
But not so fast! Our original problem still keeps bugging us: can this be real? In the natural world, with its limited time and resources, does it make sense to talk about infinitesimals and infinities as if they actually exist? Or are they mere abstractions, useful features of the map but not the territory?
The conclusion for us is as simple as it is obvious: neither classical nor relativistic mechanics is strictly computable in the sense of Turing! Algorithms do not deal well with infinities, but both of these theories are built around Newton’s calculus with its infinitesimals. Computationally, we can only approximate, but never exactly capture, a solution. The trouble is that our algorithm must terminate after a finite number of steps in order to yield an output, yet calculating a mathematical limit exactly requires an infinite number of steps. Smooth change, therefore, is beyond computation ― by its very definition.
But hold on. Not so fast! First of all, this is not always true. There are many cases where we can solve the equations that describe the rules of change in a mathematically explicit way. For example, if a growing population of bacteria doubles every twenty minutes or so, then its growth follows an exponential function, one of those maximally compressed expressions that are very easy to calculate! Indeed, many rules of change can be turned into simple mathematical formulas this way. Mathematicians call this operation integration. It is one of the most common tools used in maths and physics, and when it succeeds, the resulting formulas are perfectly computable! How cool. Problem solved?
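Before we answer that, here is what integration delivers in our example (standard textbook material, using the twenty-minute doubling time from above): the rule of change says that the population N grows at a rate proportional to its current size, and integrating that rule yields an explicit formula for N at any future time t:

\[ \frac{dN}{dt} = k\,N \quad\Longrightarrow\quad N(t) = N_0\, e^{k t} = N_0\, 2^{\,t/20\,\mathrm{min}}, \qquad k = \frac{\ln 2}{20\,\mathrm{min}}. \]

No step-by-step simulation is needed here: plug in any time t and the formula hands you the answer directly.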
Not quite, unfortunately. The cases where explicit integration is possible are actually few and far between. It succeeds for the lawlike processes mentioned above, which are often those we are most interested in, where the description of the dynamics can be compressed into a single formula. In many cases, however, this is not possible. At least not yet. And we’re not at all sure it ever will be.
When explicit integration fails, we are forced to integrate the equations of change numerically, that is, by computational approximation. To do this means to discretize the continuous time and space underlying the original formalisation: we turn them into individual chunks, however small they may be. And this works perfectly well. In fact, we can establish mathematically that we can approximate any well-defined rule of change to an arbitrary degree of precision. That’s quite neat. You won’t ever reach infinity this way, but you can get as close as you’d like.
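Here is a minimal sketch of what such a numerical approximation looks like (our own illustration in Python, using the simplest possible scheme, Euler’s method, on the bacterial growth example from above): shrinking the time step brings the computed result ever closer to the exact exponential solution, without ever reaching it.

```python
import math

def euler_growth(rate, n0, t_end, dt):
    """Approximate dN/dt = rate * N by chopping time into finite steps of length dt."""
    n, t = n0, 0.0
    while t < t_end:
        n += rate * n * dt   # advance the state by one small, discrete chunk of time
        t += dt
    return n

rate = math.log(2) / 20.0      # doubling time of twenty minutes
exact = 2 ** (60.0 / 20.0)     # exact solution after one hour: 8 times the starting value
for dt in (1.0, 0.1, 0.01):
    approx = euler_growth(rate, 1.0, 60.0, dt)
    print(f"time step {dt:5.2f} min: approximate N = {approx:.4f}, exact N = {exact:.4f}")
```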
For all practical purposes, this should be good enough, especially considering that our ability to measure anything in nature is limited anyway, so we should never require infinite precision in our calculations for physical problems. But what about chaos, you may ask? Deterministic chaotic systems have dynamics that depend on the exact values of real-valued quantities. This is the second place where the problem of small infinities pops up: infinitesimally small changes in the initial conditions or parameters of chaotic equations can lead to radically different outcomes. Take the butterfly effect, for example: a lepidopteran creature beating its wings in Brazil one day can cause a hurricane on the U.S. coast several weeks later. Our weather patterns are chaotic. So we really hit a bump here: even if we could calculate with infinite precision, we could not predict (i.e., compute) the future states of a system such as the weather indefinitely, because we cannot measure the relevant physical quantities accurately enough.
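The following toy sketch (ours; it uses the logistic map, a standard textbook example of deterministic chaos, rather than an actual weather model) shows how two trajectories that start a hair’s breadth apart end up completely out of step after a few dozen iterations:

```python
def logistic(x, r=4.0):
    """One step of the logistic map, a simple deterministic rule that is chaotic for r = 4."""
    return r * x * (1.0 - x)

x_a = 0.3
x_b = 0.3 + 1e-12          # a second trajectory, off by one part in a trillion
for step in range(1, 61):
    x_a, x_b = logistic(x_a), logistic(x_b)
    if step % 10 == 0:
        print(f"step {step:2d}: difference = {abs(x_a - x_b):.9f}")
# The tiny initial discrepancy grows roughly exponentially until the two
# trajectories bear no resemblance to each other: this is why limited
# measurement precision ruins long-term prediction for chaotic systems.
```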
To conclude this part of the argument: classical and relativistic dynamics turn out to be kind of Turing-computable (with the exception of chaotic systems). And by “kind of” we mean that, to calculate a physical system precisely, you’d need an infinite number of computation steps. So we’ve basically dropped Turing’s restriction that algorithms have to terminate in finite time. Most people (including us) would agree that this is okay for research in practice, but it does mean we’re no longer talking about exactly the same set of calculable functions that Turing identified. Instead, we’ve relaxed our conditions to include additional functions: those that take an infinite number of calculation steps to solve precisely.
The situation is a bit more complicated for the second foundational theory of contemporary physics: quantum mechanics. Are quantum processes computable? Again, it depends on how you look at them. If you take Schrödinger’s wave equation, which lies at the heart of quantum theory, you will notice that it describes continuous dynamics, just like Newton’s classical or Einstein’s relativistic equations of motion. At this level of abstraction, the same discussion as above applies. But there are difficulties lurking underneath.
The first arises from the fact that the wave equation does not describe the trajectory of a single sequence of states through time, like classical or relativistic equations do, but the evolution of an entire probability distribution over a range of states. The wave equation won’t tell you whether Schrödinger’s cat is alive or dead, only what the chances are that you will recover it living or deceased once you look inside the box. This leads to a funny situation: as long as you don’t look, the cat is famously both dead and alive. This is called quantum superposition. In terms of the dynamics of the system, it means you need to trace both states (“alive” and “dead”) over time and assign probabilities to both of them.
This is like performing several computations in parallel: at each step, the process branches into different possible operations, each associated with a certain probability. We can model this kind of constantly branching and merging computational process with what is called a nondeterministic Turing machine. The correspondence is not perfect, we should say, but it is close enough for our purposes. An alternative name for this paradigm is multicomputation. Some people claim it differs from the classical concept of computation, but this is not true: Turing himself showed that a universal nondeterministic Turing machine yields exactly the same set of computable functions as the deterministic (classical) case. Likewise, quantum computing does not allow you to suddenly calculate non-computable problems.
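To illustrate what it means to trace several states in parallel, here is a deliberately simplified, classical sketch of our own (in Python): it carries a probability distribution over the two branches “alive” and “dead” and updates both at every step. Real quantum mechanics tracks complex-valued amplitudes rather than plain probabilities, and the five-percent decay chance per step is an arbitrary number invented for the example.

```python
# Instead of following one definite state, we carry a whole distribution over
# the possible states and update every branch at each step (a classical toy,
# not genuine quantum dynamics; the decay probability is made up).
decay_per_step = 0.05

probs = {"alive": 1.0, "dead": 0.0}
for step in range(1, 11):
    probs = {
        "alive": probs["alive"] * (1.0 - decay_per_step),
        "dead": probs["dead"] + probs["alive"] * decay_per_step,
    }
    print(f"step {step:2d}: P(alive) = {probs['alive']:.3f}, P(dead) = {probs['dead']:.3f}")
```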
What quantum computing can do, however, is make specific calculations faster (shortening their minimal path to a solution), so that they no longer take an exponential amount of time. This only works under special circumstances. For instance, it has been demonstrated for the factorization of integers, which is of practical importance for online cryptography. But we need not worry here: whether we use a deterministic or a nondeterministic Turing machine, and whether we use a classical or a quantum computer, the definition of what is effectively computable remains exactly the same.
A more serious quantum-specific problem, however, is true randomness. And it is highly relevant for the argument about determinism we had a while back. Remember the difference between weak and strong determinism. In the strong version, the future is completely predetermined by the present. Laplace’s demon rules! In the weak version, there can be nondeterministic effects, as long as every outcome has at least one cause that we can recognise in retrospect. Quantum randomness gives you exactly this kind of indeterminacy. In the case of Schrödinger’s cat, the survival of the feline depends on the radioactive decay of a single atom in a closed box. This is an event whose exact occurrence most physicists believe to be truly unpredictable. You can only say “x atoms of substance y will decay on average over a given period.” But you cannot predict when exactly a specific atom decays. Not even in principle. Not ever.
We are still debating whether such microscopic randomness plays any role in the behaviour of macro-level phenomena. But nothing says it is impossible. Schrödinger’s thought experiment, for instance, shows us how it could be done. Instead of a suffering cat, we can hook some computational device up to the decaying atom and let it take the decay event as an input. Since we cannot predict when exactly the decay happens, the outcome of that computation will be fundamentally indeterminate. But this does not help us calculate non-computable problems either. In fact, it computes nothing at all! True randomness cannot be put to use that way, because it is, you know, truly unpredictable. And we’ve already established that computation is linked to our ability to predict the outcome of some process.
In conclusion, quantum mechanics introduces an element of true indeterminacy into the bottom layer of our physical theories of the world. But this won’t help us get any computational work done, because quantum randomness is neither predictable nor reproducible. It never happens the same way twice.
When it comes to the computability of modern physics, then, we can conclude that there are clear limits: chaotic deterministic systems (or any other continuum model that requires infinite precision) and quantum randomness both transcend the traditional notion of computability. In practice, however, these limitations don’t really matter, as our measurement precision is always limited anyway, and we can approximate mathematical solutions to our real-world problems to an arbitrary degree of precision, either by numerically integrating equations of change, or by averaging over large enough samples of random quantum events, as we do with the wave equation. Newton’s universal oracle, then, is indeed equivalent to a universal Turing machine, as long as we let the latter run without a fixed time restriction ― for as long as it takes to deliver an accurate-enough solution.
Computationalism
Are you still with us? We hope so. If your head is spinning: that is to be expected. After all, we just ran an intellectual marathon at the speed of a 100-yard sprint! It’s high time to recapitulate what we’ve covered so far, in order to regain some orientation.
We started this chapter by introducing a model of a peculiar human activity (calculating by rote) that turned out to be a model of a universal machine. This model underlies pretty much all of our current digital computer technology, and therefore our digital civilization and lifestyle. We have shown how the laws of physics, as well as the estimated size and age of our universe, naturally limit our ability to compute, but not to any extent that will hamper us in practice. Basically, we can compute all we want! In particular, we can compute predictions based on the known laws of physics to arbitrary (but usually not infinite) precision, with the possible exception of chaotic systems and quantum randomness.
This is where we are: equipped with an (almost) universal tool for prediction and control. What Promethean power indeed! We can simulate pretty much any physical process we can imagine with a computer. Does this mean the universe actually behaves like a universal computer? This claim no longer seems far off. But notice: it is not the same to say “the universe is simulable” and “the universe is (like) a computer.” The former is an empirically testable prediction, the latter a metaphysical stance.
Philosophers call this speculative worldview pancomputationalism: the idea that “everything computes” (“pan” being Greek for “all”). If you are lazy, you can leave out the “pan” and just say computationalism.
Historically, the first person to propose this view was the German computer pioneer Konrad Zuse, in his essay “Rechnender Raum” (“Calculating Space”), published in the late 1960s. We have encountered (but not explicitly named) this idea before, and have pointed out that it provides the modern equivalent of Laplace’s demon: a strong kind of determinism, even if it may not be transparent to a limited observer within the universe itself.
Many variants of this idea have been proposed in the past 50 years or so, ranging from quite mild and vague versions that simply equate “physical change” with “computation” in some broad sense that does not correspond to Turing’s notion, to the extreme ideology of the “simulation hypothesis,” which claims that we all live in an actual digital computer simulation, with the simulator quite literally being Laplace’s god-like demon. We do not take this hypothesis (or the people who are pushing it) seriously, and do not really see the point of even formulating such a pseudo-religious mythology.
Instead, we will focus on a more rigorous scientific formulation of the same idea here. It mainly follows arguments made by physicists Rolf Landauer, John Wheeler, David Deutsch and Seth Lloyd.
At some point after his initial work on the physics of computation, Landauer started interpreting his principle in the sense that information must be physical, because of its interconvertibility with energy. Notice the subtle switch: he started with an investigation into what would happen if you did not consider computation in the abstract but as a physical process, and he ended up claiming that computation in general is a physical phenomenon, which is a much stronger and broader claim.
Wheeler took this to the next level. With his famous slogan “it from bit,” he illustrated the idea that matter and energy — often considered the fundamental ingredients of the universe by physicists — themselves are based on a deeper immaterial principle. Whether or not something exists materially, on his view, can always be reduced to an either/or question. It is in this way, Wheeler thought, that the material universe derives from information, which lies at the very bottom of all our explanations. The equivalence between information and energy/entropy is no superficial semblance, but true identity. The laws of physics are abstract approximations to a more discrete reality: the world is literally made from bits!
Deutsch, writing in the 1980s, some time before Wheeler’s speculations, took a very different route: he declared the Church-Turing conjecture to be a physical principle. We now call this the Church-Turing-Deutsch principle. It is not at all the same as the original Church-Turing conjecture, differing from it in several crucial aspects. First, it replaces Turing’s “effective computation” (a human ability) with “physical realizability” (a general characteristic of physical processes, living and nonliving). Second, it considers quantum computing, and not classical computation in the strict sense of Turing. We have seen above that this does not alter the resulting set of computable functions, though, only the computational efficiency with which specific problems can be solved. Thus, Deutsch’s principle reads: “every process that is physically realisable is quantum-computable.” From this, he constructs an equivalent to the universal Turing machine: the universal quantum computer. It is a machine that can perfectly simulate all finitely realisable physical processes (including all possible quantum computers and Turing machines).
But why does Deutsch call this a “principle” and no longer a “conjecture?” Just like the Church-Turing conjecture, it is not mathematically provable. Nor can it be directly tested empirically: how can you exclude the possibility that there are physical processes which are not computable? It is not clear. Would we even recognise such processes as phenomena worth investigating? For reasons that will become clear later, this is highly doubtful. But this does not mean that such phenomena do not exist. Remember the computational device driven by single radioactive decay events above? Schrödinger’s computer, basically. This particular quantum machine is only computable in terms of its average behaviour. Its specific trajectories are physically realisable but not computationally predictable at all.
But this is missing Deutsch’s point. The reason he calls his statement a “principle” has nothing to do with it being proven or validated in any way. This is a bit confusing, but technically okay. What physicists mean by a “principle” is some sort of very general proposition that generates specific theories which are empirically testable. Take, for example, the conservation of energy and angular momentum underlying classical mechanics. These conservation laws are principles in the sense that they can be derived from fundamental symmetries of the equations describing a system (via Emmy Noether’s theorem; a compact statement for the case of energy follows below). They enable empirically testable hypotheses by constraining the particular theories and models that are built on them. In the same way, the Church-Turing-Deutsch principle is supposed to delimit what a physically realisable process is in the first place. As in Wheeler’s view, computation dictates reality, not the other way around.
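For the mathematically inclined, here is that textbook fact in miniature (it is standard classical mechanics, not something specific to Deutsch’s argument): if the Lagrangian L of a mechanical system, written in terms of its coordinates q_i and velocities \dot{q}_i, does not depend explicitly on time, then the quantity

\[ E \;=\; \sum_i \dot{q}_i\, \frac{\partial L}{\partial \dot{q}_i} \;-\; L \]

is conserved along every solution of the equations of motion; in the same way, the rotational symmetry of space yields the conservation of angular momentum.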
Lloyd takes all these arguments to their ultimate conclusion. He conjectures, echoing Galileo, that nature “speaks a language whose grammar consists of the laws of physics.” But unlike Galileo, he does not believe these laws are written in the formalism of mathematics. Instead, they are algorithmic, and the entire universe literally is one gigantic quantum supercomputer. This expresses the essence of pancomputationalism. From “computation can be physical” we’ve come a long way to arrive at the astonishing conclusion that the world is (literally, not metaphorically) a computing machine.
The details of these fascinating arguments became increasingly blurred as pancomputationalism gained wider popularity among researchers and laypeople over the past few decades. At the same time, its proponents became increasingly radical in their claims. And so, inevitably, their followers started mixing up Church-Turing with Church-Turing-Deutsch, simulation with physics, computer with world, map with territory.
In sum, this is what pancomputationalism implies: (a) all physical processes are ultimately discrete, (b) the universe is determined by a finite set of discrete rule-based dynamics, and thus (c): all physical processes are engaged in some sort of computation. Stating these ideas in such an explicit way, and considering the twists of history we’ve just outlined, it should be easy to see that all of these claims are problematic.
Concerning (a) and (b), we simply don’t know. These claims may or may not be true. But, as we have already argued above, it will be tough to demonstrate that they really are.
Think about (a): quantum mechanics may or may not imply discretized time and space. Physicists are vigorously debating this issue as we speak. We will only know more once we have that famous missing “theory of everything,” unifying quantum mechanics with relativity. And even this may not do the job: what reason do we have to exclude the possibility of a hidden continuum underneath even the most fundamental discrete spacetime? It’s imaginable, so it’s logically consistent. Our measurement precision is forever limited and it is reasonable to assume that we may never get to the true bottom of it all.
The same issues affect (b): we can never exclude the possibility that we missed some non-computable phenomena, maybe because we cannot even identify them, or recognize them as phenomena of interest. Consider quantum indeterminacy: if it's truly random, why study it? There is nothing more to be said than “it’s random,” and it is not going to be practically useful in any way. More importantly: if we assume the world is only weakly deterministic, then we can reconstruct all the rules that applied in the past, but we cannot predict those that are yet to come. We will come back to this central point later.
The bottom line is: we should have very little confidence, at this point, that (a) or (b) are actually true. We certainly have no mathematical proof or empirical evidence indicating that they are. But the pancomputationalist view absolutely depends on them! If either of the two turns out to be wrong, then pancomputationalism is wrong. And we already have good reasons to believe it is: think of quantum randomness, plus more evidence at the macro-scale, which we will give you soon.
But we haven’t even looked at (c) yet, which is where pancomputationalism’s troubles really begin. The claim that all physical processes are some kind of computation is not at all the same as saying “we can simulate all physical processes to arbitrary accuracy on a computer.” It is a strong metaphysical claim. Consider that all well-defined notions of computation are concerned with the manipulation of symbols or representations. We’ve already encountered Turing’s original definition of “effective computation” as a human problem-solving activity. More generally, we could follow philosopher Corey Maley and define physical computation as the mechanistic manipulation of (symbolic) representations. This has the advantage of not only including digital but also analog forms of computation. What both definitions have in common is that there must be some kind of symbolic content that is being manipulated.
We humans compute in order to solve problems that are relevant to us. This is what Turing’s theory is all about. So what on earth would it even mean to say “a physical process performs some kind of computation?” One potential response goes along the following lines.
Consider a waterfall. It is a physical system. We can delimit it by saying that a certain volume of water goes in, distributed in a certain way across space and time, and a certain volume of water comes out, again with a peculiar spatiotemporal distribution. In this sense, the waterfall can be said to map its input water flow, down a cliff, to its output water flow. Differently distributed inputs will be reliably associated with differently distributed output streams. In an abstract sense, this is equivalent to problem-solving in the sense of Simon and Newell: there are clearly defined inputs and outputs, with specific (physically possible) trajectories or mappings between them. And that, as we have seen above, is equivalent to Turing computation.
Now consider the game of chess, a small-world problem we have already introduced. It is computable, although exponentially complex: there exist algorithms that would give you guaranteed predictions of the game’s outcome, if you only had enough time to calculate them. The question is: can we use the waterfall to compute the problem of chess? And if we can, does that not show that any kind of flow down the waterfall must be performing some kind of computation?
The answer to the first question is “yes:” we can simulate the game of chess in terms of our waterfall, although it would be quite cumbersome to implement. For instance, we could create a maze of channels and constraints that guide little water flows in varying patterns. We could design the maze such that specific inputs and outputs are correlated in the same way specific sequences of moves fit together in the game of chess. Nothing says this isn’t possible. The mapping is arbitrary and flexible enough. Therefore, chess is computable by waterfall!
Still, you may be surprised to hear that the answer to the second question is “no:” the problem is that we have not gained anything by considering the flow of water down our waterfall as a computational process. We could have solved the problem of chess instead by simply playing chess! In fact, that would have been much easier. And anyway: the water does not care or know about chess. We’ve imposed our maze on the flow of water, arbitrarily from the “point of view” of the waterfall. Basically: it is not the waterfall that computes by itself, it is us human beings who may use the waterfall to perform a calculation. There is no intrinsic sense in which the waterfall computes. Its natural flow represents or symbolises nothing but itself.
This is the real reason why it makes no sense to call physics “computation:” these two phenomena are of a fundamentally different kind! Physical processes can be conceptualised and modelled by us in terms of matter, energy, or information. But conceptualising a physical system in terms of information flow, simulating it on a computer, is something we impose (or impute) on the physical phenomenon. It is not somehow an intrinsic aspect of the physical dynamics.
More precisely: computation only happens in the context of living agents, because it implies the manipulation of symbolic content. Symbols require an interpreter. The waterfall is not that. To equate physics and computation you have two options: either you redefine “computation” to simply mean “physical change” — but why not just call it “physics” then? This seems superfluous. Or you commit what philosophers call a category error: you mix up apples and oranges. And this is not proper scientific or philosophical practice. Computation is not physics. Not even close.
Pancomputationalism, therefore, is not a logically sound or consistent view of the world. This does not mean that computational models aren’t useful. Not at all! In fact, we have learned over the course of this chapter just how powerful computer simulations can be. We will keep on using them in the rest of the book. But while they are a good method to study physics, they are not the same as physics! Map and territory. Always remember to distinguish the two. Or disaster and delusion will befall you.
This concludes the first part of our book: our criticism of the machine view of the world. In the last five chapters, we have covered a lot of ground, arguing that mechanicism is an ideological stance, not a rational or scientifically proven approach to understand our large world. We have revealed the philosophical baggage that comes with the machine view, and computationalism as its most recent incarnation. It is not more or less justified than other well-argued naturalistic philosophies.
In fact, its underlying assumptions severely restrict the way we approach the world: they narrow the range of questions we can ask, limit the kind of explanations we accept, and often prevent us from drawing the right conclusions. They make us ignore phenomena that are right in front of our nose but do not fit into the machine view’s Procrustean bed. They lock us into our own map, and make us blind to the territory. Instead of seeing the landscape clearly, we trap ourselves in ever more sophisticated models of the world, increasingly losing our grip on reality.
It is a simple truth: the machine view is no longer the most rational way to do science in a large world. Not at all. It is a conceptual cage, an intellectual straitjacket. We must escape it! Grow out of it. Transcend it. Urgently. A new kind of worldview is needed for science in the 21st century. But danger lies that way. Your worries are justified: it is easy to drift into woo and wishful thinking when abandoning our hard-nosed mechanistic ways. We see it happening all over the place, and it’s not a good look. We don’t want to abandon scientific thinking. We want to make it even better than it already is! We need science more than ever in this post-factual world of global delusion.
What we need, therefore, is a view of the world that can outcompete the machine view in terms of both rigour and robustness. And we think we have such a view. It is firmly grounded. It is not based on declared dogma or outdated metaphors, like the machine view, but is flexible and adaptive instead. It acknowledges human limitations, while still striving to give us the best knowledge we can obtain. It stays up-to-date with the ever-evolving insights coming from science itself. It takes our experience of phenomena seriously, our actions within our large world, as the source of all knowledge. It keeps building from the bottom up, rather than the top down. It is called scientific perspectivism, and will be the focus of the second part of this book.
The authors acknowledge funding from the John Templeton Foundation (Project ID: 62581), and would like to thank the co-leader of the project, Prof. Tarja Knuuttila, and the Department of Philosophy at the University of Vienna for hosting the project of which this book is a central part.
Disclaimer: everything we write and present here is our own responsibility. All mistakes are ours, and not the funders’ or our hosts’ and collaborators'.