Containers
Set theory is boring. It’s a fair bet that this is what you think. Unless you are a mathematician, you may remember set theory from school as the kind of maths that really abandoned the familiar territory of counting with numbers, and took you into a world of strange formulas whose only point seemed to be to prove that something is, or is not, in fact, part of something else.
But this is really not doing it justice. If you read this appendix, you will not only better understand the mathematical concepts and notation we use in the main text, but also learn the basics of what set theory actually is, what it is good for, and why it is genuinely fascinating.
Set theory is fundamentally important for two main reasons. On the one hand, it is the theory that many mathematicians see as the very foundation for all of mathematics. On the other hand, it tells us a lot about how most of us tend to think, not just about abstract objects and relations, but about the world in general. We mention in chapter 11 that set theory represents the very deeply ingrained psychological doctrine of containment, which is the view that the world is made of lots of containers that contain further containers, and so on. This view is not only very common among modern humans, but lies at the heart of our logical reasoning, and also develops at a very early stage in our lives. Even kids are natural set theorists! Not that they know this, of course.
Set theory, and this is important, starts with the basic idea that there are distinguishable stable entities, abstract or concrete objects, which we denote with the universal mathematical variable name x. We can then sort these x-entities into different containers (like bags) that we call sets (or sometimes collections). Except that these containers are completely immaterial: there is nothing the “bag” surrounding the elements of a set is made of (even abstractly speaking). Instead (as already mentioned in chapter 11) sets are defined purely in terms of the objects they contain, which we call the members or elements of the set. If we denote the set by X (with a capital letter, to set it apart from its x-elements), and use curly braces to indicate the (immaterial) container, we can now write
X = {x, …}
and also

x ∈ X
i.e., “x is an element of X” or simply “x is in X”. There can be multiple elements, and various kinds of them, in the set. And again: these individually distinguishable elements define the set by extension. We can make this more explicit by writing different elements using distinct letters x and y. Then we also see that it is easy to combine two elements into a new set Y

Y = {x, y}
where it is important to note that {x,y} and {y,x} are one and the same set. The order of the elements doesn’t matter. The only thing that’s relevant is what the elements are, and that they’re part of the set. Thus, {x,x,y} is also the same set as {x,y}. Once again, we only count elements that are distinct.
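If you want to see these two rules in action, most programming languages have a built-in set type that behaves exactly as described. Here is a minimal sketch in Python (the element names are arbitrary stand-ins):

    # Order and repetition are irrelevant: only distinct members count.
    X = {"x", "y"}
    same_set = {"y", "x", "x"}  # different order, with a repeated element
    print(X == same_set)        # True: {x,y} = {y,x,x} = {x,y}
    print(len(same_set))        # 2: the duplicate "x" collapses into one element
    print("x" in X)             # True: membership, written x ∈ X above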
Therefore, the only property you must have to qualify as a set element is to be distinguishable from other set elements. Apart from this, elements can be literally anything. They don’t have to be measurable quantities, for one. We can sort cars according to colour, for example, generating a set of yellow cars, a set of red cars, and a (much bigger) set of dull-metallic-grey-like cars. Similarly, we can have sets of apples and oranges, of different main-sequence stars, of ideas (we’ll soon describe a set of related paradoxes, for example), and even sets of processes, such as sets of chemical reactions, of physical phase transitions, of biological life histories, or of abstract system behaviours (connected by sets of bifurcations, i.e., different kinds of transitions between these behaviours). Set elements can even be completely made up: it is no problem to define a set of Odradeks, or of alephs, for instance. We really don’t care what exactly is in a set.
Plus: members of a set can be sets themselves. We sometimes call such sets families of sets, when we want to make it clear that the elements are sets. Great sets have little sets inside themselves, and little sets have lesser sets, and so ad infinitum. This will create some complications further down the line, as we’ll see. But we’re getting ahead of ourselves.
Even though a set (as a container) does not consist of anything, it can happen that a set contains no elements. In fact, since a set is defined by its content, there is only one set that is like that, called the empty set, written like this: {}. It may seem like a strange entity: an immaterial bag with nothing in it. Just like the number zero, the mathematical concept of the empty set is a signifier for something that is really not there. In fact, we will show shortly that the two are closely related to each other in a very deep way.
Another basic assumption of conventional set theory is that an element either belongs to a set, or it doesn’t. This clear-cut relation of belonging between an element and a set is central, and reflects the logical law of the excluded middle. A car is either yellow, or it isn’t. Either you are an Odradek, or you aren’t. It’s true: we can build any arbitrary set by simply putting a bunch of elements in a bag. But if we think about this a little more, we realise that members of a set must be united by at least one property that they all share. Even if they didn’t have anything in common before you put them in the bag, these elements now all have the property of “belonging to X.” This may sound trivial, but it is exactly what we meant when we said that a set is defined by extension above. In practice, however, it isn’t very exciting: we end up with a completely random collection of elements in our bag. What good is that?
More useful sets consist of elements that share properties beyond the trivial “belonging to the same set.” This immediately gives us an effective recipe, or procedure, to build new sets from scratch. It is called set comprehension, and involves defining sets by what mathematicians call intension (which, confusingly, has nothing to do with intention, so be careful not to confuse the two). In this case, we do have a property (such as “yellow colour” for our car example) before we build our set. We now simply declare that all elements x of X with property P belong to our new collection:
Xₚ = {x ∈ X | P(x)}
which reads as “Xₚ is the set that contains all x that are elements of a (larger) set X, and also have a certain property P.” Such sets (defined by a common property P of elements x) are sometimes called classes. On the one hand, how we formulate the property P(x) is quite flexible (including various combinations of properties), which renders this set-theoretical tool so powerful that we really can’t live without it. On the other hand, it opens the door for all kinds of trouble, as we shall soon see.
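Incidentally, this is precisely the construct that many programming languages call a “set comprehension,” having borrowed both the idea and the notation from set theory. A small illustration in Python, with a made-up collection of cars as the larger set X and “is yellow” as the property P:

    # X_P = {x ∈ X | P(x)}: keep only the elements that satisfy property P.
    X = [("beetle", "yellow"), ("van", "red"), ("taxi", "yellow")]

    def P(car):
        return car[1] == "yellow"   # the property: "this car is yellow"

    X_P = {car for car in X if P(car)}
    print(X_P)   # the two yellow cars (sets print in arbitrary order)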
One rather modest kind of trouble is that we can now easily construct sets with an infinite number of elements. This is impractical when defining sets by extension, i.e., by explicitly enumerating all their elements. Evidently, it takes an infinite amount of time to enumerate infinitely many elements. But intensional set comprehension allows us to generate infinite sets without effort, in an instant. As an example, think of the set of all even numbers, which are drawn from the set of all natural numbers or integers, and share the simple and well-defined property P that x is divisible by 2. Other examples are easy to come by. But this isn’t really problematic at all, other than our “bag full of elements” visual metaphor slightly breaking down when we try to imagine an infinite number of entities in a bag. In fact, infinite sets are a standard concept in set theory, and we’ll come across many more of them in our discussions.
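This is also how programmers handle infinite collections in practice: you cannot store infinitely many elements, but you can store the defining property and generate elements on demand. A sketch in Python, using a lazy generator as a stand-in for the infinite set of even numbers:

    from itertools import count, islice

    # "All natural numbers x with the property that x is divisible by 2."
    evens = (x for x in count(0) if x % 2 == 0)
    print(list(islice(evens, 10)))   # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]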
Let us now have a look at how sets relate to each other: as we already said, sets can be elements of other sets. But they can also be subsets, which is something we actually use in the main text quite a lot. The two are not the same! While an element x is an object that belongs to set X (and helps define it, in fact), a subset A is itself a set (a bag of elements in its own right) that stands in a particular relation (called inclusion) with X. It just happens to be the case that all elements of A are also elements of X. So you can picture a subset not as an entity (an object) but as another bag inside the original bag. We write
A ⊆ X
which reads as “A is a subset of X.” Note that this includes the possibility that A = X (both bags contain the same elements). When this is not the case, i.e., when there are elements that are in X but not in A, we call A a proper subset of X. Conversely, we can also say that X is a (proper) superset of A.
We can also combine sets to make bigger ones. Let’s say you have a bag of apples, and a bag of oranges. If you pour one into the other, you get a bigger bag of apples and oranges combined. This is called taking the union of two sets, and we write
X = A ∪ B
which expresses the fact that X contains all the elements of A and all the elements of B. What is neat is that elements of X now have the property of “belonging to A or belonging to B” (in the sense of the inclusive or). We’ll explore this close connection of set theory and mathematical logic further in a bit. For now, let’s just mention that there is another operation, called the intersection, that picks out all elements that “belong to A and also belong to B”

A ∩ B = {x | x ∈ A and x ∈ B}
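Inclusion, union, and intersection all come built into Python’s set type, so the correspondence can be checked directly. A minimal sketch (the apples and oranges are, of course, just labels):

    A = {"apple1", "apple2"}
    B = {"orange1"}
    X = A | B          # union: X = A ∪ B, the combined bag
    print(A <= X)      # True: A ⊆ X (inclusion)
    print(A < X)       # True: A is even a proper subset of X
    print(A & B)       # set(): the two bags share no elements
    print(X & A == A)  # True: intersecting X with A picks out A again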
By now, we have a pretty good toolkit for playing around with these bags we call sets. But one major thing is still missing: what if we want to bring some kind of order into our bag? As we’ve mentioned above, the order of elements in a set does not matter. But what if we do want to order our elements? Take, for example, the minimal model of a formal system that we introduce in chapter 11. It takes an input x and the state s of a system and maps it to its output y. This mapping needs to know what is input and what is state. In other words: what comes first and what comes second truly matters. In this case, we do not want an unordered pair {s,x} to go into the map, but an ordered one, which we denote by ⟨x,s⟩. Here x is always first, and s comes second. This pair forms an ordered sequence of elements, which is called a tuple. Ordered pairs are 2-tuples, triples are 3-tuples, quadruples are 4-tuples, and so on.
Tuples can easily be generated by multiplying one set with another. If we write X × Y, we end up with a set {⟨x,y⟩}, whose individual elements are ordered pairs of first x then y — in this precise order: Y × X is not the same as X × Y. In fact, the set X × Y contains all the different combinations of elements of X and elements of Y. The resulting product is a bog-standard set, with the only slightly weird property that all its elements consist of ordered pairs of elements, collected from two different sets. It is called the Cartesian product of X and Y, because the classical example for such a set is the geometric plane, first introduced by Descartes, with coordinates x and y along its horizontal and vertical axes, respectively. Every point in the plane is an element of this product set, and we can identify each of them by its unique pair of coordinates ⟨x,y⟩. This should look familiar to most of you. And it also illustrates, once more, why Y × X is different from X × Y: point ⟨x,y⟩ is generally not in the same position as point ⟨y,x⟩.
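The Cartesian product is likewise a standard-library affair, and a quick experiment confirms that the order of the factors matters. A sketch in Python:

    from itertools import product

    X = {1, 2}
    Y = {"a", "b"}
    XY = set(product(X, Y))   # X × Y: all ordered pairs ⟨x,y⟩
    YX = set(product(Y, X))   # Y × X: all ordered pairs ⟨y,x⟩
    print(sorted(XY))         # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
    print(XY == YX)           # False: the product is not commutative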
Foundations
This is really all the dry theory you need in order to understand the arguments and the notation we use in the main text. If this is all you wanted from this appendix, then you can stop reading here. But if you are interested in why set theory is useful and fundamental, then do read on! In this section, we’ll discuss how set theory really does provide the foundation for pretty much all other branches of mathematics. In the next two, we then go on to briefly explore what set theory has to say about the logic of our thinking, and about its limitations.
So, how is set theory fundamental to mathematics? Well, although you may think that set theory is maths without numbers, it turns out that numbers can be built out of sets! Let’s see how this can be done.
Remember the empty set {}, an immaterial bag with no elements in it? It stands for nothing. Literally, because neither the bag nor the elements such a bag usually contains are really there. In this way, it is exactly equivalent to the number zero: when we write “0,” this symbol also signifies absence. Just like in the case of the empty set, there is simply nothing there. And while it is possible to be present in many different ways, all absences are the same, at least from a mathematical point of view. From this, we can deduce that “{}” and “0” are different symbols that refer to the same thing, which is simply nothing.
Now, to make your head hurt properly, let us point out that the symbols “{}” and “0” themselves are not nothing. And they also aren’t the same, but since they both signify nothing, we can exchange them for one another, whenever we like, because they mean the same thing. Based on this, we can simply declare that the concept of the empty set defines the number zero. A mathematician would write
0 ≜ {}
where the triangle equality ≜ means that both sides of the equation are equal by definition. But, because the symbol “0” is not nothing (it is a symbol, after all), we can put it as an element in another set, which gives us a bag that is no longer empty, but contains exactly one member: the (symbol for the) number zero. Equivalently, we can put the symbol “{}” for the empty set in a set (remember: sets can be elements of other sets), which gives us the somewhat baffling, but entirely consistent {{}}, which is not the empty set, because it has a set (the empty one) as its one and only member. Since “0” and “{}” can be exchanged for one another, these sets are equivalent (containing a single, equivalent member), and we can now declare that, therefore, they both signify the same thing, that is, the number one:

1 ≜ {{}} = {0}
If this doesn’t boggle your mind, then you don’t properly understand it: we have just created something from nothing! In fact, we have created a universal building block for all the other natural numbers.
The number one is a very special number indeed: it is the smallest bag with something in it, and we can use it to define a successor function, which determines which number follows immediately after any given one. This works by recursively adding the latest number in our sequence to itself as a new element. If we start with 1 ≜ {{}} = {0}, we get 2 ≜ {{},{0}} = {0,1}, a set with two members. If we add the set that defines 2 = {{},{0}} to itself, we get 3 ≜ {{},{0},{{},{0}}} = {0,1,2}, a set with three members. And so on and so forth. The notation in the middle tells us that all we ever need to generate any number are the set-theoretic representations of 0 = {} and 1 = {0}. It gets ugly pretty quickly with all those nested curly braces, but from the simplified formulas on the right, you immediately get the general pattern. And it’s really simple: each natural number can be defined as a set that contains all the natural numbers (including 0) that are smaller than itself. If we keep doing this forever, we get the infinite set of natural numbers
ℕ = {0, 1, 2, 3, …}
where the ellipsis indicates that the sequence goes on forever. It gets a bit more involved when we generate negative integers, rational, real, and complex numbers, but they can all be constructed as well using nothing more than the set-theoretic tools introduced above. In other words, set theory is like a Lego™ set for building numbers! And you thought numbers were fundamental to mathematics…
Another neat feature of constructing numbers by recursion is that it automatically gives us a natural ordering, without having to multiply any sets. In fact, as we have just seen, any natural number contains all smaller numbers as members of its own set representation! This makes numbers that get added later greater than those added earlier, which puts all natural numbers in order with regard to each other, and we end up with the familiar visualisation of the number line
0 ≤ 1 ≤ 2 ≤ 3 ≤ ⋯
which represents a linearly or totally ordered set, because all of its members stand in relation to each other through the ordering relation ≤ (“is smaller than or equal to”). Equivalently, we can draw the line extending to the left, and order the numbers by ≥ (“is larger than or equal to”). This means we can always tell whether a natural number is smaller than, larger than, or equal to any other natural number. From all this, it should be evident that the natural numbers are a simple set, with additional mathematical structure: the ordering relation that arranges its elements neatly (and without exception) along an infinite line.
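If you would like to watch the curly braces pile up for yourself, the entire construction fits into a few lines of Python. We use frozenset because a plain Python set cannot be an element of another set; the successor rule is exactly the one described above, and the final lines show that, in this representation, membership and inclusion are the ordering:

    def succ(n):
        # succ(n) = n ∪ {n}: add the latest number to itself as a new element.
        return n | frozenset([n])

    zero = frozenset()    # 0 ≜ {}
    one = succ(zero)      # 1 = {0}
    two = succ(one)       # 2 = {0, 1}
    three = succ(two)     # 3 = {0, 1, 2}

    print(len(three))     # 3: the set for a number n has exactly n members
    print(two in three)   # True: 2 ∈ 3 ...
    print(two < three)    # True: ... and 2 ⊂ 3, so inclusion orders the numbers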
It is fascinating that you can build numbers from sets. But the real reason that set theory is fundamental to mathematics lies in the fact that we can describe almost any mathematical object or structure as a set with some additional structure. In the case of the natural numbers, we wrote ⟨ℕ,≤⟩ to indicate that we added an ordering relation (≤) to an ordinary underlying set (ℕ) that is initially unordered on its own.
Ordered sets are a particularly simple example of objects with added mathematical structure. But there is so much more! Let’s complicate things a bit by defining a mathematical operation on the natural numbers. We’ll take addition as an example, because it is the simplest choice we can think of. We can now ask what happens when we add two numbers, say 2 and 5. Obviously, this sum evaluates to 7, and we get a mapping (also constructed from sets, see chapter 11) from the number line ℕ to itself:
2 ↦ 2 + 5 = 7
There are a few things to note here. Firstly (and most obviously), whatever natural numbers we add together, the result is always another natural number. Mathematicians call this closure, but that’s a bit confusing since it has little to do with the kind of closure we discussed in chapter 10. It just means that the set of natural numbers is closed with regard to the operation of addition: you never get to leave the domain of the natural numbers by just adding natural numbers.
In addition, there is an identity element: if we add 0 to any number, we just get that same number or, in maths speak: a + 0 = 0 + a = a. Finally, the order in which we add numbers does not matter. This manifests in two distinct ways. First, addition is commutative, i.e., 5 + 2 = 2 + 5 = 7.
And second, addition is associative. When adding three numbers together, it does not matter which ones we add first, as indicated by the parentheses in this formula: (2 + 5) + 3 = 2 + (5 + 3) = 10.
This is important when we want to combine multiple addition operations, which is something that mathematicians call composition. Associativity implies that we can compose additive operations freely, in whatever order we want, and we still arrive at the exact same result.
Maybe you’ve already noticed: the natural numbers, with the operation of addition added, are just another set with structure. This type of enhanced set is called a (commutative) monoid (for reasons we will explain later) and we write ⟨ℕ,+,0⟩, to indicate the operation, and its associated identity element, that we have added to ℕ, or simply ⟨ℕ,+⟩ if the nature of the identity is clear from the context.
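None of these laws has to be taken on faith: they can be spot-checked mechanically. A small sketch in Python that tests the monoid laws of ⟨ℕ,+,0⟩ on random samples (random testing, of course, is evidence rather than proof):

    import random

    for _ in range(1000):
        a, b, c = (random.randrange(10**6) for _ in range(3))
        assert isinstance(a + b, int) and a + b >= 0  # closure in ℕ
        assert a + 0 == 0 + a == a                    # identity element
        assert a + b == b + a                         # commutativity
        assert (a + b) + c == a + (b + c)             # associativity
    print("all monoid laws hold on this sample")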
Now we can get even more ambitious and add subtraction into the mix. However, this causes a number of headaches. First of all, subtraction is not commutative: 5 - 2 ≠ 2 - 5. And second, the result of 2 - 5 is not a natural number, but a negative integer (-3). The natural numbers are obviously not closed with regard to subtraction. Two steps are required to rescue the situation. The first is to realise that subtraction is nothing but the inverse of addition: add two, subtract two and you end up where you started. The second is to invent negative numbers, which some unknown Chinese mathematician or bookkeeper did for us more than 2,000 years ago. Now we can write 5 - 2 = 5 + (-2), and suddenly we understand why subtraction doesn’t commute: 5 + (-2) ≠ (-5) + 2. We are not adding the same numbers together: instead, 2 and -2, as well as 5 and -5, are inverses of each other, and subtraction is just addition with such inverses.
In summary, the set of (positive and negative) integers ℤ (for German “Zahl” = “number”) with the now invertible operation of addition/subtraction, can be written ⟨ℤ,+⟩. Mathematicians call such invertible monoids groups. And they love groups because their behaviour is so well understood, and you can apply them in all kinds of useful ways: to show that doughnuts and tea cups are really the same, for example, or to optimise your mattress-flipping routine, or to understand the dynamics of formal systems (which is somewhat more relevant to our argument, and something we describe in more detail in chapter 12).
But why stop at addition? It is extremely straightforward to define a commutative monoid ⟨ℤ,•⟩ for integers and multiplication which, like addition, is closed with regard to ℤ, associative, commutative, and has an identity element (1). And if we include both addition and multiplication into our enhanced set, we get ⟨ℤ,+,•⟩, which is called a ring, an additive (commutative) group with a multiplicative monoid on top.
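The same kind of spot check extends to the richer structure. Every integer now has an additive inverse, subtraction is addition of that inverse, and (a law the summary above glosses over) multiplication distributes over addition, which is what welds the two operations into a ring:

    import random

    for _ in range(1000):
        a, b, c = (random.randrange(-10**6, 10**6) for _ in range(3))
        assert a + (-a) == 0                 # every element has an inverse
        assert a - b == a + (-b)             # subtraction = adding the inverse
        assert a * 1 == 1 * a == a           # multiplicative identity
        assert a * (b + c) == a * b + a * c  # distributivity ties + and • together
    print("group and ring laws hold on this sample")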
Things get even more complicated if we allow multiplication to be inverted: division of two integers lands us in the set of rational numbers ℚ, named like this not because they are good at reasoning, but because they represent ratios of integers like ½, ¾, and so on. And with division, something else interesting happens: we have to declare division by the number zero illegal. This is because multiplying any number by 0 gives 0, so that reversing this operation could result in literally anything at all. That way madness lies. Division by zero generates something that mathematicians hate: inconsistency. It’s there, and it comes straight out of our formalism, but we cannot deal with it. So we simply forbid it: thou shalt not divide by zero, because we can’t make sense of its outcome. We’ll see this kind of legislative approach to mathematics again, further below, when we talk about the paradoxes of set theory. It’s a good indication that not even mathematics itself is objectively given, but constructed by humans as a useful cognitive tool.
After a few more expansions, and a few more complications to overcome, we arrive at the set of real numbers ℝ with everything included: addition, subtraction, multiplication, and division. This is called a field, a set with all the operational dressings on top. Or, not quite all of them: we can go on to powers and roots, which is where the complex numbers ℂ come from (another example of a field), and even further than that (but that’s a topic for another day).
Even if you don’t quite understand all the technical details, the main upshot should be clear: sets are amazing tools for constructing abstract mathematical structures! And we haven’t even touched on more complicated mathematical objects yet: lattices (a special type of ordering), vector spaces (within which all of linear algebra happens), topological spaces (kind of like geometry, but continuous, without any sharp edges), and all kinds of algebras (including Boolean ones, which are fundamental for discrete mathematics, and therefore also how computers work). All of these complex objects can be constructed from sets simply by adding structure.
And even better: this set-theoretic way of constructing mathematical objects reveals how they relate to each other. We’ve already mentioned that a group is just an invertible monoid, and that rings and fields are made from combinations of monoids and/or groups. In addition, vector spaces are defined over fields, a Boolean algebra can be interpreted as a complemented, distributive lattice, and so on and so forth. The area of mathematics that deals with such objects and their relations is called abstract algebra. The web it weaves between algebraic objects is the glue that holds the different branches of mathematics together. What a beautiful tangle of abstractions! All profoundly connected. This is the true power of set theory. Would you have imagined that, back when you were struggling with its boring formalism in school?
Paradoxes
Even if you do not understand every detail of what we’ve outlined so far, we hope you can appreciate the usefulness and the fundamental importance of set theory for the working mathematician. But its reach goes much further: it underlies the basic logic of our rationality, of our formal thinking, and of computation. And it reveals boundaries at which this logic breaks down.
We discuss in chapter 11 how set theory formalises the doctrine of containment, our deeply ingrained view that the world consists of containers full of smaller containers — until, perhaps, we reach some fundamental level where the buck stops. This rationale underlies most of Western philosophy and science. In addition, we briefly mentioned above that set theory can be used to characterise the basic operations behind formal logic, and the Boolean algebra that provides the foundation of all our digital computing devices. These operations are defined from set unions (OR), intersections (AND), as well as the idea of a complementary set Aᶜ (see chapter 11), which contains all the elements that are NOT in A. All other logical manipulations of symbols (such as logical equivalences and material conditionals) can then be derived from straightforward combinations of these three basic operations.
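This translation is concrete enough to compute with. A sketch in Python: within a small universe of discourse U, NOT is the complement, OR the union, AND the intersection, and a material conditional “A implies B” can be derived as Aᶜ ∪ B (the universe and the two properties are made up for illustration):

    U = set(range(10))                # a small universe of discourse
    A = {x for x in U if x % 2 == 0}  # "x is even"
    B = {x for x in U if x < 6}       # "x is small"

    not_A = U - A                     # complement Aᶜ ~ NOT
    A_or_B = A | B                    # union ~ OR (inclusive)
    A_and_B = A & B                   # intersection ~ AND
    A_implies_B = not_A | B           # derived: Aᶜ ∪ B ~ material conditional
    print(A_and_B)                    # {0, 2, 4}: even AND small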
To be precise, this kind of basic Boolean algebra is equivalent to zero-order or propositional logic, which is not quite sufficient yet to capture all the inferences that we use when reasoning logically. In order to get the full power of first-order or predicate logic, we need a couple of additional conceptual tools called logical quantifiers. They allow us to delimit (or “quantify”) the range of set elements to which a logical predicate (or property) applies. Take the classical syllogism “All humans are mortal. Hipparchia is a human. Therefore, Hipparchia is mortal.” This only works if all humans are mortal. However, for many logical arguments it’s enough for some elements to have a certain property. “All reptiles are scaly. Some pets are reptiles. Thus, some pets are scaly.” This is the difference between arguing for the necessity or mere existence of something. The former has much more stringent requirements than the latter.
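The two quantifiers also have direct computational counterparts: Python’s all() and any() are universal and existential quantification over a collection, which makes the syllogism checkable. A sketch with an obviously made-up pet registry:

    pets = [("snake", "reptile", True), ("cat", "mammal", False)]
    # each entry: (name, kind, is_scaly)

    all_reptiles_scaly = all(scaly for _, kind, scaly in pets if kind == "reptile")
    some_pets_reptiles = any(kind == "reptile" for _, kind, _ in pets)
    some_pets_scaly = any(scaly for *_, scaly in pets)

    # "All reptiles are scaly. Some pets are reptiles. Thus, some pets are scaly."
    print(all_reptiles_scaly and some_pets_reptiles)  # True: both premises hold
    print(some_pets_scaly)                            # True: so does the conclusion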
In principle, we should be allowed to use any first-order logical proposition in set comprehension, i.e., the intensional definition of a set (see above). This is, in fact, what early set theorists thought. And, we should be able to combine such propositions at will to build any kind of desired set. For instance, the set of “scaly pets” contains all those members of the set “pets” that have the property of being “scaly.” It is a proper subset of “pets,” while the set of “pet snakes” is a further subset of “scaly pets,” and so on. To such sets we can then add additional mathematical structures, such as orderings and operations.
But: danger lurks in two particular corners of the universe of sets! Remember when we said that infinite sets were no problem? Well, we oversimplified a bit. It’s true: most infinite sets are standard fare, but some notions of infinity cause trouble. Georg Cantor, the inventor of set theory, was the first to realise this, when he was studying a specific kind of number called a cardinal. These numbers have nothing to do with birds, or the Catholic Church. Instead, they are numbers that denote the size of a set, i.e., how many distinct elements it contains. We call this the cardinality of a set. Sets that contain the same number of elements are equipotent or equinumerous to each other. Equipotence defines what it means for two sets to be isomorphic (cf. chapter 11): it allows us to establish invertible one-to-one mappings between them.
For finite-sized sets, cardinals are just the same as the natural numbers. All sets with 42 elements (whatever those elements are, even if they are ordered tuples) have a cardinality of 42 and are therefore isomorphic to each other. Easy. But things get more complicated when we consider infinite sets such as those of the natural numbers ℕ, and real numbers ℝ. What cardinality does ℕ have? Well, it’s infinite. Is it really okay to talk about infinity as a number? Mathematicians say yes! And for good reasons, even though infinite-sized numbers have some really strange properties. Let us call this number ω (“omega,” the last letter of the Greek alphabet). It is no longer a natural number, because it is larger than any of these. And it doesn’t really fit into any of the other types of numbers we’ve encountered so far either.
This is where things get a bit weird, as they tend to when we talk about infinity. We have just entered the realm of transfinite ordinal numbers. It seems like a magic trick, but mathematicians claim that they can count beyond infinity! Even infinite numbers have successors like natural numbers do: ω + 1 follows ω just like 2 follows 1. And transfinite ordinal numbers are defined in terms of sets in a way that is exactly analogous to how we defined the natural numbers above. So far so good. The really mind-twisting thing is that we can count beyond infinity (using ordinal numbers), but adding transfinite numbers to each other does not increase the size (or cardinality) of the result: ω + ω = ω. This actually does make sense. You have to trust us. We’re not actually going into the details here, but if you are interested in knowing why, then look up Hilbert’s hotel.
What we do need to understand is that this is the reason why we distinguish between ordinal and cardinal numbers: they are pretty much the same in the domain of the finite, but counting and size differ in the transfinite domain. So let’s give the cardinal number that stands for the size of ℕ a different name: mathematicians use the Hebrew alphabet for this, and call it א₀ (pronounced as “aleph-nought”). Basically, any infinite set with a countable number of elements is of this particular size.
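Countability has a very hands-on meaning: a set is countable if its elements can be paired off, one by one, with the natural numbers. A sketch in Python showing that the even numbers are exactly as numerous as ℕ itself (via the pairing n ↦ 2n), and that merging two countable streams, Hilbert-hotel style, still yields a single countable stream, echoing ω + ω = ω:

    from itertools import count, islice

    evens = map(lambda n: 2 * n, count(0))  # the pairing n ↦ 2n: ℕ → even numbers
    print(list(islice(evens, 6)))           # [0, 2, 4, 6, 8, 10]

    def interleave(xs, ys):
        # Merge two endless streams into one by alternating their elements.
        for x, y in zip(xs, ys):
            yield x
            yield y

    merged = interleave(count(0), count(0))
    print(list(islice(merged, 6)))          # [0, 0, 1, 1, 2, 2]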
And just when you thought it could not get any weirder, of course, it does. Cantor, in the 1870s, arrived at a very intriguing result: the cardinality of the set of real numbers ℝ must be larger than א₀ (the cardinality of ℕ). But how is that even possible? Remember: adding ω to ω equals ω. So how on earth could anything be greater than א₀? Again, we can’t go too much into detail here, but the reasoning goes more or less like this: between each natural number and its immediate successor (say, 0 and 1), there are an infinite number of reals. In fact, you can always squeeze another irrational number between two real numbers. There is always more space between them. This has the strange consequence that (unlike natural numbers) reals don’t have any immediate predecessors or successors, which means you cannot count them (even if you had an infinite amount of time). (The rigorous proof of this uncountability is Cantor’s famous diagonal argument, which we won’t reproduce here.) Just let this sink in for a minute or two.
The name of this larger, uncountable infinity is א₁ (“aleph-one”). It is larger than any of the ordinal numbers we have encountered so far, since those are all defined by countable sets. Counting is the whole point of ordinal numbers, after all. So, we’re talking about a really enormous infinity here! As if א₀ wasn’t incomprehensible enough… Exactly how large א₁ is remains open to debate. Cantor made a guess with his continuum hypothesis, and it’s a reasonable guess, but it turns out that this hypothesis can neither be proven nor refuted from the standard axioms of set theory. Be that as it may, we can now extend the cardinals from א₀ and א₁ to א₂, א₃ … אω (and beyond, to transfinite indices). There is no intuitive distinction (like countable vs. uncountable) for these larger infinities anymore, but mathematicians can still define them abstractly, and construct them intensionally, using set comprehension. Now we can ask: do these cardinals (including the infinite ones) themselves form a set?
And this brings us back to the actual paradox that Cantor discovered: the totality of all cardinal numbers (including all the strange uncountable ones) cannot itself be a set, since there is no cardinality that this set could possibly have. Its size would, after all, be a cardinal number itself, and so it would have to be an element of itself. Even worse: Cesare Burali-Forti showed (in 1897) that the same kind of paradox arises within the ordinals: the totality of all ordinal numbers cannot be a set, since the ordinal number associated with its definition (its order-type) would be a member of itself. This kind of self-referentiality spells trouble. And yet, set theorists decided to simply ignore these infinity paradoxes, calling these totalities proper classes, to distinguish them from sets. Don’t confuse these with the classes that are sets (containing elements sharing a common property), which we introduced above.
In other words, “proper class” is just a name for an entity whose properties we do not really understand. This is not a solution to the infinity paradoxes. It only explains them away. Yet, obviously, ignoring this kind of fundamental problem risks that it will come back to bite you. And bite it did, leading to the complete implosion of the naïve set theory prevalent at the turn of the 20th century and, with it, the entire edifice of mathematics that had been built on top of it over the previous decades. This breakdown was triggered by Bertrand Russell and his paradox (discovered around 1901), which is related to the infinity paradoxes above, but brings the problem of self-referentiality centre stage. It is of fundamental importance for any formal approach to modelling the organisation of living systems (and their ways to come to know the world), which are heavily self-referential in exactly this (problematic) way.
Russell’s paradox is usually introduced (informally) as the tale of the barber, who shaves all those men in town, and only those, who do not shave themselves. The question is: does the barber shave himself? Well, if he does, then he cannot shave himself, because he only shaves those who do not shave themselves. But if he does not, then he does, since he shaves all those who do not shave themselves. Evidently, none of this makes any sense. There is a glaring contradiction. The barber is an impossible figure.
Formally, this corresponds to the following set comprehension: let us (intensionally) define a set R that contains all sets which do not contain themselves
Even if you do not understand every detail of what we’ve outlined so far, we hope you can appreciate the usefulness and the fundamental importance of set theory for the working mathematician. But its reach goes much further: it underlies the basic logic of our rationality, of our formal thinking, and of computation. And it reveals boundaries at which this logic breaks down.
We discuss in chapter 11 how set theory formalizes the doctrine of containment, our deeply ingrained view that the world consists of containers full of smaller containers — until, perhaps, we reach some fundamental level where the buck stops. This rationale underlies most of Western philosophy and science. In addition, we briefly mentioned above that set theory can be used to characterise the basic operations behind formal logic, and the Boolean algebra that provides the foundation of all our digital computing devices. These operations are defined from set unions (OR), intersections (AND), as well as the idea of a complementary set Aᶜ (see chapter 11), which contains all the elements that are NOT in A. All other logical manipulations of symbols (such as logical equivalences and material conditionals) can then be derived from straightforward combinations of these three basic operations.
To be precise, this kind of basic Boolean algebra is equivalent to zero-order or propositional logic, which is not quite sufficient yet to capture all the inferences that we use when reasoning logically. In order to get the full power of first-order or predicate logic, we need an additional couple of conceptual tools called logical quantifiers. They allow us to delimit (or “quantify”) the range of set elements to which a logical predicate (or property) applies. Take the classical syllogism “All humans are mortal. Hipparchia is a human. Therefore, Hipparchia is mortal.” This only works if all humans are mortal. However, for many logical arguments it’s enough for some elements to have a certain property. “All reptiles are scaly. Some pets are reptiles. Thus, some pets are scaly.” This is the difference between arguing for the necessity or mere existence of something. The former has much more stringent requirements than the latter.
In principle, we should be allowed to use any first-order logical proposition in set comprehension, i.e., the intensional definition of a set (see above). This is, in fact, what early set theorists thought. And, we should be able to combine such propositions at will to build any kind of desired set. For instance, the set of “scaly pets” contains all those members of the set “pets” that have the property of being “scaly.” It is a proper subset of “pets,” while the set of “pet snakes” is a further subset of “scaly pets,” and so on. To such sets we can then add additional mathematical structures, such as orderings and operations.
But: danger lurks in two particular corners of the universe of sets! Remember when we said that infinite sets were no problem? Well, we oversimplified a bit. It’s true: most infinite sets are standard fare, but some notions of infinity cause trouble. Georg Cantor, the inventor of set theory, was the first to realise this, when he was studying a specific kind of number called a cardinal. These numbers have nothing to do with birds, or the catholic church. Instead, they are numbers that denote the size of a set, i.e., how many distinct elements it contains. We call this the cardinality of a set. Sets that contain the same number of elements are equipotent or equinumerous to each other. Equipotence defines what it means for two sets to be isomorphic (cf. chapter 11): it allows us to establish invertible one-to-one mappings between them.
For finite-sized sets, cardinals are just the same as the natural numbers. All sets with 42 elements (whatever those elements are, even if they are ordered tuples) have a cardinality of 42 and are therefore isomorphic to each other. Easy. But things get more complicated when we consider infinite sets such as those of the natural numbers ℕ, and real numbers ℝ. What cardinality does ℕ have? Well, it’s infinite. Is it really okay to talk about infinity as a number? Mathematicians say yes! And for good reasons, even though infinite-sized numbers have some really strange properties. Let us call this number ω (“omega,” the last letter of the Greek alphabet). It is no longer a natural number, because it is larger than any of these. And it doesn’t really fit into any of the other types of numbers we’ve encountered so far either.
This is where things get a bit weird, as they tend to when we talk about infinity. We have just entered the realm of transfinite ordinal numbers. It seems like a magic trick, but mathematicians claim that they can count beyond infinity! Even infinite numbers have successors like natural numbers do: ω + 1 follows ω just like 2 follows 1. And transfinite ordinal numbers are defined in terms of sets in a way that is exactly analogous to how we defined the natural numbers above. So far so good. The really mind-twisting thing is that we can count beyond infinity (using ordinal numbers), but adding transfinite numbers to each other does not increase the size (or cardinality) of the result: ω + ω is a larger ordinal than ω, yet as a set it contains no more elements — it is still countable. This actually does make sense. You have to trust us. We’re not actually going into the details here, but if you are interested in knowing why, then look up Hilbert’s hotel.
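Hilbert’s hotel is easy enough to simulate: two infinite wings of guests fit into a single infinite row of rooms by interleaving, which is exactly why gluing two countable infinities together yields no larger a set. A finite prefix of the bookkeeping, with made-up labels of our own:

```python
# Guest n of wing A gets room 2n, guest n of wing B gets room 2n + 1:
# two copies of the naturals squeezed into a single copy.
from itertools import count, islice

def interleave():
    for n in count(0):
        yield ("A", n, 2 * n)       # (wing, guest number, room number)
        yield ("B", n, 2 * n + 1)

print(list(islice(interleave(), 6)))
# [('A', 0, 0), ('B', 0, 1), ('A', 1, 2), ('B', 1, 3), ('A', 2, 4), ('B', 2, 5)]
```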
What we do need to understand is why we distinguish between ordinal and cardinal numbers: they are pretty much the same in the domain of the finite, but counting and size differ in the transfinite domain. So let’s give the cardinal number that stands for the size of ℕ a different name: mathematicians use the Hebrew alphabet for this, and call it א₀ (pronounced “aleph-nought”). Basically, any infinite set with a countable number of elements is of this particular size.
And just when you thought it could not get any weirder, of course, it does. Cantor, in the 1870s, arrived at a very intriguing result: the cardinality of the set of real numbers ℝ must be larger than א₀ (the cardinality of ℕ). But how is that even possible? Remember: even adding ω to ω does not get us past countability. So how on earth could anything be greater than א₀? Again, we can’t go too much into detail here, but the intuition goes more or less like this: between each natural number and its immediate successor (say, 0 and 1), there are an infinite number of reals, and you can always squeeze another real number between any two of them. There is always more space between them. This has the strange consequence that (unlike natural numbers) reals don’t have any immediate predecessors or successors. Density alone is not quite enough, though: the rational numbers are also dense, yet still countable. Cantor’s actual proof is his celebrated diagonal argument, which shows that any attempted list of all the reals must inevitably miss some of them. This means you cannot count the reals (even if you had an infinite amount of time). Just let this sink in for a minute or two.
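The diagonal trick itself fits in a few lines. Below we represent an (attempted) enumeration of infinite 0/1 sequences as a function from indices to sequences and flip the diagonal; the particular enumeration is an arbitrary example of our own:

```python
# Build a sequence that differs from the n-th enumerated sequence at
# position n, so it cannot appear anywhere in the enumeration.
def diagonal_escape(enumeration, prefix=8):
    return [1 - enumeration(n)(n) for n in range(prefix)]

example = lambda n: (lambda k: (n >> k) & 1)   # n-th sequence = binary digits of n
print(diagonal_escape(example))
# [1, 1, 1, 1, 1, 1, 1, 1] -- differs from sequence n at position n
```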
The name usually given to this larger, uncountable infinity is א₁ (“aleph-one”), defined as the next cardinal after א₀. It is larger than ω and any other countable ordinal, all of which are captured by countably infinite sets. Counting is the whole point of ordinal numbers, after all. So, we’re talking about a really enormous infinity here! As if א₀ wasn’t incomprehensible enough… Whether the cardinality of ℝ is exactly א₁ remains open to debate. Cantor guessed that it is, with his continuum hypothesis, and it’s a reasonable guess, but it turns out that this hypothesis can be neither proved nor disproved from within set theory. Be that as it may, we can now extend the cardinals from א₀ and א₁ to א₂, א₃ … אω (and beyond, to transfinite indices). There is no intuitive distinction (like countable vs. uncountable) for these larger infinities anymore, but mathematicians can still define them abstractly, and construct them intensionally, using set comprehension. Now we can ask: do these cardinals (including the infinite ones) themselves form a set?
And this brings us back to the actual paradox that Cantor discovered: the totality of all cardinal numbers (including all the strange uncountable ones) cannot itself be a set, since there is no cardinality that this set could possibly have. Its size would, after all, be a cardinal number itself, and so it would have to be an element of itself. Even worse: Cesare Burali-Forti showed (in 1897) that the same kind of paradox arises within the ordinals: the totality of all ordinal numbers cannot be a set, since the ordinal number associated with its definition (its order-type) would be a member of itself. This kind of self-referentiality spells trouble. And yet, set theorists decided to simply ignore these infinity paradoxes, calling these totalities proper classes, to distinguish them from sets. Don’t confuse these with the classes that are sets (containing elements sharing a common property), which we introduced above.
In other words, “proper class” is just a name for an entity whose properties we do not really understand. This is not a solution to the infinity paradoxes. It only explains them away. Yet, obviously, ignoring this kind of fundamental problem risks having it come back to bite you. And bite it did, leading to the complete implosion of the naïve set theory prevalent at the turn of the 20th century and, with it, the entire edifice of mathematics that had been built on top of it over the previous decades. This breakdown was triggered by Bertrand Russell and his paradox (discovered around 1901), which is related to the infinity paradoxes above, but brings the problem of self-referentiality center stage. It is of fundamental importance for any formal approach to modelling the organisation of living systems (and their ways of coming to know the world), which are heavily self-referential in exactly this (problematic) way.
Russell’s paradox is usually introduced (informally) as the tale of the barber, who shaves those men in town who do not shave themselves. The question is: does the barber shave himself? Well, if he does, then he cannot shave himself, because he only shaves those who do not shave themselves. But if he does not, then he does, since he shaves all those who do not shave themselves. Evidently, none of this makes any sense. There is a glaring contradiction. The barber is an impossible figure.
Formally, this corresponds to the following set comprehension: let us (intensionally) define a set R that contains all sets which do not contain themselves,

R = { x | x ∉ x },
where we (exceptionally) use the same symbol x for a set and for an element (since they are here one and the same). The problem is: does this set contain itself? Well, if it does (R ∈ R), then it cannot contain itself (R ∉ R), which is an obvious contradiction. And the other way round: if it does not contain itself (R ∉ R), then it must (R ∈ R). Again, a blatant contradiction. And, as we learned when we discussed division by zero above, if we accept this kind of logical inconsistency anywhere in our formalism, then we can prove anything at all to be true, which renders it completely useless as a conceptual tool. Essentially, if this kind of set comprehension (based on entirely valid logical propositions) is allowed, then logic breaks down. This came as a true shock to the world of mathematics at the time.
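You can even watch the regress happen. If we encode “x does not contain x” as a predicate and ask it about itself, the question never bottoms out (a deliberately naïve encoding of our own, with sets played by functions):

```python
# Russell's paradox as runaway self-reference: asking whether the
# predicate holds of itself triggers an endless back-and-forth.
import sys
sys.setrecursionlimit(100)    # keep the inevitable failure short

def russell(x):
    return not x(x)           # "x is not a member of x"

try:
    russell(russell)          # does R contain R?
except RecursionError:
    print("no answer: the definition folds in on itself forever")
```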
Solutions needed to be found, urgently, to prevent logical mayhem at the very foundation of mathematics. But unlike division by zero, we cannot simply outlaw set comprehension wholesale, since it is central to what makes set theory powerful and applicable to other branches of mathematics. Therefore, more subtle regulative interventions were called for. We’ll talk about those in the following section, where our focus will be on debates about how much self-referentiality we are required to retain in order to describe what is happening in natural and, in particular, living systems.
Axioms
Thus, here we are, in the first decade of the 20th century, with a shiny new foundation for mathematics which allows us to construct any formal object imaginable. And, yet, there is a glaring logical inconsistency, a gaping hole, at its very heart. This is a mathematical bummer if there ever was one.
It is truly irritating: what does this conceptual mess say (in general) about our ability to construct coherent formal arguments? And, more concretely: how can this botched up situation be fixed without throwing the baby out with the bathwater? To answer this question, we must first home in on what is actually causing the inconsistency. It is not set comprehension per se, with its powerful capability to generate infinite sets, that is the problem. We definitely want to retain that for set theory to be useful.
Instead, the problem lies in the self-referentiality that all the above paradoxes have in common. The cardinality of the class of all cardinal numbers must itself be a member of this class. The order-type of the class of all ordinal numbers must itself be a member of this class. And the comprehension we used to generate the set that contains all sets that do not contain themselves (R = { x | x ∉ x }) features x both on the side of the set being defined, and the side of the equation that provides the definition. This amounts to x appearing in the outcome (or range) of the function that is x, a clear violation of Aristotle’s ban on circular reasoning. We’ll come back to this problem below and in part three of the book.
This kind of definition, which relies on the very set being defined (or, more commonly and indirectly, some set that contains the set being defined) is called impredicative. Some mathematicians are uneasy with such logical circularity. Henri Poincaré (1906), for example, went on a mission to rid mathematics of its pernicious presence altogether. The catch is: impredicativity is extremely common. It affects the definitions of many basic concepts and is, for the most part, completely harmless. Take, for instance, “the largest fish in the pond.” Evidently, this definition includes all the fish in the pond (including the largest one itself). Yet, in this case, the self-reference is not really problematic. In fact, the largest fish represents the general notion of a least upper bound (or supremum), which is something we will use ourselves to classify and relate different models of natural systems in the last two parts of the book.
So, what then is the problem? In essence, self-reference becomes an issue when it generates some kind of infinite regress — basically, when an impredicative definition “folds in” on itself. This does not happen with the largest fish in the pond, since there are only so many fish to be counted. It requires both infinity and impredicativity. In the case of the infinity paradoxes, this is quite obvious: there is always another larger cardinal or ordinal that is required to characterise the class of all cardinals or ordinals, which leads to a runaway effect. And it is also present in Russell’s paradox, although more subtly, sending us into an endless logical spin: if the set of all sets that do not contain themselves contains itself, then it does not, and vice versa. Back and forth, forever. This is the problem we need to resolve.
There are several possible ways to avoid this kind of plight. The first is simply to forbid the particular kind of set comprehension represented by the above paradoxes. This was the way chosen by Ernst Zermelo, Abraham Fraenkel, and John von Neumann (among others) who proposed axioms that yield a consistent foundation for set theory (and, thus, the rest of mathematics). This idea is as old as mathematical practice itself. The most famous axiomatic system is probably Euclidean geometry, formulated more than 2,000 years ago. The basic idea is to come up with a minimal set of postulates (the axioms) that we have to take for granted, without further justification, and from which we can derive all kinds of consistent mathematical insights called theorems that are useful for some purpose.
In fact, we’ve already come across many set theory axioms. The most fundamental one is probably the axiom of extension, which tells us that two sets are identical if they contain the same elements. Then there are axioms that postulate the existence of the null set, of set union, and of unordered and ordered pairs (Cartesian products), plus the axiom schema (a proper class of axioms, strictly speaking) of specification, which tells us we can build sets by set comprehension, with an extra axiom of infinity attached that allows for the infinite-sized sets that result from such intensional definitions. So far, there is nothing new in any of this.
The true invention of Zermelo-Fraenkel (ZF) set theory, as it is called today, consists of two additional axioms, which serve to prevent paradoxical set comprehension. The first is called the axiom of replacement (again, an axiom schema, if we want to be precise). It states that a mathematical function or mapping (see chapter 11), which starts with a set as its domain, will always produce a well-defined set as its range (i.e., outcome). The purpose of this axiom, roughly speaking, is to deal with the weirdness that arises among transfinite numbers. The second is called the axiom of regularity (or foundation). And this is the crucial one: it states that every non-empty set must contain an element that is disjoint from itself. Among other things, this rules out any set that contains itself as an element, and with it constructions such as “the set of all sets” and “the set of all sets that do not contain themselves.” Basically, it is a stop sign for impredicative infinite regress.
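Python’s built-in sets are, as it happens, resolutely well-founded: the language simply refuses to let a set become an element of itself, so nothing like the problematic comprehensions can even be written down (a small illustration, not anything from ZF itself):

```python
# Attempting s ∈ s with Python's built-in sets fails outright:
# mutable sets are unhashable, so no set can contain another set,
# let alone itself.
s = set()
try:
    s.add(s)
except TypeError as e:
    print("rejected:", e)     # rejected: unhashable type: 'set'
```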
So far so good. But there is still some residual weirdness in the transfinite domain. It turns out that many interesting mathematical insights require two further postulates that remain highly controversial among mathematicians, even today. The first we’ve already encountered: it is Cantor’s continuum hypothesis, which states that any infinite set of real numbers either has a cardinality equal to that of ℕ (א₀), i.e., it is countable, or a larger cardinality equal to that of ℝ, which means it is not countable. There is nothing in between these two different infinities. This hypothesis is widely assumed to be true, but we now know that it can be neither proved nor disproved from within Zermelo-Fraenkel set theory: it is independent of its axioms.
Last but not least, there is the problem of picking among an infinite set of socks. Yes, you read that right. Mathematicians worry about infinite sets of matching socks. The underlying problem is that Cartesian products are not obviously defined for infinite families of sets. Imagine, if you can, that you have an infinite collection of pairs of socks in your drawer. Unlike shoes, the two socks of a pair are indistinguishable: there is no rule (such as “always take the left one”) that singles one of them out. Is it still possible to make a definite choice of one sock from every pair, given that there are infinitely many pairs? It sounds bizarre, and it is not easy for non-mathematicians to understand why this should matter, but it turns out to be essential for many problems that involve, for example, the ordering of infinite sets. And these turn out to be surprisingly common. So we need an additional axiom: the axiom of choice, which states that you can indeed pick one sock from each pair in your infinite collection. But not all mathematicians agree. So it is customary to state explicitly when your derivations depend on this axiom, and that you are using ZFC set theory (Zermelo-Fraenkel with the additional axiom of choice included).
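The shoe/sock contrast can at least be gestured at in code. For labelled pairs, a choice function is trivial to write; for genuinely indistinguishable socks there is nothing to put in its place, because any program is itself a definable rule (finite stand-ins and invented labels, purely for illustration):

```python
# "Shoes": the labels make a uniform choice rule possible.
shoes = [("left-1", "right-1"), ("left-2", "right-2")]
chosen = [min(pair) for pair in shoes]   # rule: the lexicographically smaller one
print(chosen)                            # ['left-1', 'left-2']

# "Socks": with indistinguishable elements there is no analogue of min()
# to write here -- the axiom of choice asserts a selection exists anyway.
```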
In conclusion, we have now arrived at a consistent kind of axiomatic set theory. But note: ZF(C) does not resolve the infinity paradoxes or Russell’s paradox at all. Rather, it avoids them in just the same way we avoid inconsistency with division by zero: it simply outlaws (by decree, through the axiom of regularity) the kind of set comprehension that leads to these paradoxes. However, it does not tell us what a proper class is — other than declaring it not to be a set (and hence not a problem for set theory). The paradox remains. We’ve only safeguarded ourselves against it. And that’s fine by most mathematicians (and, obviously, non-mathematicians as well). Unless you are interested in the foundations of mathematical logic, or the specialised field investigating very large cardinals, or (and this is obviously relevant here) the problem of circular causation in living systems, you have little need to worry about these things.
Those mathematicians who did worry came up with various complementary approaches and extensions to ZF(C) set theory. Bertrand Russell himself invented a thing called type theory, which basically attaches different labels to different sets. We’ll revisit this when we talk more about λ calculus, later on. Think of the types of variables used in computer programming, for example: integers, floating point numbers, character strings, and so on. Others, like Kurt Gödel and John von Neumann, interpreted such labels as levels in a hierarchy of sets and classes — a strictly layered mathematical universe. This avoids vicious self-reference by declaring the left- and right-hand side of an impredicative definition to belong to different types or levels. But what exactly belongs to which type or level remains open to debate. Von Neumann-Bernays-Gödel set theory, an extension of ZFC, admits classes whose members range only over sets, and defines a proper class as one that is not itself a member of anything. But, then, is there a class of all classes? We could go there, but we won’t…
We think the problem of proper classes is intriguing. It hints at the fact that there seems to be no well-defined universe that encompasses all possible mathematical objects. We do have more to say about this in chapter 12. But, beyond this basic insight, we are mainly interested in the kind of infinite regress that is caused by the “infolding” of Russell’s paradoxical set. While the infinity paradoxes are about a regress that expands “outward,” towards larger and larger numbers, Russell’s paradox generates infinity inward, through a kind of inferential circularity. Because of this, some mathematicians remain unhappy with the restrictions of axiomatic set theory. They point out that it may be convenient to ban problematic self-referentiality — it may, in fact, be essential if we are to ground mathematics in set theory. But it is not necessarily warranted when modelling natural systems: there may be such systems (especially living ones) that exhibit just the kind of impredicativity we’ve just outlawed.
We’ll go into this problem in a lot more detail in the last two parts of the book. For now, let’s just point out that it is perfectly legitimate to use impredicative sets in areas such as the local modelling of particular systems, where the overall consistency of our mathematical framework is of little practical importance, because we never truly enter the outlandish transfinite regions of conceptual territory in which the impredicativity paradoxes occur. Also, as we’ll point out, some of these paradoxes can actually be resolved quite pragmatically by using a more process-oriented approach to mathematics and modelling in general. Again, more on that later. For now, let’s just say that (despite everything we said above) there are approaches to set theory that ignore all the warning signs to boldly go where no rigorous mathematician is supposed to go.
Because they retain the axiom of foundation, and with it their logical consistency, axiomatic approaches are bundled under the name of well-founded set theory. Accordingly, there is also non-well-founded set theory (yes, the name is awkward), which deliberately relaxes standards and allows some of the problematic cases of self-referentiality. One example is hyperset theory. It scraps the axiom of foundation (which says that a legitimate non-empty set must have an element disjoint from itself, preventing infinite regress), and replaces it with the provocatively named axiom of anti-foundation, which is much more liberal, stating that any set that can be drawn as a consistent directed graph (basically a bunch of nodes connected by arrows) is a valid set. Correspondingly, the following are all okay
and illustrate nicely how you draw a set (even a hierarchical multi-level one, or one with loops) as a graph. But, then, the following little monster (called the ouroboros set, after the mythical snake that eats its own tail) is also legitimate,

W = { W },

its only element being itself! Hyperset theory tells us that, within its own framework, we can work with this very strange set. But we have to be careful, because it can lead to all kinds of trouble down the line, like the mapping W = W(W), which follows from the ouroboros set but (as we have seen) offends against the prohibition of (viciously) circular reasoning. So we have to tread carefully here.
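Representing sets as graphs also makes the difference between the two camps tangible. In the sketch below (representation and names our own), a naïve well-foundedness check walks the membership arrows and flags any loop, which is exactly what the ouroboros set triggers:

```python
# Hypersets as directed graphs: each node lists the nodes it "contains".
graphs = {
    "empty":  [],          # the null set
    "nested": ["empty"],   # { ∅ } -- an ordinary, well-founded set
    "W":      ["W"],       # the ouroboros set: W = {W}
}

def well_founded(node, seen=()):
    if node in seen:
        return False       # membership loops back on itself
    return all(well_founded(child, seen + (node,)) for child in graphs[node])

print(well_founded("nested"))   # True
print(well_founded("W"))        # False: anti-foundation territory
```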
It’s a bit like that warning you get when a website does not have a valid security certificate: you can proceed, but only at your own risk, and even though you may get useful — and sometimes even essential and unique — information from that website, it may also turn out to be a dodgy haunt that feeds you misinformation, phishes for your password, or infects your computer with a virus. The same goes for non-well-founded set theory: you are out in the wild, on your own, and will have to judge for yourself which kinds of inferences drawn from it are legitimate in any given context, and which lead to chaos and pandemonium. We’ll provide some examples in the last two parts of the book.
This is really all we need to know about sets and hypersets for the moment. We’ve covered quite some ground already in this appendix. But there is one last limitation of set theory that we must still mention, because it will be important later on: traditional set theories (both well-founded and not) are all pretty black and white. An element is either a member of a set, or it is not. There is no room for uncertainty, while in real-world situations, we cannot always be so sure. This is why there is a further extension to the theory called uncertain or fuzzy sets. The elements of these sets belong to them only to a certain degree, usually expressed as a number between 0 and 1 (a degree of membership, which is related to, but not quite the same as, a probability). They are maybe-members, if you want. And just as traditional sets allow us to formalise logical operators, fuzzy sets lead to their own kind of logic, which is (not surprisingly) called fuzzy logic. Its main feature is that it does not adhere to the law of the excluded middle. In other words, elements can belong to two disjoint sets at the same time, and propositions can be true and false to varying degrees. We don’t want to say much more about such graded and probabilistic approaches right now, but we’ll revisit the topic of stochasticity (and its potential sources in natural systems) at a later point in the book.
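A minimal sketch of how this works in practice, using the standard min/max/1−μ connectives (the membership degrees below are made up):

```python
# Fuzzy membership as degrees in [0, 1]; note how A OR NOT-A
# fails to reach full truth -- the excluded middle does not hold.
tall = {"ana": 0.9, "ben": 0.5, "cleo": 0.2}   # degrees of "being tall"

def fuzzy_not(mu):      return {x: 1 - d for x, d in mu.items()}
def fuzzy_or(mu1, mu2): return {x: max(mu1[x], mu2[x]) for x in mu1}

print(fuzzy_or(tall, fuzzy_not(tall)))
# {'ana': 0.9, 'ben': 0.5, 'cleo': 0.8} -- nobody reaches 1.0
```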
The authors acknowledge funding from the John Templeton Foundation (Project ID: 62581), and would like to thank the co-leader of the project, Prof. Tarja Knuuttila, and the Department of Philosophy at the University of Vienna for hosting the project of which this book is a central part.
Disclaimer: everything we write and present here is our own responsibility. All mistakes are ours, and not the funders’ or our hosts’ and collaborators'.