In this chapter we provide a treatment of the elementary concepts of set theory, with the goal of being able to use sets in later parts of the course, for example to define relations and functions. We will be more precise than the typical (very informal) treatment of set theory in high school, but we will also avoid the intricacies of a full-fledged axiomatic treatment of set theory, of which we show only minor parts.
There seems to be no simpler mathematical concept than a set[1], a collection of objects. Although intuitively used for a long time,[2] the formulation of a set as a mathematical concept happened as late as the end of the 19th century. For example, in 1895 Cantor proposed the following wording: "Unter einer 'Menge' verstehen wir jede Zusammenfassung $M$ von bestimmten wohlunterschiedenen Objekten unserer Anschauung oder unseres Denkens (welche die 'Elemente' von $M$ genannt werden) zu einem Ganzen" (in English: "By a 'set' we understand every collection $M$ of definite, well-distinguished objects of our perception or of our thought (which are called the 'elements' of $M$) into a whole").
The reader is certainly familiar with statements like
$5 ∈ ℕ$ (where $ℕ$ denotes the set of natural numbers),
$-3 \not∈ ℕ$,
$\{3, 5, 7\} ⊆ ℕ$, and
$\{a, b\} ∪ \{b, c\} = \{a, b, c\}$,
as well as with simple definitions like the following:
Definition 3.1 (Informal)
The number of elements of a finite set $A$ is called its cardinality and is denoted $|A|$.
Also, facts like
$$A\subseteq B\,\land\,B\subseteq C\implies A\subseteq C $$
or
$$A\cap(B\cup C)=(A\cap B)\cup(A\cap C) $$
are well-known and seem obvious if one draws a figure with intersecting circles representing sets (so-called Venn diagrams). However, many issues do not seem to be clear mathematically, for example:
Which objects can one use as elements of a set?
Can a set itself be an element of a set?
Can a set be an element of itself?
How is set intersection or set union defined?
How should the elements of a set be counted?
Do the above-stated facts require a proof, or are they just "obvious" in an informal sense?
This calls for a precise mathematical treatment with clear definitions, lemmas, and proofs. The need for such a precise treatment also becomes obvious when considering Russell's paradox discussed below.
The set concept introduced by Cantor and axiomatized further by Frege seemed very innocent and safe to work with. But in 1903, Bertrand Russell[3] (1872-1970) showed that set theory as understood at that point in time is inherently contradictory. This came as a shock to the mathematics community. As a consequence, set theory had to be based on much more rigorous grounds, on an axiomatic foundation, a process started by Ernst Zermelo. It is still an active area of research in mathematics which axioms can and should be used as the foundation of set theory. The most widely considered set of axioms is called Zermelo-Fraenkel (ZF) set theory. Axiomatic set theory is beyond the scope of this course.
The problem with Cantor's intuitive set theory is that, because it was not clearly axiomatized, it makes the following apparently innocent (yet false) assumption: Whenever one specifies a precise condition (i.e., a logical predicate $P$) that allows one to distinguish between objects that satisfy the predicate and objects that don't, then $\{x \mid P(x)\}$, the set of objects satisfying the predicate, is well-defined. Russell proposed the set
$$R=\{A\mid\ A\notin A\} $$
of sets that are not elements of themselves. Note that there seems to be nothing wrong with a set being an element of itself. For example, the set of sets containing at least 10 elements seems to be an element of itself, as it contains more than 10 elements. In contrast, the set of sets containing at most 10 elements is not an element of itself.
Either $R ∈ R$ or $R \not∈ R$. If $R ∈ R$, then by the definition of $R$, this implies $R \not∈ R$, a contradiction. Thus the only alternative is $R \not∈ R$. But again, by the definition of $R$, this implies $R ∈ R$, again a contradiction. In other words, $R ∈ R$ if and only if $R \not∈ R$, a paradox that requires an explanation.
The problem, subsequently addressed by Zermelo's axiomatization, is the following: While for any set $B$ and predicate $P$, $\{x ∈ B | P(x)\}$ is actually a well-defined set, $\{x | P(x)\}$ is not. We must have a set to begin with before being able to create new sets by selecting the elements satisfying a certain condition. In other words, the universe $U$ of objects one has in mind when writing $\{x | P(x)\}$ is not itself a set.[4]
In set theory one postulates that there is a universe of possible sets and a universe of objects which can be elements of sets. Nothing prevents us from thinking that the two universes are the same, i.e., that the elements of sets are also sets. We further postulate a binary predicate $E$ to be given, and if $E(x, y) = 1$ we say that $x$ is an element of $y$. We can call $E$ the elementhood predicate. Instead of the predicate $E$ we use an infix notation and write $x ∈ y$ rather than $E(x, y) = 1$. We also use the shorthand $x \not∈ y$ for $¬(x ∈ y)$, i.e., if $x$ is not an element of $y$.
Now we can postulate certain properties that the elementhood predicate $E$ should satisfy, capturing the essence of set theory. This makes explicit that $E$ is not some arbitrary predicate, but that it really captures natural properties of sets. In a systematic mathematical approach, one carefully chooses a list of axioms and develops a theory (set theory) based on these axioms. There are indeed several different (but related) axiom systems for set theory, and it is beyond the scope of this course to discuss set theory in a formal sense.[5] However, we will informally introduce some of these properties/axioms in order to arrive at a sufficiently precise treatment of sets.
When writing formulas, it will often be convenient to not only use the usual logical variable symbols $x$, $y$, etc., but to use in the same formula symbols like $A$, $B$, etc. This is convenient because it makes the formulas look closer to how set theory is usually informally discussed. However, whether we use the symbol $x$ or $A$ for a set is not mathematically relevant.
3.2.2. Set Equality and Constructing Sets From Sets
A set is completely specified by its elements, regardless of how it is described.[6] There is no other relevant information about a set than what its elements are. In other words, two sets $A$ and $B$ are equal ($A = B$) if (and only if) they contain the same elements, independently of how $A$ and $B$ are described. In other words, there cannot be two different sets $A$ and $B$ which contain exactly the same elements. This is called the axiom of extensionality in set theory. Since we are not aiming for an axiomatic treatment of set theory, we state this simply as a definition.
Definition 3.2
$A = B \stackrel{\mathrm{def}}{\iff} ∀x~(x ∈ A ↔ x ∈ B)$.
We postulate[7] that if $a$ is a set, then the set containing exactly (and only) $a$ exists, and is usually denoted as $\{a\}$. Similarly, for any finite list of sets, say $a$, $b$, and $c$, the set containing exactly these elements exists and is usually denoted as $\{a, b, c\}$.
Since a set is specified by its elements, we can conclude that if two sets, each containing a single element, are equal, then the elements are equal. This can be stated as a lemma (in set theory), and it actually requires a proof.
Lemma 3.1
For any (sets) $a$ and $b$, $\{a\}=\{b\}\implies a=b$ .
Proof: Consider any fixed $a$ and $b$. The statement is an implication, which we prove indirectly. Assume that $a ≠ b$. Then $\{a\} ≠ \{b\}$ because there exists an element, namely $a$, that is contained in the first set, but not in the second. Thus we have proved that $a ≠ b \implies \{a\} ≠ \{b\}$. According to Definition 2.14, this proves $\{a\} = \{b\} \implies a = b$.
Note that, in contrast, $\{a, b\} = \{c, d\}$ neither implies that $a = c$ nor that $b = d$. In a set, say $\{a, b\}$, there is no order of the elements, i.e.,
$$\{a,b\}=\{b,a\}. $$
However, in mathematics one wants to also define the concept of an (ordered) list of objects. Let us consider the special case of ordered pairs. For the operation of forming an ordered pair of two objects $a$ and $b$, denoted $(a, b)$, we define
$$(a, b) = (c, d) \stackrel{\mathrm{def}}{\iff} a = c \,\land\, b = d.$$
An interesting question is whether one can model ordered pairs by using only (unordered) sets.[8] This requires that the sets corresponding to two ordered pairs be equal if and only if the ordered pairs are equal. A first approach is to define $(a, b) \stackrel{\mathrm{def}}{=} \{a, \{b\}\}$. However, this definition of an ordered pair fails because one could not distinguish whether the set $\{\{b\}, \{c\}\}$ denotes the ordered pair $(\{b\}, c)$ or the ordered pair $(\{c\}, b)$. The reader can verify as an exercise that the following definition (the standard one, due to Kuratowski) is correct: $(a, b) \stackrel{\mathrm{def}}{=} \{\{a\}, \{a, b\}\}$.
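As an aside, this phenomenon can be checked computationally. The following Python sketch models sets by immutable `frozenset`s and compares the flawed pair definition with the one just given (the element names are arbitrary):

```python
def bad_pair(a, b):
    """The flawed definition (a, b) := {a, {b}}."""
    return frozenset([a, frozenset([b])])

def pair(a, b):
    """The definition (a, b) := {{a}, {a, b}}."""
    return frozenset([frozenset([a]), frozenset([a, b])])

b, c = frozenset(['b']), frozenset(['c'])

# ({b}, c) and ({c}, b) collapse to the same set under the flawed definition:
print(bad_pair(b, 'c') == bad_pair(c, 'b'))  # True -- ambiguous!

# The correct definition keeps them apart:
print(pair(b, 'c') == pair(c, 'b'))          # False
```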
The following lemma states an alternative way for capturing the equality of sets, via the subset relation. In fact, this is often the best way to prove that two sets are equal.
Lemma 3.2
$A = B \iff (A ⊆ B) ∧ (B ⊆ A)$.
Proof: The proof first makes use (twice) of Definition 3.3, then uses the fact from predicate logic that $∀F~∧~∀G ≡ ∀(F ∧ G)$, then uses the fact from propositional logic that $(C → D)∧(D → C) ≡ C ↔ D$, [9] and then makes use of Definition 3.2. For any sets $A$ and $B$ we have the following equivalences of statements about $A$ and $B$:
$$\begin{align*} (A\subseteq B) \land (B \subseteq A) &\dot\iff \forall x\ (x\in A\to x\in B)\ \land\ \forall x\ (x\in B\to x\in A) \\ &\dot\iff \forall x\ \big{(}(x\in A\to x\in B)\ \land\ (x\in B\to x\in A)\big{)} \\ &\dot\iff \forall x\ (x\in A\leftrightarrow x\in B) \\ &\dot\iff A = B \end{align*} $$
This concludes the proof. $\Box$
The next lemma states that the subset relation is transitive (a term discussed later). The proof is left as an exercise.
The above definition can be extended from two to several sets, i.e., to a set (or collection) of sets. Let $\mathcal A$ be a non-empty set of sets, with finite or infinite cardinality. The only restriction on $\mathcal A$ is that its elements must be sets. Then we define the union of all sets in $\mathcal A$ as the set of all $x$ that are an element of at least one of the sets in $\mathcal A$:
For example, for $\mathcal A = \{\{a, b, c\},\ \{a, c, d\},\ \{a, c, e, f\}\}$ we have $\bigcup{\mathcal A} = \{a, b, c, d, e, f\}$ and $\bigcap{\mathcal A} = \{a, c\}$.
Typically, the sets (elements) in a set $\mathcal A$ of sets are indexed by some index set $I: \mathcal{A} = \{~A_i~ |~ i ∈ I\}$. In this case, one also writes $\{A_i\}_{i∈I}$, and for the intersection and union one writes $\bigcap_{i∈I}{A_i}$ and $\bigcup_{i∈I}{A_i}$, respectively.
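In a language with built-in finite sets, these indexed operations correspond to folding union and intersection over a collection. A small Python sketch, using the example sets from above:

```python
from functools import reduce

# A collection of sets, indexed implicitly by list position.
A = [{'a', 'b', 'c'}, {'a', 'c', 'd'}, {'a', 'c', 'e', 'f'}]

big_union = reduce(set.union, A)                # union of all sets in A
big_intersection = reduce(set.intersection, A)  # intersection of all sets in A

print(sorted(big_union))         # ['a', 'b', 'c', 'd', 'e', 'f']
print(sorted(big_intersection))  # ['a', 'c']
```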
Definition 3.5 Difference
The difference of sets $B$ and $A$, denoted $B \setminus A$, is the set of elements of $B$ without those that are elements of $A$:
$$B\setminus A {\stackrel{\mathrm{def}}{=}} \{~x\in B~|~x\notin A\}. $$
Since union and intersection are defined by logical operations on set membership expressions (e.g. $a ∈ A$), these set operations satisfy the corresponding statements of Lemma 2.1. This is stated as a theorem:
Theorem 3.4
For any sets $A$, $B$, and$C$, the following laws hold:
Idempotence:
$A ∩ A = A$; $A ∪ A = A$;
Commutativity:
$A ∩ B = B ∩ A$; $A ∪ B = B ∪ A$;
Associativity:
$A ∩ (B ∩ C) = (A ∩ B) ∩ C$; $A ∪ (B ∪ C) = (A ∪ B) ∪ C$;
Absorption:
$A ∩ (A ∪ B) = A$; $A ∪ (A ∩ B) = A$;
Distributivity:
$A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)$; $A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)$;
Consistency:
$A ⊆ B \iff A ∩ B = A \iff A ∪ B = B$.
Proof: The proof is straightforward and exploits the related laws for logical operations. For example, the two associative laws are implied by the associativity of the logical AND and OR, respectively. The proof is left as an exercise. $\Box$
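These laws are easy to spot-check mechanically for concrete finite sets. The following Python sketch verifies distributivity, absorption, and consistency on small examples (a sanity check, of course not a proof):

```python
A, B, C = {1, 2, 3}, {2, 3, 4}, {3, 5}

# Distributivity
assert A & (B | C) == (A & B) | (A & C)
assert A | (B & C) == (A | B) & (A | C)

# Absorption
assert A & (A | B) == A
assert A | (A & B) == A

# Consistency: A ⊆ B iff A ∩ B = A iff A ∪ B = B
S, T = {1, 2}, {1, 2, 3}
assert (S <= T) == ((S & T) == S) == ((S | T) == T)
print("all laws hold on these examples")
```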
Definition 3.6
A set $A$ is called empty if it contains no elements, i.e., if $∀x~¬(x ∈ A)$.
Lemma 3.5
There is only one empty set (which is often denoted as $∅$ or $\{\}$).[10]
Proof: Let $∅$ and $∅^′$ both be arbitrary empty sets. Since both are empty, every element that is in $∅$ (namely none) is also in $∅^′$ , and vice versa. This means according to Definition 3.2 that $∅ = ∅^′$ , which means that there is only one empty set. $\Box$
Lemma 3.6
The empty set is a subset of every set, i.e., $∀A~(∅ ⊆ A)$.
Proof: The proof is by contradiction: Assume that there is a set $A$ for which $∅ \not⊆ A$. This means that there exists an $x$ for which $x ∈ ∅$ but $x \not∈ A$. But such an $x$ cannot exist because $∅$ contains no element, which is a contradiction. $\Box$
Alternative Proof: The above is a valid proof. Just to illustrate (as an example) that the same proof could be made more formal and more precise we can write the proof as follows, making use of logical transformation rules for formulas with quantifiers. Let $A$ be an arbitrary (but fixed) set. The proof is by contradiction (see Definition 2.17), where the statement $S$ to be proved is $∅ ⊆ A$ and as the statement $T$ we choose $¬∀x~(x \not∈ ∅)$, which is false because it is the negation of the definition of $∅$. The proof that the negation of $S$ implies $T$ (step 3 in Definition 2.17) is as follows:
$$\begin{align*} \neg (\emptyset \subseteq A) &\dot\iff \neg \forall x (x \in \emptyset \to x \in A) && \text{(def. of } \emptyset \subseteq A\text{)} \\ &\dot\iff \exists x \, \neg (x \in \emptyset \to x \in A) && (\neg \forall x F \equiv \exists x \neg F) \\ &\dot\iff \exists x \, \neg (\neg (x \in \emptyset) \lor x \in A) && (\text{def. of } \to) \\ &\dot\iff \exists x (x \in \emptyset \land \neg (x \in A)) && (\text{de Morgan's rule}) \\ &\dot\implies \exists x (x \in \emptyset) \land \exists x \, \neg (x \in A) && (\exists x (F \land G) \models (\exists x F) \land (\exists x G)) \\ &\dot\implies \exists x (x \in \emptyset) && (F \land G \text{ implies } F) \\ &\dot\iff \neg \forall x \, \neg (x \in \emptyset) && (\neg \forall x F \equiv \exists x \neg F) \\ &\dot\iff \emptyset \text{ is not the empty set} && (\text{Definition 3.6}) \\ \end{align*} $$
which is false, and hence we have arrived at a contradiction. $\Box$
At this point, the only set we know to exist, because we have postulated it, is the empty set. We can hence construct new sets from $∅$. The set $\{∅\}$ is a set with a single element (namely $∅$). It is important to note that $\{∅\}$ is not the empty set $∅$, i.e., $\{∅\} ≠ ∅$. Note that $|\{∅\}| = 1$ while $|∅| = 0$. One can thus define a whole sequence of such sets:
$$∅,\quad \{∅\},\quad \{\{∅\}\},\quad \{\{\{∅\}\}\},\quad \ldots$$
We briefly discuss a way to define the natural numbers from basic set theory. We use bold-face font to denote objects that we define here as special sets, and then can observe that they can be seen as corresponding to the natural numbers with the same symbol (but written in non-bold font). We define the sets
$$\mathbf{0} \stackrel{\mathrm{def}}{=} \emptyset \qquad\text{and, recursively,}\qquad s(\mathbf{n}) \stackrel{\mathrm{def}}{=} \mathbf{n} \cup \{\mathbf{n}\},$$
so that $\mathbf 1 = \{\emptyset\}$, $\mathbf 2 = \{\emptyset, \{\emptyset\}\}$, $\mathbf 3 = \{\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}\}$, etc.
For example, we have $\mathbf 1 = s(\mathbf 0)$ and $\mathbf 2 = s(\mathbf 1)$. We note that $|\mathbf 0| = 0$, $|\mathbf 1| = 1$, $|\mathbf 2| = 2$, $|\mathbf 3| = 3$, …, and, more generally, that $|s(\mathbf n)| = |\mathbf n| + 1$.
An operation $\boldsymbol{+}$ on these sets $\mathbf 0$, $\mathbf 1$, $\mathbf 2$, $\mathbf 3$, . . ., which corresponds to addition of numbers, can be defined recursively as follows:
$$\mathbf{m} + \mathbf{0} \stackrel{\mathrm{def}}{=} \mathbf{m} \qquad\text{and}\qquad \mathbf{m} + s(\mathbf{n}) \stackrel{\mathrm{def}}{=} s(\mathbf{m} + \mathbf{n}).$$
One can also define a multiplication operation and prove that these operations satisfy the usual laws of the natural numbers (commutative, associative, and distributive laws).
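The recursive definitions translate directly into code. The following Python sketch models the numerals by nested `frozenset`s, assuming the successor construction $s(\mathbf n) = \mathbf n ∪ \{\mathbf n\}$ given above (the helper `pred` is ours, introduced only to drive the recursion):

```python
def successor(n):
    """s(n) := n ∪ {n}."""
    return n | frozenset([n])

ZERO = frozenset()       # 0 := ∅
ONE = successor(ZERO)    # 1 = {∅}
TWO = successor(ONE)     # 2 = {∅, {∅}}

def pred(n):
    """For n = s(k), recover k: the unique k ∈ n with k ∪ {k} = n."""
    for k in n:
        if k | frozenset([k]) == n:
            return k
    raise ValueError("not a successor")

def add(m, n):
    """m + 0 := m  and  m + s(k) := s(m + k)."""
    return m if n == ZERO else successor(add(m, pred(n)))

THREE = add(ONE, TWO)
print(len(THREE))  # 3, confirming |s(n)| = |n| + 1 along the way
```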
More generally, we denote an (ordered) list of $k$ objects $a_1, …, a_k$ as $(a_1, …, a_k)$. Two lists of the same length are equal if they agree in every component.
Definition 3.8 Cartesian Product
The Cartesian product$A × B$ of two sets $A$ and $B$ is the set of all ordered pairs with the first component from $A$ and the second component from $B$:
$A\times B=\{~(a,b)~|~ a\in A~\land~b\in B\}$.
For finite sets, the cardinality of the Cartesian product of some sets is the product of their cardinalities: $|A × B| = |A| · |B|$.
Example 3.7
Prove or disprove the following statements:
$\emptyset\times A=\emptyset$.
$A × B = B × A$.
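For finite sets, both statements (and the cardinality formula above) can be explored with `itertools.product`; a small Python sketch:

```python
from itertools import product

A, B = {1, 2, 3}, {'x', 'y'}

AxB = set(product(A, B))
print(len(AxB) == len(A) * len(B))               # True: |A × B| = |A| · |B|

# ∅ × A = ∅:
print(set(product(set(), A)) == set())           # True

# A × B = B × A fails in general, since pairs are ordered:
print(set(product(A, B)) == set(product(B, A)))  # False here
```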
More generally, the Cartesian product of $k$ sets $A_1, …, A_k$ is the set of all lists of length $k$ (also called $k$-tuples) with the $i$-th component from $A_i$:
$$A_1 × \cdots × A_k \stackrel{\mathrm{def}}{=} \{~(a_1, \ldots, a_k)~|~a_i ∈ A_i \text{ for } 1 ≤ i ≤ k~\}.$$
Relations are a fundamental concept in discrete mathematics and Computer Science. Many special types of relations (e.g., equivalence relations, order relations, and lattices) capture concepts naturally arising in applications. Functions are also a special type of relation.
Definition 3.9 Relation
A (binary) relation $ρ$ from a set $A$ to a set $B$ (also called an $(A, B)$-relation) is a subset of $A × B$. If $A = B$, then $ρ$ is called a relation on $A$.
Instead of $(a, b) ∈ ρ$ one usually writes
$a \operatorname\rho b$,
and sometimes we write $a \!\not\!\rho\; b$ if $(a, b) \not∈ ρ$.
Example 3.8
Let $S$ be the set of students at ETH and let $C$ be the set of courses taught at ETH. Then a natural relation from $S$ to $C$ is the "takes" relation. If $s ∈ S$ is a student and $c ∈ C$ is a course, then $(s, c)$ is in the relation if and only if $s$ takes course $c$. If we denote the relation by $\mathtt{takes}$, we can write $(s, c) ∈ \mathtt{takes}$ or $s~\mathtt{takes}~c$.[12] We can also consider the set $P$ of professors at ETH and the natural relation from $P$ to $C$.
Example 3.9
Let $H$ be the set of all human beings (alive or dead). Then "child of" is a relation on $H$. If we denote the relation by $\mathtt{childof}$, then $(x, y) ∈~\mathtt{childof}$ (or equivalently $x~\mathtt{childof}~y$) means that $x$ is $y$'s child. Other relations on $H$ are "parent of", "grandparent of", "cousin of", "ancestor of", "married to", etc.
Example 3.10
On the integers $ℤ$ we already know a number of very natural relations: $=$, $≠$, $≤$, $≥$, $<$, $>$, $|$ (the 'divides' relation), and $\mathord{\not}\,|$ (does not divide).
Example 3.11
The relation $≡_m$ on $ℤ$ is defined as follows:
$a\equiv_{m}b\quad{\stackrel{\mathrm{def}}{\iff}}\quad a-b=km$ for some $k ∈ ℤ$,
i.e., $a ≡_m b$ if and only if $a$ and $b$ have the same remainder when divided by $m$. (See Section 4.2.)
Example 3.12
The relation $\{(x, y)~|~x^2 + y^2 = 1\}$ on $ℝ$ is the set of points on the unit circle, which is a subset of $ℝ × ℝ$.
Example 3.13
For any set $S$, the subset relation ($⊆$) is a relation on $\mathcal P(S)$.
Example 3.14
Two special relations from $A$ to $B$ are the empty relation (i.e., the empty set $∅$) and the complete relation $A × B$ consisting of all pairs $(a, b)$.
Definition 3.10
For any set $A$, the identity relation on $A$, denoted $\mathsf{id}_A$ (or simply $\mathsf{id}$), is the relation $\mathsf{id}_A = \{(a, a)~|~a ∈ A\}$.
Relations on a finite set are of special interest. There are $2^{n^2}$ different relations on a set of cardinality $n$. (Why?)
The relation concept can be generalized from binary to $k$-ary relations for given sets $A_1, …, A_k$. A $k$-ary relation is a subset of $A_1× \cdots ×A_k$. Such relations play an important role in modeling relational databases. Here we only consider binary relations.
For finite sets $A$ and $B$, a (binary) relation $ρ$ from $A$ to $B$ can be represented as a Boolean $|A|×|B|$ matrix $M^ρ$ with the rows and columns labeled by the elements of $A$ and $B$, respectively. For $a ∈ A$ and $b ∈ B$, the matrix entry $M^ρ_{a,b}$ is $1$ if $a ρ b$, and $0$ otherwise.
Example 3.15
Let $A = \{a, b, c, d\}$ and $B = \{q, r, s, t, u\}$, and consider the relation $ρ = \{(a, r),(a, s),(a, u),(b, q),(b, s),(c, r),(c, t),(c, u),(d, s),(d, u)\}$. The matrix representation is
$$M^{\rho} = \begin{array}{c} \begin{matrix} & q & r & s & t & u\; \end{matrix} \\ \begin{matrix} a \\ b \\ c \\ d \end{matrix} \left[ \begin{matrix} \;0 & 1 & 1 & 0 & 1\; \\ \;1 & 0 & 1 & 0 & 0\; \\ \;0 & 1 & 0 & 1 & 1\; \\ \;0 & 0 & 1 & 0 & 1\; \end{matrix} \right] \end{array} $$
where the rows and columns are labeled by the elements of $A$ and $B$, respectively.
For relations on a set $A$, the matrix is an $|A| × |A|$ square matrix.
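The matrix representation of Example 3.15 is easy to construct programmatically; a small Python sketch:

```python
A = ['a', 'b', 'c', 'd']
B = ['q', 'r', 's', 't', 'u']
rho = {('a', 'r'), ('a', 's'), ('a', 'u'), ('b', 'q'), ('b', 's'),
       ('c', 'r'), ('c', 't'), ('c', 'u'), ('d', 's'), ('d', 'u')}

# M[i][j] = 1 iff (A[i], B[j]) ∈ rho
M = [[1 if (a, b) in rho else 0 for b in B] for a in A]
for row in M:
    print(row)
# [0, 1, 1, 0, 1]
# [1, 0, 1, 0, 0]
# [0, 1, 0, 1, 1]
# [0, 0, 1, 0, 1]
```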
Example 3.16
For the set $A = \{1, 2, 3, 4, 5\}$, the relations $=$, $≥$, and $≤$ correspond to the identity matrix,[13] the lower triangular matrix, and the upper triangular matrix, respectively.
An alternative representation of a relation $ρ$ from $A$ to $B$ is by a directed graph with $|A| + |B|$ vertices[14] labeled by the elements of $A$ and $B$. The graph contains the edge[15] from $a$ to $b$ if and only if $a \operatorname\rho b$. For a relation on a set $A$, the graph contains only $|A|$ vertices, but it can contain loops (edges from a vertex to itself).
Relations from $A$ to $B$ are sets, and therefore we can apply any operation defined on sets: union, intersection, and complement. In the matrix representation of relations, these operations correspond to the position-wise logical $\text{OR}$, $\text{AND}$, and negation, respectively. A relation can also be a subset of another relation.
Example 3.17
On the set $ℤ$, the relation $≤ ∪ ≥$ is the complete relation, $≤ ∩ ≥$ is the identity relation, and the complement of $≤$ is the relation $>$. Moreover, we have ${<} ⊆ {≤}$ and ${=} ⊆ {≥}$.
Example 3.18
For any relatively prime integers $m$ and $n$, the relation $≡_m ∩ ≡_n$ is $≡_{mn}$, as will be shown in Chapter 4. More generally, for arbitrary $m$ and $n$, the relation $≡_m ∩ ≡_n$ is $≡_{\operatorname{lcm}(m,n)}$, where $\operatorname{lcm}(m, n)$ denotes the least common multiple of $m$ and $n$.
Definition 3.11 Inverse Relation
The inverse of a relation $ρ$ from $A$ to $B$ is the relation $\widehat ρ$ from $B$ to $A$ defined by
$\widehat ρ\ \stackrel{\text{def}}{=}\ \{(b, a)~|~(a, b) ∈ ρ\}$.
Note that for all $a$ and $b$ we have $b \operatorname{\widehat\rho} a \iff a \operatorname\rho b$. An alternative notation for the inverse of $ρ$ is $ρ^{-1}$.
Example 3.19
Let $H$ be the set of people, $O$ the set of objects, and $π$ the ownership relation from $H$ to $O$. The inverse $\widehat π$ is the "owned by" relation determining who owns an object.
Example 3.20
If $\phi$ is the parenthood relation on the set $H$ of humans (i.e., $a \operatorname\phi b$ if $a$ is a parent of $b$), then the inverse relation $\widehat\phi$ is the childhood relation.
Example 3.21
On the set $ℤ$, the inverse of the relation $≤$ is the relation $≥$. The inverse of $\mathsf{id}$ is again $\mathsf{id}$.
In the matrix representation, taking the inverse of a relation corresponds to the transposition of the matrix. In the graph representation, taking the inverse corresponds to inverting the direction of all edges.
Definition 3.12 Composition of Relations
Let $ρ$ be a relation from $A$ to $B$ and let $σ$ be a relation from $B$ to $C$. Then the composition of $ρ$ and $σ$, denoted $ρ ◦ σ$ (or also $ρσ$), is the relation from $A$ to $C$ defined by
$$ρ \circ σ \stackrel{\mathrm{def}}{=} \{~(a, c)~|~\exists b~\big((a, b) ∈ ρ \,\land\, (b, c) ∈ σ\big)~\}.$$
The $n$-fold composition of a relation $ρ$ on a set $A$ with itself is denoted $ρ^n$.
Lemma 3.7
The composition of relations is associative, i.e., we have $ρ ◦ (σ ◦ \phi) = (ρ ◦ σ) ◦ \phi$.
Proof: We use the short notation $ρσ$ instead of $ρ ◦ σ$. The claim of the lemma, $ρ(σ\phi) = (ρσ)\phi$, states an equality of sets, which can be proved by proving that each set is contained in the other (see Section 3.2.3). We prove $ρ(σ\phi) ⊆ (ρσ)\phi$; the other inclusion is proved analogously. Suppose that $(a, d) ∈ ρ(σ\phi)$. We need to prove that $(a, d) ∈ (ρσ)\phi$. For illustrative purposes, we provide two formulations of this proof, first as text and then in logical formulas.
Because $(a, d) ∈ ρ(σ\phi)$, there exists $b$ such that $(a, b) ∈ ρ$ and $(b, d) ∈ σ\phi$. The latter implies that there exists $c$ such that $(b, c) ∈ σ$ and $(c, d) ∈ \phi$. Now, $(a, b) ∈ ρ$ and $(b, c) ∈ σ$ imply that $(a, c) ∈ ρσ$, which together with $(c, d) ∈ \phi$ implies $(a, d) ∈ (ρσ)\phi$.
Now comes the more formal version of the same proof, where the justification for each step is omitted:[16]
$$\begin{align*} (a, d) \in \rho (\sigma \phi) &\dot\implies \exists b \, ((a, b) \in \rho \land (b, d) \in \sigma \phi) \\[2pt] &\dot\implies \exists b \, ((a, b) \in \rho \land \exists c \, ((b, c) \in \sigma \land (c, d) \in \phi)) \\[2pt] &\dot\implies \exists b \exists c \, ((a, b) \in \rho \land ((b, c) \in \sigma \land (c, d) \in \phi)) \\[2pt] &\dot\implies \exists b \exists c \, (((a, b) \in \rho \land (b, c) \in \sigma) \land (c, d) \in \phi) \\[2pt] &\dot\implies \exists c \exists b \, (((a, b) \in \rho \land (b, c) \in \sigma) \land (c, d) \in \phi) \\[2pt] &\dot\implies \exists c \, (\exists b \, ((a, b) \in \rho \land (b, c) \in \sigma) \land (c, d) \in \phi) \\[2pt] &\dot\implies \exists c \, ((a, c) \in \rho \sigma \land (c, d) \in \phi) \\[2pt] &\dot\implies (a, d) \in (\rho \sigma) \phi. \end{align*} $$
Both formulations prove the lemma. $\Box$
Example 3.22
Consider the ownership relation $π$ and the parenthood relation $\phi$ as above. Then the relation $\phiπ$ from $H$ to $O$ can be interpreted as follows: $a \operatorname{\phi\pi} b$ if and only if person $a$ has a child who owns object $b$.
Example 3.23
If $\phi$ is the parenthood relation on the set $H$ of humans, then the relation $\phi^2$ is the grand-parenthood relation.[17]
In the matrix representation, composing two relations corresponds to a special type of matrix multiplication. If the matrices are considered as integer matrices (with $0$ and $1$ entries), then after the multiplication all entries $> 1$ are set to $1$. [18] In the graph representation the composition corresponds to the natural composition of the graphs, where $a \operatorname{\rho\sigma} c$ if and only if there is a path from $a$ (over some $b$) to $c$.
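This Boolean matrix product (an OR of ANDs) is a few lines of Python; a sketch, reusing 0/1 matrices as above:

```python
def compose(M, N):
    """Boolean matrix product: (M·N)[i][j] = OR over k of (M[i][k] AND N[k][j]).
    M represents a relation A → B, N a relation B → C."""
    rows, inner, cols = len(M), len(N), len(N[0])
    return [[int(any(M[i][k] and N[k][j] for k in range(inner)))
             for j in range(cols)] for i in range(rows)]

# Example: rho^2 for the relation {(0,1), (1,2)} on {0, 1, 2}.
R = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]
print(compose(R, R))  # [[0, 0, 1], [0, 0, 0], [0, 0, 0]], i.e., {(0, 2)}
```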
The proof of the following lemma is left as an exercise.
Lemma 3.8
Let $ρ$ be a relation from $A$ to $B$ and let $σ$ be a relation from $B$ to $C$. Then the inverse $\widehat{ρσ}$ of $ρσ$ is the relation $\widehat\sigma\widehat\rho$.
Definition 3.13 Reflexive Relation
A relation $ρ$ on a set $A$ is called reflexive if
$$a \operatorname\rho a $$
is true for all $a ∈ A$, i.e., if
$\mathsf{id}\subseteq\rho$.
In other words, a relation is reflexive if it contains the identity relation $\mathsf{id}$. In the matrix representation of relations, reflexive means that the diagonal is all $1$. In a graph representation, reflexive means that every vertex has a loop (an edge from the vertex to itself).
Example 3.24
The relations $≤$, $≥$, and $|$ (divides) on $ℤ\backslash\{0\}$ are reflexive, but the relations $<$ and $>$ are not.
Definition 3.14 Irreflexive Relation
A relation $ρ$ on a set $A$ is called irreflexive if $a \operatorname{\not\!\rho} a$ for all $a ∈ A$, i.e., if $ρ ∩ \mathsf{id} = ∅$. [19]
Definition 3.15 Symmetric Relation
A relation $ρ$ on a set $A$ is called symmetric if
$$a \operatorname\rho b\ \iff\ b\operatorname\rho a $$
is true for all $a, b ∈ A$, i.e., if
$ρ = \widehat\rho$.
In the matrix representation of relations, symmetric means that the matrix is symmetric (with respect to the diagonal).
A symmetric relation $ρ$ on a set $A$ can be represented as an undirected graph, possibly with loops (edges from a vertex to itself).
Example 3.25
The relation $≡_m$ on $ℤ$ is symmetric.
Example 3.26
The "married to" relation on the set $H$ of humans is symmetric.
Definition 3.16 Antisymmetric Relation
A relation $ρ$ on a set $A$ is called antisymmetric if
$$a \operatorname\rho b \land b \operatorname\rho a\ \implies\ a = b $$
is true for all $a, b ∈ A$, i.e., if
$ρ ∩ \widehat ρ ⊆ \mathsf{id}$.
In a graph representation, antisymmetric means that there is no cycle of length $2$, i.e., no distinct vertices $a$ and $b$ with edges both from $a$ to $b$ and from $b$ to $a$. Note that antisymmetric is not the negation of symmetric.
Example 3.27
The relations $≤$ and $≥$ are antisymmetric, and so is the division relation $|$ on $ℕ$: If $a | b$ and $b | a$, then $a = b$. But note that the division relation $|$ on $ℤ$ is not antisymmetric. Why?
Definition 3.17 Transitive Relation
A relation $ρ$ on a set $A$ is called transitive if
$$a \operatorname\rho b \land b \operatorname\rho c\ \implies\ a \operatorname\rho c $$
is true for all $a, b, c ∈ A$.
Example 3.28
The relations $≤$, $≥$, $<$, $>$, $|$, and $≡_m$ on $ℤ$ are transitive.
Example 3.29
Let $ρ$ be the ancestor relation on the set $H$ of humans, i.e., $a \operatorname\rho b$ if $a$ is an ancestor of $b$. This relation is transitive.
Lemma 3.9
A relation $ρ$ is transitive if and only if $ρ^2 ⊆ ρ$.
Proof: The "if" part of the theorem ($\impliedby$) follows from the definition of composition: If $a \operatorname\rho b$ and $b \operatorname\rho c$, then $a \operatorname\rho^2 c$. Therefore also $a \operatorname\rho c$ since $ρ^2 ⊆ ρ$. [20] This means transitivity.
Proof of the "only if" part ($\implies$): Assume that $ρ$ is transitive. To show that $ρ^2 ⊆ ρ$, assume that $a \operatorname\rho^2 b$ for some $a$ and $b$. We must prove that $a \operatorname\rho b$. The definition of $a \operatorname\rho^2 b$ states that there exists $c$ such that $a \operatorname\rho c$ and $c \operatorname\rho b$. Transitivity of $ρ$ thus implies that $a \operatorname\rho b$, which concludes the proof. $\Box$
In the graph representation of a relation $ρ$ on $A$, we have $a \operatorname{\rho^k} b$ if and only if there is a walk of length $k$ from $a$ to $b$ in the graph, where a walk may visit a node multiple times. The transitive closure of $ρ$, denoted $ρ^*$ and defined as $ρ^* \stackrel{\mathrm{def}}{=} \bigcup_{k ≥ 1} ρ^k$, is the reachability relation: $a \operatorname{\rho^*} b$ if and only if there is a path (of arbitrary finite length) from $a$ to $b$.
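The reachability interpretation leads directly to Warshall's algorithm for computing the transitive closure of a relation on a finite set; a Python sketch:

```python
def transitive_closure(M):
    """Warshall's algorithm on a 0/1 adjacency matrix.
    After iteration k, C[i][j] = 1 iff j is reachable from i using
    intermediate vertices only from {0, ..., k}."""
    n = len(M)
    C = [row[:] for row in M]  # work on a copy
    for k in range(n):
        for i in range(n):
            for j in range(n):
                C[i][j] = C[i][j] or (C[i][k] and C[k][j])
    return C

R = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]
print(transitive_closure(R))  # [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
```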
Example 3.30
Consider the set $P$ of all convex polygons. We can think of them as being given as pieces of paper. By cutting a piece into two pieces with a straight cut one can obtain new polygons. Let $\succeq$ be the relation defined as follows: $a \succeq b$ if and only if with a single straight-line cut (or no cut) one can obtain $b$ from $a$. Moreover, consider the covering relation $⊒$, where $a ⊒ b$ if and only if $a$ can completely cover $b$ (if appropriately positioned). It is easy to see that $⊒$ is reflexive, antisymmetric, and transitive[21] whereas $\succeq$ is only reflexive and antisymmetric. Note that $⊒$ is the transitive closure of $\succeq$.
Definition 3.19 Equivalence Relation
An equivalence relation is a relation on a set $A$ that is reflexive, symmetric, and transitive.
Example 3.31
The relation $≡_m$ is an equivalence relation on $ℤ$.
Definition 3.20 Equivalence Class
For an equivalence relation $θ$ on a set $A$ and for $a ∈ A$, the set of elements of $A$ that are equivalent to $a$ is called the equivalence class of $a$ and is denoted as $[a]_θ$: [22]
$$[a]_θ \stackrel{\mathrm{def}}{=} \{~b ∈ A~|~b \operatorname\theta a~\}.$$
Two trivial equivalence relations on a set $A$ are the complete relation $A × A$, for which there is only one equivalence class $A$, and the identity relation, for which the equivalence classes are all singletons[23] $\{a\}$ for $a ∈ A$.
Example 3.32
The equivalence classes of the relation $≡_3$ are the sets
$$[0] = \{\ldots, -6, -3, 0, 3, 6, \ldots\},\quad [1] = \{\ldots, -5, -2, 1, 4, 7, \ldots\},\quad [2] = \{\ldots, -4, -1, 2, 5, 8, \ldots\}.$$
Example 3.33
Consider the set $ℝ^2$ of points $(x, y)$ in the plane, and consider the relation $ρ$ defined by $(x, y)~\operatorname\rho~(x', y') \iff x + y = x' + y'$. Clearly, this relation is reflexive, symmetric, and transitive. The equivalence classes are the lines $\{(x, y)~|~x + y = c\}$ for $c ∈ ℝ$, i.e., the set of lines in the plane parallel to the diagonal of the second quadrant.
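Computing equivalence classes is straightforward for finite sets. A Python sketch that groups a finite window of integers into the classes of $≡_3$ (the helper `classes` is ours and assumes `equiv` really is an equivalence relation):

```python
from collections import defaultdict

def classes(elements, equiv):
    """Group elements into the equivalence classes of the relation equiv."""
    reps = []                   # one representative per class found so far
    buckets = defaultdict(list)
    for x in elements:
        for r in reps:
            if equiv(x, r):
                buckets[r].append(x)
                break
        else:                   # no existing class matched: x opens a new one
            reps.append(x)
            buckets[x].append(x)
    return list(buckets.values())

cong3 = lambda a, b: (a - b) % 3 == 0
print(classes(range(-4, 8), cong3))
# [[-4, -1, 2, 5], [-3, 0, 3, 6], [-2, 1, 4, 7]]
```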
The proof of the following theorem is left as an exercise.
Lemma 3.10
The intersection of two equivalence relations (on the same set) is an equivalence relation.
Definition 3.21 Partition
A partition of a set $A$ is a set of mutually disjoint subsets of $A$ that cover $A$, i.e., a set $\{S_i~|~i ∈ \mathcal I\}$ of sets $S_i$ (for some index set $\mathcal I$) satisfying
$S_{i}\cap S_{j}=\emptyset$ for $i\neq j$ and $\displaystyle\bigcup_{i \in \mathcal I} S_{i}=A$.
Consider any partition of a set $A$ and define the relation $≡$ such that two elements are $≡$-related if and only if they are in the same set of the partition. It is easy to see that this relation is an equivalence relation. The following theorem states that the converse also holds. In other words, partitions and equivalence relations capture the same (simple) abstraction.
Definition 3.22 Quotient Set
The set of equivalence classes of an equivalence relation $θ$, denoted by
$A/θ \stackrel{\text{def}}{=} \{[a]_θ~|~a ∈ A\}$,
is called the quotient set of $A$ by $θ$, or simply $A$ modulo $θ$, or $A \operatorname{mod} θ$.
Theorem 3.11
The set $A/θ$ of equivalence classes of an equivalence relation $θ$ on $A$ is a partition of $A$.
Proof: Since $a ∈ [a]$ for all $a ∈ A$ (reflexivity of $θ$), the union of all equivalence classes is equal to $A$. It remains to prove that equivalence classes are disjoint. This is proved by proving, for any fixed $a$ and $b$, that
$$a \operatorname\theta b \implies [a] = [b] \qquad\text{and}\qquad ¬(a \operatorname\theta b) \implies [a] ∩ [b] = ∅.$$
To prove the first statement we consider an arbitrary $c ∈ [a]$ and observe that
$$\begin{align*} c \in [a] &\dot\iff c \, \theta \, a && \text{(def. of } [a]\text{)} \\ &\;\dot\implies c \, \theta \, b && \text{(use } a \, \theta \, b \text{ and transitivity)} \\ &\dot\iff c \in [b] && \text{(def. of } [b]\text{)} \end{align*} $$
Note that $c ∈ [a] \implies c ∈ [b]$ (for all $c ∈ A$) is the definition of $[a] ⊆ [b]$. The statement $[b] ⊆ [a]$ is proved analogously but additionally requires the application of symmetry. (This is an exercise.) Together this implies $[a] = [b]$.
The second statement is proved by contradiction. Suppose it is false[24], i.e., $a \operatorname{\not\!\theta} b$ and $[a] ∩ [b] ≠ ∅$, i.e., there exists some $c ∈ [a] ∩ [b]$, which means that $c \operatorname\theta a$ and $c \operatorname\theta b$. By symmetry we have $a \operatorname\theta c$ and thus, by transitivity, we have $a \operatorname\theta b$, a contradiction. (As an exercise, the reader can write this proof as a sequence of implications.) $\Box$
3.4.3. Example: Definition of the Rational Numbers
We consider the set $A = ℤ × (ℤ\backslash\{0\})$ and define the equivalence relation $∼$ on $A$ as follows:
$(a,b)\sim(c,d) {\stackrel{\mathrm{def}}{\iff}} a d=b c$.
This relation is reflexive ($(a, b) ∼ (a, b)$ since $ab = ba$), symmetric (since $ad = bc \implies cb = da$), and transitive. For the latter, assume $(a, b) ∼ (c, d)$ and $(c, d) ∼ (e, f)$. Then $ad = bc$ and $cf = de$, and thus $adcf = bcde$. Canceling $d$ (which is $≠ 0$) gives
$acf = bce$.
We have to consider the cases $c ≠ 0$ and $c = 0$ separately. If $c ≠ 0$, then $c$ can be canceled, giving $af = be$. If $c = 0$, then $a = 0$ since $d ≠ 0$ but $ad = bc$. Similarly, $e = 0$, and thus we also have $af = be$. Therefore $∼$ is transitive and hence an equivalence relation.
To every equivalence class $[(a, b)]$ we can associate the rational number $a/b$ (recall that $b ≠ 0$). It is easy to see that all pairs $(u, v) ∈ [(a, b)]$ correspond to the same rational number, and two distinct rational numbers correspond to distinct equivalence classes. Thus[25] the set $ℚ$ of rational numbers can be defined as the quotient set $A/{\sim}$.
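This construction is mirrored directly in code by representing a rational as an arbitrary pair $(a, b)$ with $b ≠ 0$ and defining equality by cross-multiplication; a minimal Python sketch (the class name `Rat` is ours):

```python
class Rat:
    """A rational number, represented by any pair (a, b) with b != 0."""
    def __init__(self, a, b):
        assert b != 0
        self.a, self.b = a, b

    def __eq__(self, other):
        # (a, b) ~ (c, d)  iff  a·d = b·c
        return self.a * other.b == self.b * other.a

print(Rat(1, 2) == Rat(3, 6))  # True: (1,2) and (3,6) lie in the same class
print(Rat(1, 2) == Rat(2, 3))  # False
```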
Taking the definition of an equivalence relation and simply replacing the symmetry condition by the anti-symmetry condition results in a completely different, but even more interesting type of relation.
Definition 3.23 Partial Order
A partial order (or simply an order relation[26]) on a set $A$ is a relation that is reflexive, antisymmetric, and transitive. A set $A$ together with a partial order $\preceq$ on $A$ is called a partially ordered set (or simply poset) and is denoted as $(A;\preceq)$. [27]
In a graph representation of relations, a partial order has no cycles (but this is of course not a complete characterization).
Example 3.35
The relations $≤$ and $≥$ are partial orders on $ℤ$, $ℚ$, or $ℝ$. The relations $<$ and $>$ are not partial orders because they are not reflexive (though they are both transitive and, in fact, also antisymmetric because $a < b ∧ b < a$ is never true, i.e., $< \operatorname ∩ \widehat< = ∅$).
Example 3.36
The division relation ($|$) is a partial order on $ℕ\backslash \{0\}$ or any subset of $ℕ\backslash \{0\}$.
Example 3.37
The subset relation on the power set of a set $A$ is a partial order. In other words, for any set $A$, $(\mathcal P(A); ⊆)$ is a poset.
Example 3.38
The covering relation on convex polygons (see Example 3.30) is a partial order.
For a partial order relation $\preceq$ we can define the relation $a ≺ b$ similar to how the relation $<$ is obtained from $≤$:
$$a ≺ b \stackrel{\mathrm{def}}{\iff} a ⪯ b \,\land\, a ≠ b.$$
Definition 3.24 Comparable Elements
For a poset $(A;\preceq)$, two elements $a$ and $b$ are called comparable[28] if $a ⪯ b$ or $b ⪯ a$; otherwise they are called incomparable.
Definition 3.25 Total Order
If any two elements of a poset $(A;⪯)$ are comparable, then $A$ is called totally ordered (or linearly ordered) by $\preceq$.
Example 3.39
$(ℤ; ≤)$ and $(ℤ; ≥)$ are totally ordered posets (or simply totally ordered sets), and so is any subset of $ℤ$ with respect to $≤$ or $≥$. For instance, $(\{2, 5, 7, 10\}; ≤)$ is a totally ordered set.
Example 3.40
The poset $(\mathcal P(A); ⊆)$ is not totally ordered if $|A| ≥ 2$, nor is the poset $(ℕ; |)$.
Definition 3.26 Cover
In a poset $(A;⪯)$ an element $b$ is said to cover[29] an element $a$ if $a ≺ b$ and there exists no $c$ with $a ≺ c$ and $c ≺ b$ (i.e., between $a$ and $b$).
Example 3.41
In a hierarchy (say of a company), if $a ≺ b$ means that $b$ is superior to $a$, then $b$ covers $a$ means that $b$ is the direct superior of $a$.
Definition 3.27 Hasse Diagram
The Hasse diagram of a (finite) poset $(A; ⪯)$ is the directed graph whose vertices are labeled with the elements of $A$ and where there is an edge from $a$ to $b$ if and only if $b$ covers $a$.
The Hasse diagram is a graph with directed edges. It is usually drawn such that whenever $a ≺ b$, then $b$ is placed higher than $a$. This means that all arrows are directed upwards and therefore can be omitted.
Example 3.42
The Hasse diagram of the poset $(\{2, 3, 4, 5, 6, 7, 8, 9\}; |)$ is shown in Figure 3.1 on the left.
Example 3.43
A nicer diagram is obtained when $A$ is the set of all divisors of an integer $n$. The Hasse diagram of the poset $(\{1, 2, 3, 4, 6, 8, 12, 24\}; |)$ is shown in Figure 3.1 in the middle.
Example 3.44
The Hasse diagram of the poset $(\mathcal P(\{a, b, c\}); ⊆)$ is shown in Figure 3.1 on the right.
Example 3.45
For the two Hasse diagrams in Figure 3.2, give a realization as the divisibility poset of a set of integers.
Example 3.46
Consider the covering[30] relation on the convex polygons discussed in Example 3.30. A polygon $a$ is covered by a polygon $b$ if $b$ can be placed on top of $a$ such that $a$ disappears completely. Are there sets of six polygons resulting in a poset with the left (or right) Hasse diagram in Figure 3.2?
3.5.3. Combinations of Posets and the Lexicographic Order
Definition 3.28 Direct Product
The direct product of posets $(A;⪯)$ and $(B; ⊑)$, denoted $(A;⪯) × (B; ⊑)$, is the set $A × B$ with the relation $≤$ (on $A × B$) defined by
$$(a_1, b_1) ≤ (a_2, b_2) \stackrel{\mathrm{def}}{\iff} a_1 ⪯ a_2 \,\land\, b_1 ⊑ b_2.$$
Theorem 3.12
The direct product $(A;⪯) × (B; ⊑)$ of two posets is a poset, i.e., the relation $≤$ defined above is a partial order on $A × B$.
The proof is left as an exercise. The reader can verify that when replacing $∧$ by $∨$, the resulting relation is in general not a partial order relation.
A more interesting order relation on $A × B$ is defined in the following theorem, whose proof is left as an exercise.
Theorem 3.13 Lexicographic Order
For given posets $(A;⪯)$ and $(B; ⊑)$, the relation $≤_{lex}$ defined on $A × B$ by
$$(a_1, b_1) ≤_{lex} (a_2, b_2) \stackrel{\mathrm{def}}{\iff} a_1 ≺ a_2 \,\lor\, (a_1 = a_2 \,\land\, b_1 ⊑ b_2)$$
is a partial order on $A × B$.
The relation $≤_{lex}$ is the well-known lexicographic order of pairs, usually considered when both posets are identical. The lexicographic order $≤_{lex}$ is useful because if both $(A;⪯)$ and $(B; ⊑)$ are totally ordered (e.g. the alphabetical order of the letters), then so is the lexicographic order on $A × B$ (prove this!).
The lexicographic order can easily be generalized to the $k$-tuples over some alphabet $Σ$ (denoted $Σ^k$) and more generally to the set $Σ^*$ of finite-length strings over $Σ$. The fact that a total order on the alphabet results in a total order on $Σ^*$ is well-known: The telephone book has a total order on all entries.
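A sketch of the lexicographic comparison of pairs in Python, parameterized by the two component orders (here instantiated with the total order `<=` on integers and on strings):

```python
def lex_leq(p, q, leq_a, leq_b):
    """(a1, b1) ≤_lex (a2, b2) iff a1 ≺ a2, or a1 = a2 and b1 ⊑ b2."""
    (a1, b1), (a2, b2) = p, q
    if a1 != a2:
        return leq_a(a1, a2)   # a1 ≠ a2, so leq_a decides via a1 ≺ a2
    return leq_b(b1, b2)

leq = lambda x, y: x <= y
print(lex_leq((1, 'z'), (2, 'a'), leq, leq))  # True: 1 < 2 decides
print(lex_leq((1, 'z'), (1, 'a'), leq, leq))  # False: tie on 1, 'z' > 'a'
```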
We define a few types of special elements that a poset can have.
Definition 3.29 Minimal, Maximal, Least, and Greatest Elements
Let $(A;⪯)$ be a poset, and let $S ⊆ A$ be some subset of $A$. Then
$a ∈ A$ is a minimal (maximal) element of $A$ if there exists no $b ∈ A$ with $b ≺ a$ ($b ≻ a$).[32]
$a ∈ A$ is the least (greatest) element of $A$ if $a ⪯ b$ ($a ⪰ b$) for all $b ∈ A$. [33]
$a ∈ A$ is a lower (upper) bound[34] of $S$ if $a ⪯ b$ ($a ⪰ b$) for all $b ∈ S$. [35]
$a ∈ A$ is the greatest lower bound (least upper bound) of $S$ if $a$ is the greatest (least) element of the set of all lower (upper) bounds of $S$. [36]
Minimal, maximal, least, and greatest elements are easily identified in a Hasse diagram.
The greatest lower bound and the least upper bound of a set $S$ are sometimes denoted as $\operatorname{glb}(S)$ and $\operatorname{lub}(S)$, respectively.
Example 3.47
Consider the poset $(\{2, 3, 4, 5, 6, 7, 8, 9\}; |)$ shown in Figure 3.1. It has no least or greatest elements, but $2$, $3$, $5$, and $7$ are minimal elements, and $5$, $6$, $7$, $8$ and $9$ are maximal elements. The number $2$ is a lower bound (actually the greatest lower bound) of the subset $\{4, 6, 8\}$, and the subset $\{4, 9\}$ has no lower (nor upper) bound.
Example 3.48
The poset $(\{1, 2, 3, 4, 6, 8, 12, 24\}; |)$ shown in Figure 3.1 has both a least ($1$) and a greatest ($24$) element. The subset $\{8, 12\}$ has the three lower bounds $1$, $2$, and $4$, and $4$ is the greatest lower bound of $\{8, 12\}$. Actually, this poset is special in that any set of elements has a greatest lower bound and a least upper bound. How can $\operatorname{glb}(S)$ and $\operatorname{lub}(S)$ be defined?
Example 3.49
The poset $(\mathcal P(\{a, b, c\}); ⊆)$ shown in Figure 3.1 has both a least element, namely $∅$, and a greatest element, namely $\{a, b, c\}$.
Example 3.50
In the poset $(ℤ^+; |)$, $1$ is a least element but there is no greatest element.
Definition 3.30 Well-Ordered Poset
A poset $(A;⪯)$ is well-ordered[37] if it is totally ordered and if every non-empty subset of $A$ has a least element.[38]
Note that every totally ordered finite poset is well-ordered. The property of being well-ordered is of interest only for infinite posets. The natural numbers $ℕ$ are well-ordered by $≤$. Any subset of the natural numbers is also well-ordered. More generally, any subset of a well-ordered set is well-ordered (by the same order relation).
Definition 3.31 Meet and Join
Let $(A;⪯)$ be a poset. If $a$ and $b$ (i.e., the set $\{a, b\} ⊆ A$) have a greatest lower bound, then it is called the meet of $a$ and $b$, often denoted $a ∧ b$. If $a$ and $b$ have a least upper bound, then it is called the join of $a$ and $b$, often denoted $a ∨ b$.
Definition 3.32 Lattice
A poset $(A;⪯)$ in which every pair of elements has a meet and a join is called a lattice[39].
Example 3.51
The posets $(ℕ; ≤)$, $(ℕ \backslash \{0\}; | )$, and $(\mathcal P(S); ⊆)$ are lattices, as the reader can verify.
Example 3.52
The poset $(\{1, 2, 3, 4, 6, 8, 12, 24\}; |)$ shown in Figure 3.1 is a lattice. The meet of two elements is their greatest common divisor, and their join is the least common multiple. For example, $6 ∧ 8 = 2$, $6 ∨ 8 = 24$, $3 ∧ 4 = 1$, and $3 ∨ 4 = 12$. In contrast, the poset $(\{2, 3, 4, 5, 6, 7, 8, 9\}; |)$ is not a lattice.
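In the divisor lattice, meet and join are computable as gcd and lcm; a short Python sketch checking the values from the example:

```python
from math import gcd

def meet(a, b):  # greatest lower bound in the divisibility order
    return gcd(a, b)

def join(a, b):  # least upper bound in the divisibility order
    return a * b // gcd(a, b)  # the least common multiple

assert meet(6, 8) == 2 and join(6, 8) == 24
assert meet(3, 4) == 1 and join(3, 4) == 12
print("matches Example 3.52")
```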
The concept of a function is perhaps the second most fundamental concept in mathematics (after the concept of a set). We discuss functions only now, after having introduced relations, because functions are a special type of relation, and several concepts defined for relations (e.g. inversion and composition) apply to functions as well.
Definition 3.33 Function
A function$f : A \to B$ from a domain[40]$A$ to a codomain[41]$B$ is a relation from $A$ to $B$ with the special properties (using the relation notation $a\ f\ b$):[42]
1. $∀a ∈ A~∃b ∈ B~(a\ f\ b)$ (the function is totally defined),
2. $∀a ∈ A~∀b, b' ∈ B~(a\ f\ b\ ∧\ a\ f\ b'\ \to b = b')$ (the function is well-defined).
As the reader certainly knows, a function $f$ can be understood as a mapping from $A$ to $B$, assigning to every $a ∈ A$ a unique element in $B$, usually denoted as $f(a)$. One writes $f : A \to B$ to indicate the domain and codomain of $f$, and
$$f : a \mapsto \text{"expression in a"} $$
(e.g. $f : a \mapsto a^2$ or, equivalently, $f : x \mapsto x^2$) to define the function.
Definition 3.34 Function Set
The set of all functions $A \to B$ is denoted as $B^A$. [43]
One can generalize the function concept by dropping the first condition (totally defined), i.e., allowing that there can exist elements $a ∈ A$ for which $f(a)$ is not defined.
Definition 3.35 Partial Function
A partial function$A \to B$ is a relation from $A$ to $B$ such that condition 2. above holds.
Two (partial) functions with common domain $A$ and codomain $B$ are equal if they are equal as relations (i.e., as sets). Hence $f = g$ is equivalent to saying that the function values of $f$ and $g$ agree for all arguments (where, for partial functions, this includes agreement on whether or not the value is defined).
Definition 3.36 Image
For a function $f : A → B$ and a subset $S$ of $A$, the image[44] of $S$ under $f$, denoted $f(S)$, is the set
$$f(S) \stackrel{\mathrm{def}}{=} \{~f(a)~|~a ∈ S~\}.$$
Definition 3.37 Range
The subset $f(A)$ of $B$ is called the image (or range) of $f$ and is also denoted $\operatorname{Im}(f)$.
Example 3.53
Consider the function $f : ℝ → ℝ$ defined by $f(x) = x^2$. The image of the interval $[2, 3]$ is the interval $[4, 9]$. The range of $f$ is the set $ℝ^{≥0}$ of non-negative real numbers.
Definition 3.38 Preimage
For a subset $T$ of $B$, the preimage[45] of $T$, denoted $f^{−1}(T)$, is the set of values in $A$ that map into $T$:
$$f^{-1}(T) \stackrel{\mathrm{def}}{=} \{~a ∈ A~|~f(a) ∈ T~\}.$$
Example 3.54
Consider again the function $f(x) = x^2$. The preimage of the interval $[4, 9]$ is $[−3, −2] ∪ [2, 3]$.
Definition 3.39 Injective, Surjective, Bijective
A function $f : A → B$ is called
injective (or one-to-one or an injection) if for $a ≠ a^′$ we have $f(a) ≠ f(a^′)$, i.e., no two distinct values are mapped to the same function value (there are no "collisions").
surjective (or onto) if $f(A) = B$, i.e., if for every $b ∈ B$, $b = f(a)$ for some $a ∈ A$ (every value in the codomain is taken on for some argument).
bijective (or a bijection) if it is both injective and surjective.
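For functions between small finite sets these three properties can be tested by brute force; a Python sketch:

```python
def is_injective(f, A):
    values = [f(a) for a in A]
    return len(values) == len(set(values))  # no collisions

def is_surjective(f, A, B):
    return {f(a) for a in A} == set(B)      # image equals codomain

def is_bijective(f, A, B):
    return is_injective(f, A) and is_surjective(f, A, B)

A = [0, 1, 2, 3]
print(is_bijective(lambda x: (x + 1) % 4, A, A))  # True: a cyclic shift
print(is_injective(lambda x: x // 2, A))          # False: 0 and 1 collide
```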
Definition 3.40 Inverse Function
For a bijective function $f$, the inverse (as a relation, see Definition 3.11) is called the inverse function[46] of $f$, usually denoted as $f^{-1}$ .
Definition 3.41 Composition of Functions
The composition of a function $f : A → B$ and a function $g : B → C$, denoted by $g ◦ f$ or simply $gf$, is defined by $(g ◦ f)(a) = g(f(a))$. [47]
Example 3.55
Consider the functions $f(x) = x^3 + 3$ and $g(x) = 2x^2 + x$. Then $(g ◦ f)(x) = 2(f(x))^2 + f(x) = 2x^6 + 13x^3 + 21$.
Lemma 3.14 Associativity
Function composition is associative, i.e., $(h ◦ g) ◦ f = h ◦ (g ◦ f)$.
Proof: This is a direct consequence of the fact that relation composition is associative (see Lemma 3.7). $\Box$
Countability is an important concept in Computer Science. A set that is countable can be enumerated by a program (even though this would take unbounded time), while an uncountable set can, in principle, not be enumerated.
Definition 3.42.
Two sets $A$ and $B$ are equinumerous[48], denoted $A ∼ B$, if there exists a bijection $A → B$.
The set $B$ dominates the set $A$, denoted $A ⪯ B$, if $A ∼ C$ for some subset $C ⊆ B$ or, equivalently, if there exists an injective function $A → B$.
A set $A$ is called countable[49] if $A ⪯ ℕ$, and uncountable[50] otherwise.[51]
Example 3.56
The set $ℤ = \{…, −2, −1, 0, 1, 2, …\}$ of integers is countable, and $ℤ ∼ ℕ$. A bijection $f : ℕ → ℤ$ is given by $f(n) = (−1)^n⌈n/2⌉$.
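The bijection from the example, together with its inverse, in a few lines of Python (a sketch; the ceiling $⌈n/2⌉$ is written with integer arithmetic):

```python
def f(n):
    """Bijection ℕ → ℤ: f(n) = (-1)^n · ⌈n/2⌉."""
    return (-1) ** n * ((n + 1) // 2)

print([f(n) for n in range(7)])  # [0, -1, 1, -2, 2, -3, 3]

def f_inv(z):
    """The inverse bijection ℤ → ℕ."""
    return 2 * z if z >= 0 else -2 * z - 1

assert all(f_inv(f(n)) == n for n in range(1000))
```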
Lemma 3.15
(i) The relation $∼$ is an equivalence relation.
(ii) The relation $\preceq$ is transitive: $A ⪯ B ∧ B ⪯ C \implies A ⪯ C$.
(iii) $A ⊆ B \implies A ⪯ B$.
Proof:[52] (i): Assume $A ∼ B$ and $B ∼ C$, i.e., there exist bijections $f : A → B$ and $g : B → C$. Then $g ◦ f$ is a bijection $A → C$ and hence we have $A ∼ C$.
(ii): If there is an injection from $A$ to $B$ and also an injection from $B$ to $C$, then their composition is an injection from $A$ to $C$. (We omit the proof of this statement.)
(iii): If $A ⊆ B$, then the identity function on $A$ is an injection from $A$ to $B$. $\Box$
A non-trivial theorem, called the Bernstein–Schröder theorem, is stated without proof.[53] It is not needed in this course.
Theorem 3.16
For any sets $A$ and $B$, $A ⪯ B ∧ B ⪯ A \implies A ∼ B$.
For finite sets $A$ and $B$, we have $A ∼ B$ if and only if $|A| = |B|$. A finite set never has the same cardinality as one of its proper subsets. Somewhat surprisingly, for infinite sets this is possible.
Example 3.57
Let $\bf{O} = \{1, 3, 5, \ldots\}$ be the set of odd natural numbers. Of course, $\bf O$ is countable since the identity function is a (trivial) injection from $\bf O$ to $ℕ$. Actually, there is even a bijection $f : ℕ → \bf O$, namely $f(n) = 2n + 1$. Indeed, Theorem 3.17 below states a more general fact.
Theorem 3.17
A set$A$is countable if and only if it is finite or if$A ∼ ℕ$.
The theorem can be restated as follows: There is no cardinality level between finite and countably infinite.
In the proof we make use (without proof) of the fact that every non-empty set of natural numbers has a least element. This fact is called the well-ordering principle of $ℕ$.
Proof: A statement of the form "if and only if" has two directions. To prove the direction $\impliedby$, note that if $A$ is finite, then it is countable, and also if $A ∼ ℕ$, then $A$ is countable.
To prove the other direction ($\implies$), we prove that if $A$ is countable and infinite, then $A ∼ ℕ$. According to the definition, $A ⪯ ℕ$ means that there is a bijection $f : A → C$ for a set $C ⊆ ℕ$. For any infinite subset of $ℕ$, say $C$, one can define a bijection $g : C → ℕ$ by counting the elements of $C$ one by one.
More precisely, we define the bijection as follows. According to the well-ordering principle, there exists a least element of $C$, say $c_0$. Define $g(c_0) = 0$. Define $C_1 = C \backslash \{c_0\}$. Again, according to the well-ordering principle, there exists a least element of $C_1$, say $c_1$. Define $g(c_1) = 1$. This process can be continued, defining inductively a bijection $g : C → ℕ$. Now $g ◦ f$ is a bijection $A → ℕ$, which proves $A ∼ ℕ$. $\Box$
Theorem 3.18
The set $\{0, 1\}^*$ of finite binary strings is countable.
Proof: We could give an enumeration of the set $\{0, 1\}^*$, i.e., a bijection between $\{0, 1\}^*$ and $ℕ$, but to prove the theorem it suffices to provide an injection $\{0, 1\}^* → ℕ$, which we define as follows. We put a "$1$" at the beginning of the string and then interpret it as a natural number using the usual binary representation of the natural numbers. For example, the string $0010$ is mapped to the number $18$.[55] $\Box$
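The injection from the proof is a one-liner in Python; the second print illustrates why the leading $1$ is needed (cf. the footnote):

```python
def encode(s):
    """Injection {0,1}* → ℕ: prepend '1' and read as a binary number."""
    return int('1' + s, 2)

print(encode('0010'))                     # 18, as in the proof
# Without the leading '1', distinct strings would collide:
print(int('0010', 2) == int('00010', 2))  # True -- hence the prepended bit
```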
Theorem 3.19
The set$ℕ×ℕ (= ℕ^2)$of ordered pairs of natural numbers is countable.
Proof: A possible bijection $f : ℕ → ℕ^2$ is given by $f(n) = (k, m)$, where $k$ and $m$ are determined using the following equations: $k + m = t − 1$ and $m = n − {t \choose 2}$ , where $t > 0$ is the smallest integer such that ${t+1 \choose 2} > n$. This corresponds to the enumeration of the pairs $(k, m)$ along diagonals with constant sum $k + m$. More concretely, we enumerate the pairs as follows: $(0, 0), (1, 0), (0, 1),(2, 0),(1, 1),(0, 2),(3, 0), (2, 1), (1, 2), (0, 3), (4, 0), (3, 1),…$. It is easy to see that this is a bijection $ℕ → ℕ^2$ , hence $ℕ^2$ is countable. $\Box$
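The diagonal enumeration from the proof can be implemented directly from the two equations (a sketch; $t$ is found by simple search):

```python
from math import comb

def pair(n):
    """Map n to the n-th pair (k, m), enumerating diagonals k + m = const."""
    t = 1
    while comb(t + 1, 2) <= n:  # smallest t > 0 with C(t+1, 2) > n
        t += 1
    m = n - comb(t, 2)
    k = (t - 1) - m             # since k + m = t - 1
    return (k, m)

print([pair(n) for n in range(6)])
# [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
```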
An alternative proof works as follows. We have $ℕ ∼ \{0, 1\}^*$ and hence $ℕ × ℕ ∼ \{0, 1\}^* × \{0, 1\}^*$. Therefore it suffices to demonstrate an injection
$$\{0, 1\}^* × \{0, 1\}^* → \{0, 1\}^*.$$
Note that concatenation of bit-strings, denoted $||$, is a function $\{0, 1\}^* × \{0, 1\}^* → \{0, 1\}^*$, but the function $(a, b) \mapsto a||b$ is not an injection $\{0, 1\}^* × \{0, 1\}^* → \{0, 1\}^*$ because a bit-string $c$ can be split in many ways into two bit-strings $a$ and $b$ such that $c = a||b$. One possibility to define an injection $\{0, 1\}^* × \{0, 1\}^* → \{0, 1\}^*$ is as follows:
$(a,b)\;\mapsto\;0^{|a|}||1||a||b$,
i.e., we first encode the length $|a|$ of $a$ by a unary encoding, append $1$ to mark the end of this encoding of $|a|$, and then append $a$ and $b$.
Corollary 3.20
The Cartesian product $A × B$ of two countable sets $A$ and $B$ is countable, i.e., $A ⪯ ℕ ∧ B ⪯ ℕ \implies A × B ⪯ ℕ$.
Proof: Since $A$ and $B$ are countable, there exist injections $f_1 : A → ℕ$ and $f_2 : B → ℕ$. We first prove $A × B ⪯ ℕ × ℕ$ by exhibiting an injection $g : A × B → ℕ × ℕ$, namely $g(a, b) = (f_1(a), f_2(b))$. That $g$ is an injection can be proved as follows:
$$\begin{align*} (a, b) \neq (a', b') &\dot\implies a \neq a' \lor b \neq b' && \text{(definition of pairs)} \\ &\dot\implies f_1(a) \neq f_1(a') \lor f_2(b) \neq f_2(b') && \text{(} f_1 \text{ and } f_2 \text{ are injections)} \\ &\dot\implies (f_1(a), f_2(b)) \neq (f_1(a'), f_2(b')) && \text{(definition of pairs)} \end{align*} $$
Using $A × B ⪯ ℕ × ℕ$ (just proved) and $ℕ × ℕ ⪯ ℕ$ (Theorem 3.19) now gives $A × B ⪯ ℕ$ because $⪯$ is transitive (Lemma 3.15(ii)). $\Box$
Corollary 3.21
The rational numbers$ℚ$are countable.
Proof: Every rational number can be represented uniquely as a pair $(m, n)$ where $m ∈ ℤ$, $n ∈ ℕ \backslash \{0\}$, and where $m$ and $n$ are relatively prime. Hence $ℚ ⪯ ℤ × ℕ$. According to Example 3.56, $ℤ$ is countable, i.e., $ℤ ⪯ ℕ$. Thus, according to Corollary 3.20, $ℤ × ℕ ⪯ ℕ$. Hence, using transitivity of $⪯$, we have $ℚ ⪯ ℕ$ (i.e., $ℚ$ is countable). $\Box$
The next theorem provides some other important sets that are countable.
Theorem 3.22
Let $A$ and $A_i$ for $i ∈ ℕ$ be countable sets.
(i) For any $n ∈ ℕ$, the set $A^n$ of $n$-tuples over $A$ is countable.
(ii) The union $\bigcup_{i∈ℕ} A_i$ of a countable list $A_0, A_1, A_2, …$ of countable sets is countable.
(iii) The set $A^*$ of finite sequences of elements from $A$ is countable.
Proof: Statement (i) can be proved by induction. The (trivial) induction basis is that $A^1 = A$ is countable. The induction step shows that if $A^n$ is countable, then also $A^{n+1} ∼ A^n × A$ is countable. This follows from Corollary 3.20 because both $A^n$ and $A$ are countable.
We omit the proof of (ii).
We now prove (iii), which implies (i), and hence gives an alternative proof for (i). We define an injection $A^* → \{0, 1\}^*$. This is achieved by using an arbitrary injection $f : A → \{0, 1\}^*$ and defining the natural injection $g : A^* → (\{0, 1\}^*)^*$ as follows: For a sequence of length $n$ in $A^*$, say $(a_1, …, a_n)$, we let
$$g((a_1, \ldots, a_n)) \stackrel{\mathrm{def}}{=} (f(a_1), \ldots, f(a_n)),$$
i.e., each element in the sequence is mapped separately using $f$. Now it only remains to demonstrate an injection
$(\{0,1\}^{*})^{*}\to\{0,1\}^{*}$,
which can be achieved as follows.[56] We replace every $0$-bit in a sequence by $00$ and every $1$-bit by $01$, which defines a (length-doubling) injection $\{0, 1\}^* → \{0, 1\}^*$. Then we concatenate all obtained expanded sequences, always separated by $11$. This is an injection because the separator symbols $11$ can be detected and removed and the extra $0$'s can also be removed. Hence a given sequence can be uniquely decomposed into the component sequences, and hence no two different sequences of binary (component) sequences can result in the same concatenated sequence. $\Box$
Example 3.58
We illustrate the above injection $(\{0, 1\}^*)^* → \{0, 1\}^*$ by an example. Consider the sequence $(0100, 10111, 01, 1)$ of bit-sequences. Now $0100$ is mapped to $00010000$, $10111$ is mapped to $0100010101$, etc. and the final concatenated sequence (with separators $11$) is
$000100001101000101011100011101$,
which can uniquely be decomposed into the original four sequences.
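The encoding and its unique decodability can be checked in code; a Python sketch:

```python
def encode(seqs):
    """Injection ({0,1}*)* → {0,1}*: expand 0 → 00 and 1 → 01,
    then join the expanded sequences with the separator 11."""
    expand = lambda s: ''.join('0' + bit for bit in s)
    return '11'.join(expand(s) for s in seqs)

def decode(code):
    """Recover the original sequences by reading 2-bit blocks."""
    parts, cur = [], []
    for i in range(0, len(code), 2):
        block = code[i:i + 2]
        if block == '11':             # separator: close current sequence
            parts.append(''.join(cur))
            cur = []
        else:                         # '00' → '0', '01' → '1'
            cur.append(block[1])
    parts.append(''.join(cur))
    return parts

seqs = ['0100', '10111', '01', '1']
c = encode(seqs)
print(c)                  # 000100001101000101011100011101
print(decode(c) == seqs)  # True: the decomposition is unique
```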
We now consider semi-infinite binary sequences $(s_0, s_1, s_2, s_3, …)$. One can interpret such a binary sequence as specifying a subset of $ℕ$: If $s_i = 1$, then $i$ is in the set, otherwise it is not. Equivalently, we can understand a semi-infinite sequence $(s_0, s_1, s_2, s_3, …)$ as a function $ℕ → \{0, 1\}$, i.e., as a predicate on $ℕ$. For example, the primality predicate $\mathtt{prime} : ℕ → \{0, 1\}$ (where $\mathtt{prime}(n) = 1$ if and only if $n$ is prime) corresponds to the semi-infinite sequence $0011010100010100010100010000010100000100…$.
Definition 3.43
Let $\{0, 1\}^∞$ denote the set of semi-infinite binary sequences or, equivalently, the set of functions $ℕ → \{0, 1\}$.
Theorem 3.23
The set $\{0, 1\}^∞$ is uncountable.
Proof: This is a proof by contradiction. To arrive at a contradiction, assume that a bijection $f : ℕ → \{0, 1\}^∞$ exists.[57] Let $β_{i,j}$ be the $j$-th bit in the $i$-th sequence $f(i)$, where for convenience we begin numbering the bits with $j = 0$. Define the sequence $α = (α_0, α_1, α_2, \ldots)$ by $α_i \stackrel{\mathrm{def}}{=} \overline{β_{i,i}}$, i.e., the $i$-th bit of $α$ is the complement of the $i$-th bit of the $i$-th sequence.
Obviously, $α ∈ \{0, 1\}^∞$, but there is no $n ∈ ℕ$ such that $α = f(n)$ since $α$ is constructed so as to disagree in at least one bit (actually the $n$-th bit) with every sequence $f(n)$ for $n ∈ ℕ$. This shows that $f$ cannot be a bijection, which concludes the proof. $\Box$
This proof technique is known as Cantor's diagonalization argument; it has also other applications.
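The diagonal construction is mechanical, as the following Python sketch shows on a finite window of an arbitrary (here randomly chosen) enumeration:

```python
import random

random.seed(1)
n = 6
# A finite stand-in for the first n bits of the first n enumerated sequences:
beta = [[random.randint(0, 1) for j in range(n)] for i in range(n)]

# Diagonal sequence: complement the diagonal bit of each row.
alpha = [1 - beta[i][i] for i in range(n)]

# alpha differs from the i-th sequence in (at least) position i:
assert all(alpha[i] != beta[i][i] for i in range(n))
print(alpha)
```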
By interpreting the elements of $\{0, 1\}^∞$ as the binary expansion of a real number in the interval $[0, 1]$, and vice versa, one can show that the interval $[0, 1]$ (and hence $ℝ$ itself), is uncountable.[58]
The above theorem states that there are uncountably many functions $ℕ → \{0, 1\}$. On the other hand, every computer program, regardless of the programming language it is written in, corresponds to a finite string of symbols. Without loss of generality, one can think of a program as a finite binary sequence $p ∈ \{0, 1\}^*$. Hence the set of programs is countable, whereas the set of functions $ℕ → \{0, 1\}$ is uncountable. Since every program computes at most one function, there must be functions $ℕ → \{0, 1\}$ not computed by any program. This consequence of Theorem 3.23, which is fundamental for Computer Science, is stated below.
Definition 3.44
A function $f : ℕ → \{0, 1\}$ is called computable if there is a program that, for every $n ∈ ℕ$, when given $n$ as input, outputs $f(n)$.
Corollary 3.24
There are uncomputable functions$ℕ → \{0, 1\}$.
In fact, essentially all such functions are uncomputable. Those that are computable are rare exceptions. For example, the function $\mathtt{prime} : ℕ → \{0, 1\}$ is computable.
Is there a specific uncomputable function? A prominent example is the so-called Halting problem defined as follows: Given as input a program (encoded as a bit-string or natural number) together with an input (to the program), determine whether the program will eventually stop (function value $1$) or loop forever (function value $0$) on that input. This function is uncomputable. This is usually stated as: The Halting problem is undecidable.
This fact can also be proved by a diagonalization argument similar to the one above. It has far-reaching consequences in theoretical and practical Computer Science. It implies, for example, that it is impossible to write a program that can verify (in the most general case) whether a given program satisfies its specification, or whether a given program contains malicious parts.
In fact, almost all of mathematics is based on the notion of sets. ↩︎
Russell was a very remarkable person. He was not only an outstanding philosopher and mathematician, but also politically active as a pacifist. Because of his protests against World War I he was dismissed from his position at Trinity College in Cambridge and imprisoned for 6 months. In 1961, at the age of 89, he was arrested again for his protests against nuclear armament. In 1950 he received the Nobel Prize for literature. ↩︎
In fact, in Zermelo-Fraenkel (ZF) set theory, the axioms exclude that a set can be an element of itself. ↩︎
Indeed, mathematicians are still working on fundamental questions regarding the theory of sets (but not questions relevant to us). ↩︎
For example, the set containing exactly the three natural numbers $1$, $2$, and $3$ has many different descriptions, including $\{1, 2, 3\}$, $\{3, 1, 2\}$, $\{1, 1 + 1, 1 + 1 + 1\}$, etc. All these descriptions refer to the same set. ↩︎
In axiomatic set theory this is guaranteed by appropriate axioms. ↩︎
We briefly address this question, although we will not make use of this later and will continue to think about ordered pairs and lists in a conventional sense and with conventional notation. ↩︎
Here we use $C$ and $D$ rather than $A$ and $B$ to avoid confusion because $A$ and $B$ are used here to denote sets. ↩︎
We take it for granted that ∅ is actually a set. But in an axiomatic treatment of set theory, this must be stated as an axiom. ↩︎
In axiomatic set theory, the existence of the power set of every set must be postulated as an axiom. ↩︎
Note that the relation takes can change over time, and in such an example we consider the relation at a certain point in time. ↩︎
The identity relation ($=$) on any finite set corresponds to the identity matrix. ↩︎
The justifications should be obvious, except perhaps for the following fact from predicate logic (explained in Chapter 6) used several times in the proof: $∃x(F ∧G) ≡ F ∧ ∃xG$ if $x$ does not appear in $F$. ↩︎
Note that the notation $\phi^2$ is actually ambiguous; it could also denote the Cartesian product $\phi × \phi$. But in these lecture notes no ambiguity will arise. ↩︎
If the matrices are considered as Boolean matrices, then for multiplying two matrices one takes the OR of all product terms contributing to an entry in the product matrix. ↩︎
Note that irreflexive is not the negation of reflexive, i.e., a relation that is not reflexive is not necessarily irreflexive. ↩︎
In set-theoretic notation: $(a, c) ∈ ρ^2 ∧ ρ^2 ⊆ ρ \implies (a, c) ∈ ρ$. ↩︎
Such a relation will be defined below as a partial order relation. ↩︎
When the relation $θ$ is understood, we can drop the subscript $θ$. ↩︎
This is a more fancy way of saying that two rational numbers $a/b$ and $c/d$ are the same number if and only if the ratio is the same. But actually, this is the definition of the rational numbers. If the reader is surprised, he or she is challenged to come up with a simpler definition. ↩︎
The term "cover" is used here in a physical sense, not in the sense of Definition 3.26.↩︎
Recall that for a partial order $\preceq$ we can define the relation $≺$ as $a ≺ b \stackrel{\text{def}}{\iff} a ⪯ b ∧ a ≠ b$. ↩︎
The relations $⪰$ and $≻$ are defined naturally by $a ⪰ b \stackrel{\text{def}}{\iff} b ⪯ a$ and $a ≻ b \stackrel{\text{def}}{\iff} b ≺ a$. ↩︎
Note that a least or a greatest element need not exist. However, there can be at most one least element, as suggested by the word "the" in the definition. This follows directly from the antisymmetry of $\preceq$. If there were two least elements, they would be mutually comparable, and hence must be equal. ↩︎
Note that the definitions of the least element and of a lower bound differ only in that a lower bound can be outside of the considered subset $S$ (and therefore need not be unique). ↩︎
Note that for a poset $(A;⪯)$ and a subset $S ⊆ A$, restricting to $S$ results in a poset $(S;⪯)$. ↩︎
Note that the composition of functions is the same as the composition of relations. However, unfortunately, different notation is used: The composition of relations $f$ and $g$ is denoted $f ◦ g$ while, if considered as functions, the same resulting composition is denoted as $g ◦ f$. (The reason is that one thinks of functions as mapping "from right to left".) Because of this ambiguity one must make explicit whether the symbol $◦$ refers to function or relation composition. ↩︎
Note that without prepending a $1$, different strings (e.g. $0010$ and $00010$) would result in the same integer and hence the mapping would not be an injection. ↩︎
Note that a simple concatenation of the sequences does not work because the concatenated sequences can not uniquely be decomposed into the original sequences, i.e., this is not an injection. ↩︎
Here we make use of Theorem 3.17 which implies that $\{0, 1\}^∞$ is countable if and only if such a bijection exists. ↩︎
A subtlety, which is not a problem in the proof, is that some real numbers have two representations as bit-strings. For example, the number $0.5$ has representations $10000000\cdots$ and $0111111\cdots$. ↩︎