Cantor set

The Cantor ternary set is a remarkable subset of the real numbers named after German mathematician Georg Cantor who described the set in 1883. It has the same cardinality as \(\mathbb{R}\), yet it has zero Lebesgue measure.

The set can be constructed recursively by first considering the unit interval \(C_{0}=[0,1]\). In the next step, we divide the set into three equal parts and discard the open middle set. This leads to the new set \(C_{1}=[0,\tfrac{1}{3}]\cup[\tfrac{2}{3},1]\). This procedure is repeated on the two remaining subintervals \([0,\tfrac{1}{3}]\) and \([\tfrac{2}{3},1]\), and iteratively on all remaining subsets such that \(C_n = \tfrac{1}{3}C_{n-1} \cup (\tfrac{2}{3} + \tfrac{1}{3}C_{n-1})\). The Cantor set is defined by the limit as \(n\to\infty\), i.e., \(\mathcal{C} = \cap_{n=0}^{\infty} C_{n}\).

The recursion can be illustrated in python in the following way

  import numpy as np

  def transform_interval(x, scale=1.0, translation=0.0):
      return tuple(map(lambda z: z * scale + translation, x))

  def Cantor(n):
      if n==0: return {(0,1)}
      Cleft = set(map(lambda x: transform_interval(x, scale=1/3), Cantor(n-1)))
      Cright = set(map(lambda x: transform_interval(x, translation=2/3), Cleft))
      return Cleft.union(Cright)
  print(Cantor(1))
{(0.6666666666666666, 1.0), (0.0, 0.3333333333333333)}

The following code displays the first 5 iterations of the algorithm and clearly illustrates the self-similarity property in the sets \(C_{n}\).

  def intervalprint(x, n=240):
     sym = n*[' ']
     val = np.linspace(0, 1, num=n)
     for interval in x:
	idx = np.where((val >= interval[0]) & (val <= interval[1]))[0]
	for i in idx:
           sym[i] = u'\u2500'
     st = ''.join(sym)
     print(st)

  x = list(map(Cantor, [0,1,2,3,4,5]))
  for i in range(0, len(x)): intervalprint(x[i])
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────                                                                                ────────────────────────────────────────────────────────────────────────────────
───────────────────────────                           ──────────────────────────                                                                                ──────────────────────────                           ───────────────────────────
─────────         ─────────                           ────────         ─────────                                                                                ─────────         ────────                           ─────────         ─────────
───   ───         ───   ───                           ───   ──         ───   ───                                                                                ───   ───         ──   ───                           ───   ───         ───   ───
─ ─   ─ ─         ─ ─   ─ ─                           ─ ─    ─         ─ ─   ─ ─                                                                                ─ ─   ─ ─         ─    ─ ─                           ─ ─   ─ ─         ─ ─   ─ ─

It is clear that \(\mathcal{C}\) is bounded (being a subset of the unit interval) and closed since it constructed by removing countably many open intervals from a closed interval. It also has zero measure, meaning that it does not contain any intervals.

To see this, note that in the first step the open middle set \((\tfrac{1}{3}, \tfrac{2}{3})\) is removed which has measure 1/3. From the remaining two closed sets, \([0, \tfrac{1}{3}]\) and \([\tfrac{2}{3}, 1]\), the open middle sets are removed in the next iteration: \((\tfrac{1}{3^{2}}, \tfrac{2}{3^{2}})\) and \((\tfrac{2}{3}+\tfrac{1}{3^{2}}, \tfrac{2}{3}+\tfrac{2}{3^{2}})\), each of measure \(\tfrac{1}{3^2}\). In the \(n\)th iteration \(2^n-1\) open intervals are removed each of length \(\tfrac{1}{3^{n}}.\) Hence, the Lebesgue measure of the Cantor set is \[m(\mathcal{C}) = m([0,1])-\sum_{k=0}^{\infty} \tfrac{2^{n}}{3^{n+1}} = 1-\frac{1}{3}\sum_{k=1}^{\infty} \left(\tfrac{2}{3}\right)^{n} = 1 - \frac{1}{3}(1-\tfrac{2}{3})^{-1} = 0. \]

In this sense, we can think of the Cantor set just containing “dust”.

It can also be shown that the Cantor set can be represented as \[ C = \left\{ \sum_{i=1}^{\infty} \frac{x_{n}}{3^{n}} \mid x_{n}\in{0,2}\right\}, \] from which it again follows that the set is uncountable. This can also be seen by considering an arbitrary countable subset \[N = \{n_{1}, n_{2}, \ldots \}\subseteq \mathcal{C}.\] In the construction of \(\mathcal{C}\), we must have that \(n_{1}\) belongs to one of the two closed intervals in \(C_{1}\). Hence, there is one interval \(I_{1}\) of length \(\tfrac{1}{3}\) such that \(n_{1}\not\in I_{1}\). Similarly, \(C_{2}\cap I_{1}\) is the union of two new closed intervals of length \(\tfrac{1}{3^{2}}\), and \(n_{2}\) can at most be in one of these two sets. Let \(I_{2}\subset I_{1}\) be this interval s.t. \(n_{2}\not\in I_{2}\). Continuing in this fashion we construct a series of intervals \(\mathcal{C}\supset I_{1}\supset I_{2}\supset \cdots \supset I_{k} \supset\cdots\), where \(I_{k}\) is one of the closed intervals of \(C_{n}\) of length \(\tfrac{1}{3^{n}}\) such that \(n_k\not\in I_{k}\). The limit \(I_{\infty} = \cap_{k=1}^{\infty}I_{k}\) is non-empty (see below), and by construction \(N\cap I_{\infty} = \emptyset\). Hence given any arbitrary countable subset of \(\mathcal{C}\) we can still find elements in \(\mathcal{C}\) that does not belong to this subset, and thus there cannot exist a bijection between \(\mathbb{N}\) and \(\mathcal{C}\) and we conclude that \(\mathcal{C}\) is uncountable infinite.

Here we used Cantor’s Intersection Theorem: Let \((X, d)\) be a complete metric space. Let \((I_{k})_{k\in N}\) be a sequence of decreasing non-empty, closed subsets of \(X\) then \(\bigcap_{k\in N}I_{k} \neq \emptyset.\)

Proof: For each \(k\in N\) choose \(x_{k}\in I_{k}\). The diameter of \(I_{k}\) is defined as \(\operatorname{diam}(I_{k})=\sup\{𝑑(𝑥,𝑦):𝑥,𝑦∈ I_{k}\}\). We only need to consider the case where \(\operatorname{diam}(I_{k})\to 0.\) In this case, let \(\epsilon>0\) then there exists \(N>0\) such that \(d(x, x’)<\epsilon\) for all \(x,x’\in I_{N}\). Let \(m,n>N\) then \(x_{n}, x_{m}\in I_{N}\) since \(I_N \supset I_{N+1}\supset … \supset I_{m},I_{n}\), and thus \(d(x_{n},x_{m})<\epsilon\). We conclude that \((x_{k})_{k\in\mathbb{N}}\) is a Cauchy sequence. Since \(X\) is complete \(x_n\to x\) as \(n\to\infty\) for some \(x\in X\). Note that \((x_{n})_{n=N}^{\infty} \subset I_{N}\) and since \(I_{N}\) is closed, the limit point also lies in the set \(x\in I_{N}\). As this holds for any \(N\), we conclude that \(x\in\bigcap_{k\in N}I_{k}\). This concludes the proof. Note, that in this case the limit point \(x\) is in fact the only element. Since if we let \(x’\) be another element then \(x,x’\in I_n\) for all \(n\in N\) but \(d(x,x’)\leq\operatorname{diam}(F_{n})\to 0\) as \(n\to\infty\). Hence \(d(x,x’)=0\), and therefore \(\bigcap_{k\in N}I_{k} = \{x\}.\)