Principia Mathematica is a math textbook by A.N. Whitehead and Bertrand Russell which derives various mathematical truths from the laws of (higher-order) logic and a small number of axioms. It is "famous" for taking hundreds of pages to get around to proving that 1 + 1 = 2, and although they could have done it faster by sacrificing some generality, so far it seems like most of what's on the way there is required for that proof to be useful.

Since this is supposed to be a programming blog and I am reading it for reasons distantly related to programming I will be focusing on the elements I think might be relevant to other programmers (although probably not very many other programmers). Hopefully I'll have some actual programming stuff up soon.

The universe

Right now I am trying to develop a very metaprogramming-focused programming language. This involves a lot of type stuff so I will introduce the notation x: T to mean "x has type T", and G<A> to mean a generic type G with argument A.

In principle the arguments to generic types can be types or constant values. We might imagine a type like FixedSizeList:

struct FixedSizeList<T: type, size: int> {
    // ...
}

In most programming languages, generics are mostly intended to take types as arguments, and constant arguments get treated very differently from type arguments. But it's not obvious why this should be the case: it seems like we could declare that type: type, and then go around treating types like any other (compile-time) object. However, having such a "type of all types" causes various practical difficulties, some of which do seem to be genuinely related to the notorious "set of all sets" you've probably heard about if you've ever heard about set theory. There are languages (like Python) where the type is its own type but they tend not to allow complex reasoning about types the way e.g. Haskell does.

This "set of all sets" is called the "universal set" or V (for Vniverse?), and the usual verdict is that it doesn't exist because it can't exist: its definition is contradictory. This is true in Zermelo-Fraenkel set theory, which is the most popular version of set theory, but the proof turns out to rely on surprisingly complicated assumptions and there are theories like New Foundations where V is a perfectly good set. Principia Mathematica, which uses the term "class" instead of "set", actually defines the class Cls containing all classes, and checks whether x ∈ Cls, rather than directly defining a way to check whether x is a class or not!

Technically there are some important differences between mathematical sets and the types used in programming; most importantly the paradoxes that lead to ruinous contradictions in set theory tend to lead to relatively harmless infinite loops when you try to put them in program form. But I really don't want to spend months implementing some fancy feature only to find out it hangs the compiler every time you try to use it. So I want to learn how these weird universal types and sets work before I try.

Predicativity

Another PM topic is its predicativity, which is basically an effort to avoid self-reference. A function is predicative if it doesn't directly or indirectly reference itself. Avoiding direct self-reference is easy. Indirect reference is harder; if f refers to g and g refers to f, then they're both impredicative. The possibility of indirect reference creates an annoying hierarchy of function orders, where a higher-order function may refer to a lower-order function and stay predicative.

This hierarchy is actually such a problem that PM basically collapses it by introducing the Axiom of Reducibility, which says that for every function, there's a predicative function with matching outputs for all inputs. The Axiom of Reducibility was subject to some harsh criticism but is in my opinion at least very interesting. Whether it's one of the laws of logic or not, it seems like it might be true of the functions a compiler deals with, since we can do things like replace recursion with loops.

In PM, pretty much the only way for indirect self-reference to arise is through the quantifiers, which most programming languages don't have. However, I think all this stuff might be useful for coming up with rules for cyclic dependencies. A particular case I've worried about before: imagine modules can take arguments, and are allowed to import each other. How much stuff can be a module argument?

Other stuff

There are a lot of small things that would demand overly long descriptions compared to how interesting they are.

Contextual definitions: PM often defines sentences containing terms instead of defining terms directly. Like an early version of "syntactic sugar". I think this is a good approach to things prone to "undefined behavior"; it gives precision about what is and is not defined.
- Particularly amusing is the definite description operator, which defines a logical equivalent of the word "the" as seen in "the sine of x" or "the maximum element of a set".
Typical ambiguity: statements in PM don't mention the type of objects they are about. They are actually generic: what is true of individual objects is true of functions and functions of functions and so on. This kind of implicit generic-ness is interesting but seems hard to make applicable.
Relation numbers allow arithmetic on functions of type (T, T) -> bool. There might be some good higher-order functions to be found by treating functions as numbers. Somehow the real numbers and ordinals are all relation numbers.
The Axiom of Infinity and Axiom of Choice aren't really relevant to programming, but take on very weird forms that are certainly philosophically interesting.

My reading experience so far

The book got a second edition with an introduction that basically does a rouch sketch of a whole new theory, without the Axiom of Reducibility. I read this first because it was the introduction, but it constantly references things defined later; it should be at the end of the book. The new theory cannot get to the real numbers and might be too hardcore for its own good.

Following that, we get the real beginning. I had hoped that, as a programmer quite familiar with logic, I could skip the first hundred+ pages, which mainly concern very simple ideas like contraposition, but it turns out that the authors wantonly introduce often-used abbreviations throughout those chapters and I ended up having to go back and read them after struggling a bit with some of the stuff about classes. Now I am reading through the chapter about classes a second time.

The going is slow. The book is dense and the prose not always clear. Doing the derivations by myself has helped. I end up doing about half a page of writing for each page of reading.

I might be finished with Volume I (of 3) in a few weeks.