Categories: the basics, towards (co)monads (Part 1)

As the millionth person to write about monads, I suppose I must claim that all other monad tutorials, including their burrito analogies,1 are woefully lacking in substance! By the end of the post, you will hopefully have a better understanding of a monad from a more mathematical viewpoint, that is, recognize natural appearances of monadic patterns and use cases, and use your newfound pure mathematical skill to cut straight to the heart of the problem at hand.2

A monad captures a specific kind of computation, abstracting the same basic structure across various types. The dual concept, the comonad, is seen less often, but it is still useful.

We do not wish to simply glance past the concept of monads and be left head-scratching days later. To understand monads, we need to “briefly” go over

  1. categories,
  2. functors,
  3. natural transformations,
  4. some universal constructions,
  5. monoids, monoidal categories and monoid objects,
  6. and cartesian closed categories and the internal hom.

Then, we’ll finally be able to cover

  1. monads,
  2. comonads,
  3. and the reader/writer examples.

This post (part 1) only covers the first three items in the first list, but it does so comprehensively. These three concepts are crucial to understand, and they form the basis of category theory. Saunders Mac Lane, one of the founders of category theory, is often quoted as saying: “I didn’t invent categories to study functors; I invented them to study natural transformations.”

Feel free to skim or skip sections covering concepts you are familiar with. We’re not going to rigorously prove everything, but we won’t shy away from detail when it’s needed. Let’s begin!

1 Categories

Categories are made of two kinds of things: objects and morphisms (also called arrows or maps). In $\text{Set}$, the category of sets, the objects are sets and the morphisms are functions between them, so use this as a kind of analogy in your head.

Every morphism $f$ of a category $C$ has a source object $A$ and a target $B$ where $A$ and $B$ are objects of $C$, which we write $f\colon A\to B$ and say “$f$ is a morphism from $A$ to $B$.” The set3, or hom-set, of morphisms from $A$ to $B$ is denoted $\text{Hom}(A, B)$, $\text{Hom}_C(A, B)$, or just $C(A, B)$. For every pair of morphisms $f\colon X\to Y$ and $g\colon Y\to Z$, we may compose them to form $g\circ f\colon X\to Z$, also written $gf$. Additionally, we require the following axioms to hold:

  • Composition must be associative. That is, $(h\circ g)\circ f=h\circ (g\circ f)$ for all $f\colon W\to X$, $g\colon X\to Y$, and $h\colon Y\to Z$.
  • For every object $X$, there is a morphism $1_X$ (or $id_X$), which is an identity for composition. That is, for all $f\colon A\to X$ and $g\colon X\to A$ we have $1_X\circ f=f$ and $g\circ 1_X=g$. (It is simple to prove that this morphism is unique.)

This should remind you of the domain, codomain, and composition of functions. To keep track of equations involving composition, we may use commutative diagrams, where any two paths along chains of arrows from one object to another yield an equality. The following diagram says that $h=g\circ f$:

$$ \begin{array}{ccc} A & \stackrel{f}{\to} & B \\ {} & \mathllap{\scriptsize{h}}{\searrow} & \darr\scriptsize{g} \\ {} & {} & C \end{array} $$
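
If code is more your speed, here is the same data as a minimal Haskell sketch, using types as objects and functions as morphisms (a stand-in for $\text{Set}$); the function names are made up for illustration.

```haskell
-- Objects: Haskell types.  Morphisms: functions.  Composition: (.).
-- The identity morphism on every object is id.

showInt :: Int -> String        -- a morphism Int -> String
showInt = show

strLen :: String -> Int         -- a morphism String -> Int
strLen = length

composite :: Int -> Int         -- strLen . showInt, a morphism Int -> Int
composite = strLen . showInt

-- The axioms, read as equations between functions:
--   (h . g) . f  ==  h . (g . f)      -- associativity
--   id . f == f   and   f . id == f   -- identity laws

main :: IO ()
main = print (composite 12345)  -- length (show 12345) == 5
```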

Wow, that was a mouthful. Let’s walk through a few examples:

  • In $\text{Set}$, the category of sets (as mentioned earlier), the objects are sets and the morphisms are functions between sets. We know function composition is associative, and every set has an identity function that maps each element to itself.
  • In $\text{Grp}$, the category of groups, the objects are groups and the morphisms are group homomorphisms. Since we can define groups and group homomorphisms in terms of sets and functions, we can reuse the above argument to deduce that they satisfy the category axioms.
  • In $\text{Vect}_\reals$, the category of real vector spaces, the objects are vector spaces over the reals and the morphisms are linear transformations (matrix-vector multiplication). Vector spaces can be seen as sets and linear transformations can be seen as functions, blah blah blah, they satisfy the axioms.

You might notice a pattern here. That’s because all these categories are concretizable categories, which basically means that the objects and morphisms can be seen as sets and functions with extra structure on top. Most interesting categories are concretizable,4 though there are exceptions; we won’t deal with those here.

The general idea behind many categories is that they let us study certain kinds of objects by understanding how they relate to one another via structure-preserving maps. In $\text{Grp}$ and $\text{Vect}_\reals$, the axioms (and therefore properties) of groups and vector spaces are preserved by morphisms. For example, linear transformations preserve vector scaling and addition: $F(au+bv)=aF(u)+bF(v)$, where $a$ and $b$ are scalars, $u$ and $v$ are vectors, and $F$ is a linear transformation. The “structure” of sets is quite general, and from the categorical viewpoint of functions between sets, two sets behave the same if they have the same cardinality (size). (This viewpoint is called structuralism.) Similarly, two finite-dimensional vector spaces with the same dimension share similar behavior. We call two objects which essentially behave the same isomorphic. Two objects are isomorphic if there exists an isomorphism between them. We say $f\colon A\to B$ is an isomorphism if there exists $g\colon B\to A$ such that $gf=1_A$ and $fg=1_B$. You can think of an isomorphism as an invertible map.

Additionally, each category $C$ has an opposite $C^{\text{op}}$, which you get by turning all the arrows around. For example, the morphism $f\colon X\to Y$ is actually $f^\text{op}\colon Y\to X$ in the opposite category — the objects are the same, but the arrows are backwards, which is achieved by formally swapping the source and target objects (essentially switching their names). The composite $g\circ f\colon X\to Z$ where $g\colon Y\to Z$ is actually $f^\text{op}\circ^\text{op} g^\text{op}\colon Z\to X$ in the opposite category — the order of composition is simply reversed to account for the reversal of arrows. There’s often no good “meaningful” way to understand something like $\text{Set}^\text{op}$, but opposite categories turn out to be quite useful anyway.

A natural question to ask is: what map preserves the structure of categories?

2 Functors

Functors are maps between categories that preserve the structure of a category. But what does that actually mean? Generally when we define mathematical objects, we outline a structure and some axioms it must obey. So, a functor $F\colon C\to D$ between the categories $C$ and $D$ must map

  • objects $X$ to objects $F(X)$, also written $FX$,
  • morphisms $f\colon X\to Y$ to morphisms $Ff\colon FX\to FY$ (preserving the source and target object),
  • identity morphisms on each object to identity morphisms on the corresponding object $F(1_X)=1_{F(X)}$,
  • and composition of morphisms: $F(g\circ f)=F(g)\circ F(f)$. Since equations may be represented by commutative diagrams, we say that functors preserve commutative diagrams.

There are also contravariant functors, which switch the source and target of morphisms as well as the order of composition. In contrast, the “ordinary” functors may be called covariant. Luckily, every contravariant functor $F\colon C\to D$ may instead be viewed as a covariant functor $F\colon C^\text{op}\to D$ or $F\colon C\to D^\text{op}$, using the opposite category construction from above.
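
For programmers, a rough analogue (not the general definition): a Haskell `Functor` instance is an endofunctor on the category of types, with `fmap` as its action on morphisms, and the hand-rolled `Predicate` type below plays the role of a contravariant functor.

```haskell
-- The functor axioms become the familiar Functor laws:
--   fmap id      == id                -- identities go to identities
--   fmap (g . f) == fmap g . fmap f   -- composition is preserved

-- A contravariant functor flips arrows: a function a -> b turns a
-- predicate on b into a predicate on a, by precomposition.
newtype Predicate a = Predicate { runPredicate :: a -> Bool }

contramapP :: (a -> b) -> Predicate b -> Predicate a
contramapP f (Predicate p) = Predicate (p . f)

main :: IO ()
main = do
  let longish = Predicate (> 8)                                  -- a predicate on Int
  print (fmap (+ 1) (Just 41))                                   -- Just 42
  print (runPredicate (contramapP length longish) "naturality")  -- True
```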

Examples!

  • An endofunctor $C\to C$ is a functor from a category $C$ to itself.
  • An identity functor is an endofunctor that sends objects and morphisms to themselves.
  • Two functors $F\colon C\to D, G\colon D\to E$ can be composed to form $G\circ F\colon C\to E$, also written $GF$. (It’s trivial to check that the composite satisfies the functor axioms, i.e. the mapping is functorial. If you’re not sure, do it yourself!)
  • A constant functor $C\to D$ which maps every object of $C$ to a single object in $D$ and every morphism to the identity morphism on that object.
  • A forgetful functor $U\colon C\to \text{Set}$ maps every object of a concrete category to its underlying set, sending morphisms to the underlying functions.
  • A free functor $F\colon \text{Set}\to C$, when it exists, maps each set to the “free object” in $C$ on that set.
    • For instance, when $C$ is $\text{Vect}_\reals$, $F$ maps an $n$-element set to an $n$-dimensional vector space with that set as a basis. A function between sets is sent to the unique linear map determined by where it sends those basis elements. (Yeah, it’s a bit weird.)
  • The (covariant) power set functor $P\colon \text{Set}\to\text{Set}$ sends each set to its power set (that is, the set of all subsets) and each function $f\colon X\to Y$ to the direct image function $f_*\colon P(X)\to P(Y)$, which maps every subset $U\subseteq X$ to its image $f_*(U)\subseteq Y$.
  • The contravariant power set functor $P\colon \text{Set}\to\text{Set}^\text{op}$ (different $P$) still sends each set to its power set and each function $f\colon X\to Y$ to the inverse image function $f^*\colon P(Y)\to P(X)$, which maps every subset $V\subseteq Y$ to its preimage $f^*(V)\subseteq X$. (Both power set functors are sketched in code after this list.)
  • If we think of groups as one object categories where all morphisms are isomorphisms (ensuring their invertibility), then a functor from one group to another is a group homomorphism. (If you didn’t know what a group is, now you know!) Additionally, a functor from a group to $\text{Set}$ describes a group action on a particular set, and similarly a functor from the group to any category describes a group action on a particular object.
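
For finite sets, the two power set functors above can be sketched with `Data.Set`; the helper names and the extra ambient-set argument in `preimage` are artifacts of working with concrete finite sets, not part of the general definition.

```haskell
import qualified Data.Set as Set
import           Data.Set (Set)

-- Covariant power set functor on a morphism f: the direct image,
-- sending U ⊆ X to { f x | x ∈ U }.
directImage :: Ord b => (a -> b) -> Set a -> Set b
directImage = Set.map

-- Contravariant power set functor on a morphism f: the inverse image.
-- For finite sets we pass the ambient set X explicitly so we can enumerate it.
preimage :: Ord b => Set a -> (a -> b) -> Set b -> Set a
preimage ambient f v = Set.filter (\x -> f x `Set.member` v) ambient

main :: IO ()
main = do
  let xs = Set.fromList [1 .. 6 :: Int]
      f  = (`mod` 3)
  print (directImage f xs)                  -- fromList [0,1,2]
  print (preimage xs f (Set.fromList [0]))  -- fromList [3,6]
```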

Notice that you can compose functors, and that the composition has identities, namely, the identity functors. As you might guess, there is a category of small categories, called $\text{Cat}$, whose objects are small categories and morphisms are functors. (The word “small” deals with issues pertaining to size, such as Russell’s paradox and a set of all sets.)

We’re on a roll here! Objects and the maps between them constitute a category, and categories and the functors between them constitute the category of small categories.

… are there maps between functors?

3 Natural transformations

Just as functors are maps between categories, natural transformations are maps between functors. Let $F,G\colon C\to D$ be functors. A natural transformation $\eta\colon F\Rarr G$ is a family of morphisms such that:

  • for every object $X\colon C$, we have a morphism $\eta_X\colon F(X)\to G(X)$ called the component of $\eta$ at $X$,
  • and for all $f\colon X\to Y$ in $C$, $\eta_Y\circ Ff=Gf\circ \eta_X$, that is, the following diagram commutes:

$$ \begin{array}{ccc} FX & \stackrel{\eta_X}{\to} & GX \\ \scriptsize{Ff}\darr & {} & \darr\scriptsize{Gf} \\ FY & \stackrel{\eta_Y}{\to} & GY \\ \end{array} $$

This is getting dangerously abstract — what are we actually saying here?

First of all let’s remind ourselves that the definition of a category does not actually require objects to have any internal structure. As far as category theory is concerned, they don’t. Therefore, the “properties” of an object are simply determined by the morphisms into and out of it.5

Notice how this square commutes for all morphisms in the category $C$, no matter what objects are the source and target. In a sense, $\eta\colon F\Rarr G$ is a kind of map that is independent of the particular object each component is assigned to — instead of going from one object to another, it goes from one functor to another. (Remember that the components are in $D$.) Don’t worry if you’re confused, because we have examples!
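
Before the mathematical examples, a programmer’s sketch: between Haskell `Functor`s, a natural transformation is (roughly) a polymorphic function `forall a. f a -> g a`, and the square above is a free theorem. `safeHead` here is just an illustrative choice.

```haskell
-- A component-wise map from the list functor to Maybe.
safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x

-- Naturality is exactly the commuting square in the text:
--   fmap h . safeHead == safeHead . fmap h
-- (for Haskell Functors this holds automatically, by parametricity).
main :: IO ()
main = do
  print (fmap (* 2) (safeHead [3, 4, 5 :: Int]))  -- Just 6
  print (safeHead (fmap (* 2) [3, 4, 5 :: Int]))  -- Just 6
```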

3.1 Embedding vector spaces into their double dual

Consider the category $\text{Vect}_\reals$, and recall that its morphisms are linear transformations, also known as linear maps. Let $V$ be a real vector space and consider the set of linear maps from it to the reals,6 $\text{Hom}(V,\reals)$ or $V\to\reals$. By defining the operations of addition and scalar multiplication on it, the hom-set becomes a vector space — the dual vector space, which we denote $V^*$.

Let $a\colon\reals$, $v\colon V$, and $f,g\colon V^*$. Then, in $V^*$,

  • addition is defined as $f+g=(v\mapsto f(v)+g(v))$,7
  • and scalar multiplication is defined as $af=(v\mapsto af(v))$.

This should hopefully convince you $V^*$ is a vector space. If you’re not sure, verify it!

The elements of a dual vector space are called covectors or one-forms. If you choose an inner product (like the standard dot product), you can embed a vector space into its dual space with the map $u\mapsto v\mapsto u\cdot v$. In fact, if the dimension of $V$ is finite, then $V$ and $V^*$ are isomorphic. However, we’re not always so lucky: vector spaces generally do not come with a particular inner product. Every choice of an inner product defines a different map to our dual vector space; there’s no standard/canonical one.

The dual vector space is called “dual” for a reason. It turns out there is a natural embedding of $V$ into its double dual, $V^{**}$, and that this embedding is an isomorphism when $V$ has finite dimension! This map is defined as

$$ \Psi_V\colon V\to ((V\to\reals)\to\reals)=v\mapsto \varphi\mapsto \varphi(v), $$

which takes every vector $v$ to the evaluation map at $v$. The evaluation map simply evaluates a covector $\varphi$ at $v$, returning $\varphi(v)$. For a map to be natural, it must somehow be “choice-free” — this usually suggests the map is a component of a natural transformation. For example, to define $\Psi_V\colon V\to V^{**}$, we didn’t have to make any unnatural choices, unlike when we were defining an embedding $V\to V^*$.

Now, we just have to extend $\Psi_V\colon V\to V^{**}$ to a natural transformation $\Psi\colon1_{\text{Vect}_\reals}\Rarr -^{**}$ from the identity functor to the double dual functor, both of which are endofunctors on $\text{Vect}_\reals$. But wait — if we’re upgrading the double dual to a functor, what’s the double dual of a linear map? Don’t worry, it’s rather similar: if we have $f\colon V\to W$, then $f^{**}\colon V^{**}\to W^{**}=\theta\mapsto\varphi\mapsto \theta(\varphi\circ f)$. If you don’t trust me, break out a sheet of paper and try it yourself — similarly, verify the functoriality if you want.8
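
If it helps, here are $\Psi_V$ and $f^{**}$ as a (linearity-unaware) Haskell sketch, with a numeric spot check of the square we are about to verify; the concrete map `f` and covector `phi` are made up for illustration.

```haskell
-- A "dual vector" is just a function to Double; psi sends a vector
-- to evaluation-at-that-vector.
type Dual v = v -> Double

psi :: v -> Dual (Dual v)                                  -- Psi_V : V -> V**
psi v = \phi -> phi v

doubleDual :: (v -> w) -> Dual (Dual v) -> Dual (Dual w)   -- f |-> f**
doubleDual f theta = \phi -> theta (phi . f)

main :: IO ()
main = do
  let f :: Double -> (Double, Double)
      f x = (x, 2 * x)                 -- a linear map R -> R^2
      phi (a, b) = 3 * a + b           -- a covector on R^2
  -- the two sides of the naturality square, at the vector 5:
  print (doubleDual f (psi 5) phi)     -- f** (Psi_R 5) applied to phi: 25.0
  print (psi (f 5) phi)                -- Psi_{R^2} (f 5) applied to phi: 25.0
```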

Now we’re ready. We wish to verify the naturality condition for $\Psi$:

$$ \begin{array}{ccc} X & \stackrel{\Psi_X}{\to} & {X^{**}} \\ \scriptsize{f}\darr & {} & \darr\scriptsize{f^{**}} \\ Y & \stackrel{\Psi_Y}{\to} & {Y^{**}} \\ \end{array} $$

This square tells us that $\Psi_Y\circ f=f^{**}\circ\Psi_X$. Let $x\colon X$, $y\colon Y$, $\varphi\colon X^*$, $\phi\colon Y^*$, and $\theta\colon X^{**}$. Substituting and expanding this out yields

$$ \begin{aligned} &&(y\mapsto\phi\mapsto\phi(y))\circ f&=(\theta\mapsto\phi\mapsto\theta(\phi\circ f))\circ (x\mapsto\varphi\mapsto\varphi(x))\\ =&&x\mapsto\phi\mapsto\phi(f(x))&=x\mapsto\phi\mapsto(\varphi\mapsto\varphi(x))(\phi\circ f)\\ =&&x\mapsto\phi\mapsto\phi(f(x))&=x\mapsto\phi\mapsto(\phi\circ f)(x)\\ \end{aligned} $$

(If you’re not sure what I just did, this is basically lambda calculus with a slightly different syntax. I’ve written a blog post covering the basics!)

You should be able to see that the equation holds, and therefore the transformation is natural! We should remind ourselves that the above commutative diagram holds for any morphism between any objects. This compatibility demonstrates that the natural transformation $\Psi\colon1_{\text{Vect}_\reals}\Rarr -^{**}$ represents a map that is independent of the specific objects it acts on.

In fact, if we restrict $\text{Vect}_\reals$ to $\text{FinVect}_\reals$, the category of finite-dimensional real vector spaces, $\Psi$ becomes a natural isomorphism, which is a natural transformation where every component is an isomorphism. This is a fancy restatement of what we noted earlier: finite-dimensional vector spaces are naturally isomorphic to their double duals.

3.2 The determinant

The determinant of a square matrix, if you aren’t aware, is a number that measures how the matrix scales volumes. For example, the determinant of a 3-by-3 matrix is equal to the (signed) volume of the image of the unit cube under the matrix, that is, after it is transformed. (You should think of matrices as linear transformations here. A matrix $M$ sends a vector $v$ to $Mv$ under its corresponding linear transformation.)

Before we get to the point, let’s go over a few remarks and definitions.

  • If the determinant has no reciprocal, the matrix is said to be singular. Singular matrices do not preserve uniqueness; two vectors may map to the same thing. Therefore, they are not invertible.
  • Nonsingular matrices are invertible, and their determinants are the reciprocal of the determinant of their inverse.
  • In fact, for any two matrices $A,B$, we have $|AB|=|A||B|$, where $|-|$ is the determinant of a matrix.
    • The above statement follows from this one since $AA^{-1}=A^{-1}A=I$ and $|I|=1$, where $A$ is an invertible matrix and $I$ is the identity matrix.
    • This is actually kind of hard to prove (because I don’t know how). Hopefully the “volume of a unit cube” thing should be enough intuition for you to accept this.
  • A field is a set where we can do addition, multiplication, subtraction, and division (but not by zero), like we can with the real ($\reals$) and complex ($\Complex$) numbers. A commutative ring is a generalization of a field where division is not guaranteed, such as the integers ($\Z$), Gaussian integers ($\Z[i]=\{a+bi\mid a,b\in\Z\}$), and polynomials with real coefficients ($\reals[x]$).
  • When a matrix has entries in a field $K$, or more generally, a commutative ring $K$, we say the matrix is over $K$, and that $K$ is the ground field/commutative ring.
  • The elements in a commutative ring which do have multiplicative inverses (reciprocals) are called units. The set of units of a commutative ring $R$ under multiplication form a group $R^\times$ called the group of units. For example, ${\Z^\times=\{1,-1\}}$, ${\Z[i]^\times=\{1,i,-1,-i\}}$, and $\reals^\times=\reals[x]^\times=\reals\backslash\{0\}$, where the group operation is inherited from the parent commutative ring’s multiplication.
  • The set of all invertible n-by-n matrices becomes a group under matrix multiplication (which corresponds to composing linear transformations). We call this group the general linear group $\text{GL}_n(K)$, where $K$ is the ground field/commutative ring.

The third point above notes that we can multiply matrices and take the determinant, or take the determinant and multiply the resulting numbers. This is precisely a group homomorphism $\text{det}_\reals\colon\text{GL}_n(\reals)\to\reals^\times$ when our matrices are over $\reals$. $\text{GL}_n(\reals)$ captures matrix multiplication while $\reals^\times$ captures multiplication in the ground commutative ring.

If you’ve been following along, your natural transformation sense should be tingling, because doesn’t this relationship hold for things that aren’t necessarily the reals? You’re correct — the determinant, as a “general morphism” in the category of groups $\text{Grp}$, is independent of whatever particular commutative ring the matrices are over. In fact, we can define the general linear group $\text{GL}_n(-)$ and the group of units $(-)^\times$ as functors from the category of commutative rings $\text{CRing}$ to $\text{Grp}$, and use this to define the determinant $\text{det}\colon\text{GL}_n(-)\Rarr (-)^\times$ as a natural transformation. Let’s take a look at the naturality square, which holds for any morphism $f$:

$$ \begin{array}{ccc} \text{GL}_n(R) & \stackrel{\text{det}_R}{\to} & R^\times \\ \scriptsize{\text{GL}_n(f)}\darr & {} & \darr\scriptsize{f^\times} \\ \text{GL}_n(S) & \stackrel{\text{det}_S}{\to} & S^\times \\ \end{array} $$

We have three concerns to address: what are $\text{GL}_n(-)$ and $(-)^\times$ applied to a morphism $f$ in $\text{CRing}$, and what does it intuitively mean for this square to commute? Well, the first one is rather simple: using $f$, map each entry in the matrix. The second one is simple, too: since the group of units is a subset of a ring, $f$ can map those elements.9

Like the double dual example, the square commuting simply means that $\text{det}$ is independent of the particular commutative ring the matrices are over. All the possible morphisms $f\colon R\to S$ account for all the properties of all possible objects $R$ and $S$, so the commutative square describes a kind of compatibility between $\text{det}$ and the structure of $\text{CRing}$, as pictured in $\text{Grp}$ by the two functors. You can try verifying the naturality condition for the following maps:

  • $a\mapsto a\colon\reals\to\Complex$, the embedding of the reals into the complexes,
  • $p\mapsto p(a)\colon\reals[x]\to\reals$, the polynomial evaluation map for a fixed $a\colon\reals$,
  • and $z\mapsto z^*\colon\Complex\to\Complex$, the complex conjugation map.
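
Here is a quick numeric check of the third map (complex conjugation) in Haskell, with a hand-rolled 2-by-2 determinant; the helper names are just for illustration.

```haskell
import Data.Complex (Complex (..), conjugate)

-- A 2-by-2 matrix as nested pairs, its determinant, and entrywise mapping
-- (which is what GL_2(f) does to a matrix).
type M2 a = ((a, a), (a, a))

det2 :: Num a => M2 a -> a
det2 ((a, b), (c, d)) = a * d - b * c

mapM2 :: (a -> b) -> M2 a -> M2 b
mapM2 f ((a, b), (c, d)) = ((f a, f b), (f c, f d))

main :: IO ()
main = do
  let m :: M2 (Complex Double)
      m = ((1 :+ 2, 3 :+ (-1)), (0 :+ 4, 2 :+ 0))  -- det m /= 0, so m is invertible
  -- conjugate the determinant, or take the determinant of the entrywise-
  -- conjugated matrix: naturality says these agree.
  print (conjugate (det2 m))        -- (-2.0) :+ 8.0
  print (det2 (mapM2 conjugate m))  -- (-2.0) :+ 8.0
```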

Of course, the standard determinant also works for (square) singular matrices, but if we restrict it to invertible matrices, we can understand the determinant as a natural transformation in $\text{Grp}$. I would guess that using the category of monoids $\text{Mon}$ instead of $\text{Grp}$ would capture this missing behavior, but I haven’t checked carefully. (A monoid is a generalization of a group, where elements do not have to be invertible. It can be viewed as a one object category!)

Okay, that’s enough examples.

3.3 Functor categories

The header of this section is laden with implication … but first let’s take a look at a few more features of natural transformations.

For any functor $F\colon C\to D$, there exists a natural transformation $1_F\colon F\Rarr F$ (or $id_F$), called the identity natural transformation, whose components are all identity morphisms. The naturality square

$$ \begin{array}{ccc} FX & \stackrel{(1_F)_X}{\to} & FX \\ \scriptsize{Ff}\darr & {} & \darr\scriptsize{Ff} \\ FY & \stackrel{(1_F)_Y}{\to} & FY \\ \end{array} $$

commutes, since $(1_F)_X=1_{FX}$ and $1_{FY}\circ Ff=Ff=Ff\circ 1_{FX}$.

If we have functors $F,G,H\colon C\to D$, we also have an obvious way of composing two natural transformations $\eta\colon F\Rarr G$ and $\epsilon\colon G\Rarr H$ to get $\epsilon\eta\colon F\Rarr H$ by defining $(\epsilon\eta)_X=\epsilon_X\eta_X$. We can verify the naturality condition by gluing together both commutative squares:

$$ \begin{array}{ccc} FX & \stackrel{\eta_X}{\to} & GX & \stackrel{\epsilon_X}{\to} & HX & {} & FX & \stackrel{(\epsilon\eta)_X}{\to} & HX \\ \scriptsize{Ff}\darr & {} & \darr\scriptsize{Gf} & {} & \darr\scriptsize{Hf} & = & \scriptsize{Ff}\darr & {} & \darr\scriptsize{Hf} \\ FY & \stackrel{\eta_Y}{\to} & GY & \stackrel{\epsilon_Y}{\to} & HY & {} & FY & \stackrel{(\epsilon\eta)_Y}{\to} & HY \\ \end{array} $$

and performing the following manipulation using the commutativity of the squares: $(\epsilon\eta)_Y(Ff)=\epsilon_Y\eta_Y(Ff)=\epsilon_Y(Gf)\eta_X=(Hf)\epsilon_X\eta_X=(Hf)(\epsilon\eta)_X$. (This kind of manipulation might make you feel better about using the word “commutative.")
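
In Haskell (with `RankNTypes`), the identity natural transformation and vertical composition can be written down directly; `safeHead` and `maybeToList` below are arbitrary stand-ins for $\eta$ and $\epsilon$.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Maybe (maybeToList)

-- A natural transformation between Haskell Functors, as a polymorphic function.
type Nat f g = forall a. f a -> g a

idNat :: Nat f f                           -- 1_F: every component is an identity
idNat = id

vcomp :: Nat g h -> Nat f g -> Nat f h     -- (eps . eta)_X = eps_X . eta_X
vcomp eps eta x = eps (eta x)

safeHead :: Nat [] Maybe                   -- eta : [] => Maybe
safeHead []      = Nothing
safeHead (x : _) = Just x

main :: IO ()
main = print (vcomp maybeToList safeHead [1, 2, 3 :: Int])  -- [1]
```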

Now we’re ready for the following: If $C$ and $D$ are categories, we can define the functor category $[C,D]$, where the objects are all functors $F,G\colon C\to D$ and the morphisms are natural transformations $F\Rarr G$. It is also written $D^C$, since it is also an exponential object in $\text{Cat}$, but we’ll cover that at another time.

With this in mind, we can turn $\text{Cat}$ into a strict 2-category, which is like a category but with morphisms between its morphisms, which we call 2-morphisms. In $\text{Cat}$, small categories are objects, 1-morphisms are functors, and 2-morphisms are natural transformations. Then, $\text{Hom}_\text{Cat}(C,D)=[C,D]$.

There’s one more thing missing from the definition of a 2-category…

3.4 Horizontal composition

There’s another kind of composition of 2-morphisms, called the horizontal composition. For example, we can compose natural transformations $\eta\colon F\Rarr G\colon C\to C'$ (this notation means $F,G\colon C\to C'$) and $\eta'\colon F'\Rarr G'\colon C'\to C''$ to get ${\eta'\ast\eta\colon F'F\Rarr G'G\colon C\to C''}$. (The ordinary kind of composition is called vertical composition because the natural transformations are usually vertically aligned when people draw them, and similarly for horizontal composition.)

Exactly how it works is a bit complicated, and it’s only relevant if you wish to learn the secrets of the universe, so feel free to skim or skip. (I highly recommend you attempt to visualize the setup in the previous paragraph, perhaps by following the Wikipedia link and looking at the diagrams.)

We’re going to define horizontal composition in terms of vertical composition by claiming that for all objects $A\colon C$, we have

$$ (\eta'\ast\eta)_A=(\eta'_{GA})(F'\eta_A)=(G'\eta_A)(\eta'_{FA}) $$

which, introducing some new notation, we might instead write more simply as

$$ \eta'\ast\eta=(\eta'G)(F'\eta)=(G'\eta)(\eta'F). $$

First, let’s clarify what this notation means. We define $(\eta'F)_A$ to be $\eta'_{FA}$. Recall that $\eta'\colon F'\Rarr G'\colon C'\to C''$. By “composing” $\eta'$ with $F$, we can “extend” this natural transformation to ${\eta'F\colon F'F\Rarr G'F\colon C\to C''}$. It’s a rather simple way of “horizontally extending” a natural transformation, in this case to the left (in the diagram you should be visualizing). Intuitively, $F$ allows us to think about the structure in $C$ as an image in $C'$, and therefore allow $\eta'$ to act on it, producing the natural transformation $\eta'F$.

Similarly, we define $(G'\eta)_A$ to be $G'\eta_A$. Recall that $\eta\colon F\Rarr G\colon C\to C'$. By “composing” $G'$ with $\eta$, we can “extend” this natural transformation to $G'\eta\colon G'F\Rarr G'G\colon C\to C''$. Again, it’s a rather simple way of “horizontally extending” a natural transformation, but in this case to the right. Intuitively, $\eta$ describes some natural “object-free morphism” in $C'$, indexed by objects in $C$ and source/target pointed to by $F$ and $G$. Since $G'$ produces an image of $C'$ in $C''$, it must also mirror this natural “object-free morphism” in $C''$.

Before we’re irreparably dazed and stunned, we’re going to understand precisely what this means. Let’s (vertically) compose the two natural transformations in the previous two paragraphs, and draw a component of this composition for an object $A\colon C$, choosing this evocative shape for no particular reason:

$$ \begin{array}{ccc} F'FA & \stackrel{\eta'_{FA}}{\to} & G'FA \\ \scriptsize{?}\darr & {\searrow} & \darr\scriptsize{G'\eta_A} \\ {?} & \stackrel{?}{\to} & G'GA \\ \end{array} $$

where the diagonal arrow is $(G'\eta_A)(\eta'_{FA})$, the composition.

We can clearly see how these two “augmented” natural transformations compose. However, the parts labeled with question marks … the symmetry seems to suggest something. It’s a good guess that the bottom left object is $F'GA$. Let’s fill that in, including the missing arrows implied by the symmetry.

$$ \begin{array}{ccc} F'FA & \stackrel{\eta'_{FA}}{\to} & G'FA \\ \scriptsize{F'\eta_A}\darr & {\searrow} & \darr\scriptsize{G'\eta_A} \\ F'GA & \stackrel{\eta'_{GA}}{\to} & G'GA \\ \end{array} $$

The truly excellent part is the realization that this is precisely the naturality square for ${\eta'\colon F'\Rarr G'\colon C'\to C''}$ for the morphism $\eta_A\colon FA\to GA$, and therefore the diagram commutes! Importantly, this implies that $(\eta'_{GA})(F'\eta_A)=(G'\eta_A)(\eta'_{FA})$. Since we made no “artificial” choice when constructing this, we define the value of this equality to be $(\eta'\ast\eta)_A$, which unambiguously defines the horizontal composition ${\eta'\ast\eta\colon F'F\Rarr G'G\colon C\to C''}$!
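
Here is a Haskell sketch of horizontal composition, packaging composite functors with `Data.Functor.Compose`; `hcomp` and `hcomp'` are hypothetical names for the two formulas above, and naturality (here, parametricity) is what guarantees they agree.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Compose (Compose (..))
import Data.Maybe (maybeToList)

type Nat f g = forall a. f a -> g a

-- (eta' * eta)_A = (eta'_{GA}) . (F' eta_A)
hcomp :: Functor f' => Nat f' g' -> Nat f g -> Nat (Compose f' f) (Compose g' g)
hcomp eta' eta (Compose x) = Compose (eta' (fmap eta x))

-- (eta' * eta)_A = (G' eta_A) . (eta'_{FA})
hcomp' :: Functor g' => Nat f' g' -> Nat f g -> Nat (Compose f' f) (Compose g' g)
hcomp' eta' eta (Compose x) = Compose (fmap eta (eta' x))

safeHead :: Nat [] Maybe
safeHead []      = Nothing
safeHead (x : _) = Just x

main :: IO ()
main = do
  let v = Compose (Just [1, 2, 3 :: Int])             -- a value of (Maybe . []) Int
  print (getCompose (hcomp  maybeToList safeHead v))  -- [Just 1]
  print (getCompose (hcomp' maybeToList safeHead v))  -- [Just 1]
```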

3.5 The interchange law

This kind of horizontal composition of 2-morphisms by gluing them at objects (0-morphisms), as well as the conventional vertical composition of 2-morphisms by gluing them at 1-morphisms, is required for the definition of a strict 2-category. However, they together must obey the interchange law, which we will prove for $\text{Cat}$.

Choose some strict 2-category, with some objects $C,C',C''$. Let $F,G,H\colon C\to C'$, $\eta\colon F\Rarr G$, $\epsilon\colon G\Rarr H$, and $F',G',H'\colon C'\to C''$, $\eta'\colon F'\Rarr G'$, $\epsilon'\colon G'\Rarr H'$. (Visualize this if you can.) Then the interchange law states that

$$ \epsilon'\eta'\ast\epsilon\eta=(\epsilon'\ast\epsilon)(\eta'\ast\eta) $$

where the vertical composition, denoted by juxtaposition, has precedence over the horizontal composition $\ast$.

In $\text{Cat}$, we can glue two of the horizontal composition squares from earlier to represent the right-hand side of the above equation. (Pretend the arrows look like $\Rarr$, since I can’t draw the diagonal one.)

$$ \begin{array}{ccc} F'F & \stackrel{F'\eta}{\to} & F'G & \stackrel{?}{\to} & ? \\ \scriptsize{\eta'F}\darr & \searrow & \darr\scriptsize{\eta'G} & {} & \darr\scriptsize{?} \\ G'F & \stackrel{G'\eta}{\to} & G'G & \stackrel{G'\epsilon}{\to} & G'H \\ \scriptsize{?}\darr & {} & \darr\scriptsize{\epsilon'G} & \searrow & \darr\scriptsize{\epsilon'H} \\ ? & \stackrel{?}{\to} & H'G & \stackrel{H'\epsilon}{\to} & H'H \\ \end{array} $$

The top left southeast arrow is $\eta'\ast\eta$, and the bottom right one is $\epsilon'\ast\epsilon$. This diagram is commutative, since we just glued together two commutative squares, so any path from $F'F$ to $H'H$ equals $(\epsilon'\ast\epsilon)(\eta'\ast\eta)$.

Notice a pattern? Let’s fill in the missing parts.

$$ \begin{array}{ccc} F'F & \stackrel{F'\eta}{\to} & F'G & \stackrel{F'\epsilon}{\to} & F'H \\ \scriptsize{\eta'F}\darr & \searrow & \darr\scriptsize{\eta'G} & \searrow & \darr\scriptsize{\eta'H} \\ G'F & \stackrel{G'\eta}{\to} & G'G & \stackrel{G'\epsilon}{\to} & G'H \\ \scriptsize{\epsilon'F}\darr & \searrow & \darr\scriptsize{\epsilon'G} & \searrow & \darr\scriptsize{\epsilon'H} \\ H'F & \stackrel{H'\eta}{\to} & H'G & \stackrel{H'\epsilon}{\to} & H'H \\ \end{array} $$

You can convince yourself that the bottom-left and top-right squares are just the commutative naturality squares from the horizontal composition example above. Then, the entire diagram is commutative!

Let’s take a look at the composable arrows along the very edges of the diagram. What is the composition of the top edge, $(F'\epsilon)(F'\eta)$? Since vertical composition is given by the pointwise composition of the components, for any $A\colon C$,

$$ \begin{aligned} ((F'\epsilon)(F'\eta))_A &=(F'\epsilon)_A(F'\eta)_A\\ &=(F'\epsilon_A)(F'\eta_A)\\ &=F'(\epsilon_A\eta_A)\\ &=(F'\epsilon\eta)_A \end{aligned} $$

with the third equality holding by the functoriality of $F'$, i.e. functors preserve composition of morphisms. Similarly, for the left edge,

$$ \begin{aligned} ((\epsilon'F)(\eta'F))_A &=(\epsilon'F)_A(\eta'F)_A\\ &=\epsilon'_{FA}\eta'_{FA}\\ &=(\epsilon'\eta')_{FA}\\ &=(\epsilon'\eta'F)_A\\ \end{aligned} $$

You should now feel safe composing the edges of the diagram above and the long diagonal, ignoring the other stuff, to get

$$ \begin{array}{ccc} F'F & \stackrel{F'\epsilon\eta}{\to} & F'H \\ \scriptsize{\epsilon'\eta'F}\darr & {\searrow} & \darr\scriptsize{\epsilon'\eta'H} \\ H'F & \stackrel{H'\epsilon\eta}{\to} & H'H \\ \end{array} $$

The diagonal arrow is, as we learned earlier, the right-hand side $(\epsilon'\ast\epsilon)(\eta'\ast\eta)$ of the interchange law. But this is also a horizontal composition square, specifically the one for $\epsilon'\eta'\ast\epsilon\eta$. Since the diagram is commutative, this proves the interchange law!10

Again, while we can easily prove this in $\text{Cat}$, the interchange law is required in the definition of any strict 2-category.

The interchange law simply means that if we have four 2-morphisms arranged in such a configuration as above, we can glue along the 1-morphisms then the 0-morphism or vice versa to get the same result.
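
We can’t prove the law by running code, but here is a quick sanity check in Haskell at one input, reusing the `Nat`/`vcomp`/`hcomp` sketches from above with some arbitrarily chosen natural transformations.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Compose (Compose (..))
import Data.Maybe (maybeToList)

type Nat f g = forall a. f a -> g a

vcomp :: Nat g h -> Nat f g -> Nat f h
vcomp eps eta x = eps (eta x)

hcomp :: Functor f' => Nat f' g' -> Nat f g -> Nat (Compose f' f) (Compose g' g)
hcomp eta' eta (Compose x) = Compose (eta' (fmap eta x))

-- One concrete configuration: F = F' = Maybe, G = G' = H = H' = [],
-- eta = eta' = maybeToList, eps = eps' = reverse (reverse is natural).
etaI, etaO :: Nat Maybe []
etaI = maybeToList
etaO = maybeToList

epsI, epsO :: Nat [] []
epsI = reverse
epsO = reverse

main :: IO ()
main = do
  let v = Compose (Just (Just (7 :: Int)))
  -- (eps' . eta') * (eps . eta)   versus   (eps' * eps) . (eta' * eta):
  print (getCompose (hcomp (vcomp epsO etaO) (vcomp epsI etaI) v))  -- [[7]]
  print (getCompose (vcomp (hcomp epsO epsI) (hcomp etaO etaI) v))  -- [[7]]
```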

3.6 More intuition

Hopefully you now have an intuitive understanding of natural transformations. Let’s try to think of them in different ways in an attempt to broaden and solidify this understanding.

First, let’s add another way to think about a functor to our toolkit. We can consider a functor $F\colon C\to D$ as a collection of objects in $D$ indexed by objects in $C$, of course preserving the structure in $C$. To put it another way, we can think of $C$ as a schematic or a pattern, and $D$ as where we realize and reify the schematic. Then $F$ tells us which parts of $D$ correspond to the blueprint of $C$. In a sense, $F$ describes a “picture” of $C$ inside $D$. In various areas of math, we might call something like this an “interpretation” or a “model.”

When we say $F\colon C\to D$ indexes objects in $D$ by objects in $C$, it should remind you of sequences. For example, $a_0,a_1,a_2,\ldots,a_i,\ldots$ is a collection of objects $a_i$ (in some set $S$) which we can pick out with an index $i\colon\N$. Therefore, we can think of a sequence of objects in $S$ as a function $a\colon\N\to S$. Much like how we think of a sequence as “living” where the enumerated objects do, we can think of the functor $F$ as “living” in $D$. We have to remember that functors also capture any structure between the “indices.”

From this perspective, what are natural transformations? If $A\colon C$, and $F,G\colon C\to D$, a natural transformation $\eta\colon F\Rarr G$ is a kind of correspondence between any two objects $FA,GA\colon D$ indexed by the same object $A$, one which captures the indexed relationships realized by $F$ inside of $G$. In other words, it maps one realization $F$ of the schematic $C$ inside $D$ to another realization $G$, keeping corresponding parts lined up thanks to the shared index $C$.


I think I need a break after all of those concepts, and you probably do too.

The following sections are unfinished parts of this post/series of posts, but hopefully you’ll be able to see where I’m going with this.

4 Some universal constructions

4.1 Initial and terminal objects

  • cover cones, mention universal property
  • mention “limit” and “colimit”
  • (co)limits, if they exist, are unique up to a unique isomorphism (“essentially unique”)
  • concrete examples from Set
  • list examples in Grp, Ab, Vect

4.2 Products and coproducts

  • same as above
  • cover biproduct, example in Ab, Vect

4.3 Equalizers and coequalizers

4.4 Pullbacks and pushouts

4.5 Limits and colimits

5 Monoids, monoidal categories and monoid objects

First of all, we need to understand the following concept. A monoid is:

  • a set $M$
  • a binary operation $\cdot\colon M\times M \to M$ (sometimes called multiplication) that sends two elements of $M$ to a single element in $M$,
  • and a particular element $e$ of $M$ called the identity,

such that $\cdot$ is associative, $(a\cdot b)\cdot c=a\cdot(b\cdot c)$, and $e$ is a two-sided identity, $e\cdot a=a\cdot e=a$, for all $a,b,c$ in $M$.
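
In Haskell, this data (together with the laws) is packaged by the `Monoid` class: `mempty` is the identity element and `(<>)` (from the `Semigroup` superclass) is the operation.

```haskell
-- The monoid laws, as equations:
--   (x <> y) <> z == x <> (y <> z)
--   mempty <> x == x   and   x <> mempty == x
main :: IO ()
main = do
  print ([1, 2] <> [3] <> mempty :: [Int])  -- lists under concatenation: [1,2,3]
  print ("ab" <> mempty <> "cd")            -- Strings, with identity "": "abcd"
```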

As mentioned earlier, monoids can be viewed as categories with a single object. The elements correspond to morphisms, the operation to composition of morphisms, and the identity element to the identity morphism. Therefore, monoids are a rather natural construction in category theory.

Again, groups are monoids where every morphism is an isomorphism, that is, the morphisms have inverses.

6 Cartesian closed categories and the internal hom

7 Monads

8 Comonads

9 The Writer

9.1 Writer monad

9.2 Writer comonad

10 The Reader

10.1 Reader monad

10.2 Reader comonad


  1. The burrito is a common but painfully inept analogy for monads. To use an analogy for the burrito analogy, the burrito analogy is like the rubber-sheet analogy for general relativity: somewhat intuitive, but it falls apart the moment you think about it. (What’s pulling stuff down?) ↩︎

  2. Then tear out the still-beating heart (problem), grasped in your bloodied hand (keyboard), whereupon you place it on the blood altar (category theory) in a gesture to the gods of mathematics and computer science (Euler, Gauss, Grothendieck, Turing, Lovelace, and Knuth). ↩︎

  3. For some categories, this may not be a set, either because you replace the objects in the hom-set with objects from another category, enriching it, or the category is large and you instead have a hom-class, where a class is simply a collection of things without all the baggage of set theory axioms. ↩︎

  4. To be precise, a concretizable category is one that can be made concrete with the choice of a faithful functor $U\colon C\to\text{Set}$, called the forgetful functor, that “forgets” extra structure and reveals the underlying set. A concretizable category may be made concrete in many ways (with the choice of various forgetful functors), but there’s usually a nice way to do it. What’s a functor? A map between categories — I’m getting to it! ↩︎

  5. To be even more precise, it can be either all the morphisms into an object or out of it. This is formalized by the Yoneda lemma, which Wikipedia calls “arguably the most important result in category theory.” I’m still trying to fully comprehend it. ↩︎

  6. The reals can be considered a one-dimensional vector space over itself. ↩︎

  7. The notation $x\mapsto y$ denotes what is essentially a lambda function. It accepts an argument, binds it to the symbol $x$, and returns the expression $y$ where $x$ is replaced with the value bound to $x$. By convention, it associates to the right: $a\mapsto (b\mapsto c)=a\mapsto b\mapsto c$. ↩︎

  8. Seriously, just try labeling each symbol with a type and see what happens! ↩︎

  9. Their inverses are also mapped, and since the multiplicative unit $1$ is preserved by $f$, they stay inverses, guaranteeing that units are mapped to units. However, non-units may be mapped to units. ↩︎

  10. I actually came up with the proof in this section by myself while trying to prove the interchange law. I’m pretty sure this is the default, obvious proof, but I’m still proud of it. ↩︎