Categories: the basics, towards (co)monads (Part 1)

As the millionth person to write about monads, I suppose I must claim that all other monad tutorials, including their burrito analogies,1 are woefully lacking in substance! By the end of this post, you will hopefully understand monads from a more mathematical viewpoint: you'll recognize natural appearances of monadic patterns and use cases, and use your newfound pure mathematical skill to cut straight to the heart of the problem at hand.2

A monad is a specific kind of structure on computations: monads abstract the same basic pattern across many different types. The dual concept, the comonad, is seen less often, but it is still useful.

We do not wish to simply glance past the concept of monads and be left head-scratching days later. To understand monads, we need to “briefly” go over

  1. categories,
  2. functors,
  3. natural transformations,
  4. some universal constructions,
  5. monoids, monoidal categories and monoid objects,
  6. and cartesian closed categories and the internal hom.

Then, we’ll finally be able to cover

  1. monads,
  2. comonads,
  3. and the reader/writer examples.

This post (part 1) only covers the first three items in the first list, but it does so comprehensively. These three concepts are crucial to understand, and form the basis of category theory. Saunders Mac Lane, one of the cofounders of category theory, wrote the following in his textbook Categories for the Working Mathematician: “I didn’t invent categories to study functors; I invented them to study natural transformations.”

Feel free to skim or skip sections covering concepts you are familiar with. We’re not going to rigorously prove everything, but we won’t shy away from detail when it’s needed. Let’s begin!

1 Categories

Categories are made of two kinds of things: objects and morphisms (also called arrows or maps). In $\text{Set}$, the category of sets, the objects are sets and the morphisms are functions between them, so use this as a kind of analogy in your head.

Every morphism $f$ of a category $C$ has a source object $A$ and a target $B$, where $A$ and $B$ are objects of $C$, which we write $f\colon A\to B$ and say "$f$ is a morphism from $A$ to $B$." The set3, or hom-set, of morphisms from $A$ to $B$ is denoted $\text{Hom}(A, B)$, $\text{Hom}_C(A, B)$, or just $C(A, B)$. For every pair of morphisms $f\colon X\to Y$ and $g\colon Y\to Z$, we may compose them to form $g\circ f\colon X\to Z$, also written $gf$. Additionally, we require the following axioms to hold:

  • Composition must be associative. That is, $(h\circ g)\circ f=h\circ (g\circ f)$ for all $f\colon W\to X$, $g\colon X\to Y$, and $h\colon Y\to Z$.
  • For every object $X$, there is a morphism $1_X$ (or $id_X$), which is an identity for composition. That is, for all $f\colon A\to X$ and $g\colon X\to A$ we have $1_X\circ f=f$ and $g\circ 1_X=g$. (It is simple to prove that this morphism is unique.)

This should remind you of the domain, codomain, and composition of functions. To keep track of equations involving composition, we may use commutative diagrams, where any two paths along a chain of arrows from one object to another yield the same composite. The following diagram says that $h=g\circ f$:

$$\begin{array}{ccc} A & \stackrel{f}{\to} & B \\ {} & \mathllap{\scriptsize{h}}{\searrow} & \darr\scriptsize{g} \\ {} & {} & C \end{array}$$

Wow, that was a mouthful. Let’s walk through a few examples:

  • In $\text{Set}$, the category of sets (as mentioned earlier), the objects are sets and the morphisms are functions between sets. We know function composition is associative, and every set has an identity function that maps each element to itself.
  • In $\text{Grp}$, the category of groups, the objects are groups and the morphisms are group homomorphisms. Since we can define groups and group homomorphisms in terms of sets and functions, we can reuse the above argument to deduce that they satisfy the category axioms.
  • In $\text{Vect}_\reals$, the category of real vector spaces, the objects are vector spaces over the reals and the morphisms are linear transformations (think matrix-vector multiplication). Vector spaces can be seen as sets and linear transformations can be seen as functions, blah blah blah, they satisfy the axioms.
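If you speak Haskell, this interface may look familiar. Here's a minimal sketch using the `Control.Category` class from `base`, with Haskell types as objects and functions as morphisms; the associativity and identity axioms become laws that instances are trusted to satisfy.

```haskell
import Prelude hiding (id, (.))
import Control.Category (Category (..))

-- The class mirrors the definition of a category:
--   class Category cat where
--     id  :: cat a a                        -- identity morphism on a
--     (.) :: cat b c -> cat a b -> cat a c  -- composition
-- Functions are the prototypical instance, just like Set.

h :: Int -> String
h = show . (+ 1)  -- h = g . f, a composite morphism

main :: IO ()
main = putStrLn (h 41)  -- prints "42"
```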

You might notice a pattern here. That's because all these categories are concretizable categories, which basically means that the objects and morphisms can be seen as sets and functions with extra structure on top. Most interesting categories are concretizable,4 though there are exceptions; we won't deal with those here.

The general idea behind many categories is that they allow us to study certain kinds of objects by understanding how those objects relate to one another through structure-preserving maps. In $\text{Grp}$ and $\text{Vect}_\reals$, the axioms (and therefore properties) of groups and vector spaces are preserved by morphisms. For example, linear transformations preserve vector scaling and addition: $F(au+bv)=aF(u)+bF(v)$, where $a$ and $b$ are scalars, $u$ and $v$ are vectors, and $F$ is a linear transformation. The "structure" of sets is quite general, and from the categorical viewpoint of functions between sets, two sets behave the same if they have the same cardinality (size). (This viewpoint is called structuralism.) Similarly, two finite dimensional vector spaces with the same dimension share similar behavior. We call two objects which essentially behave the same isomorphic. Two objects are isomorphic if there exists an isomorphism between them. We say $f\colon A\to B$ is an isomorphism if there exists $g\colon B\to A$ such that $gf=1_A$ and $fg=1_B$. You can think of an isomorphism as an invertible map.

Additionally, each category $C$ has an opposite $C^{\text{op}}$, which you get by turning all the arrows around. For example, the morphism $f\colon X\to Y$ is actually $f^\text{op}\colon Y\to X$ in the opposite category — the objects are the same, but the arrows are backwards, which is achieved by formally swapping the source and target objects (essentially switching their names). The composite $g\circ f\colon X\to Z$ where $g\colon Y\to Z$ is actually $f^\text{op}\circ^\text{op} g^\text{op}\colon Z\to X$ in the opposite category — the order of composition is simply reversed to account for the reversal of arrows. There's often no good "meaningful" way to understand something like $\text{Set}^\text{op}$, but opposite categories turn out to be quite useful anyway.

A natural question to ask is: what map preserves the structure of categories?

2 Functors

Functors are maps between categories that preserve the structure of a category. But what does that actually mean? Generally, when we define mathematical objects, we outline a structure and some axioms it must obey. So, a functor $F\colon C\to D$ between the categories $C$ and $D$ must map

  • objects $X$ to objects $F(X)$, also written $FX$,
  • morphisms $f\colon X\to Y$ to morphisms $Ff\colon FX\to FY$ (preserving the source and target object),
  • identity morphisms on each object to identity morphisms on the corresponding object: $F(1_X)=1_{F(X)}$,
  • and composition of morphisms: $F(g\circ f)=F(g)\circ F(f)$. Since equations may be represented by commutative diagrams, we say that functors preserve commutative diagrams.

There are also contravariant functors, which switch the source and target of morphisms as well as the order of composition. In contrast, the "ordinary" functors may be called covariant. Luckily, every contravariant functor $F\colon C\to D$ may instead be viewed as a covariant functor $F\colon C^\text{op}\to D$ or $F\colon C\to D^\text{op}$, using the opposite category construction from above.
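Haskell programmers have seen this before: the `Functor` typeclass describes (endo)functors on the category of Haskell types, and `Data.Functor.Contravariant` in `base` provides the contravariant version. A minimal sketch:

```haskell
import Data.Functor.Contravariant (Contravariant (..), Predicate (..))

-- class Functor f where fmap :: (a -> b) -> f a -> f b
-- The two functor axioms become laws instances must satisfy:
--   fmap id == id
--   fmap (g . f) == fmap g . fmap f

-- class Contravariant f where contramap :: (a -> b) -> f b -> f a
-- Note the flipped direction: f b -> f a.

evenLength :: Predicate String
evenLength = contramap length (Predicate even)  -- precompose, then test

main :: IO ()
main = do
  print (fmap (+ 1) (Just 41))            -- Just 42
  print (getPredicate evenLength "hihi")  -- True
```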

Examples!

  • An endofunctor $C\to C$ is a functor from a category $C$ to itself.
  • An identity functor is an endofunctor that sends objects and morphisms to themselves.
  • Two functors $F\colon C\to D$, $G\colon D\to E$ can be composed to form $G\circ F\colon C\to E$, also written $GF$. (It's trivial to check that the composite satisfies the functor axioms, i.e. the mapping is functorial. If you're not sure, do it yourself!)
  • A constant functor $C\to D$ maps every object of $C$ to a single object in $D$ and every morphism to the identity morphism on that object.
  • A forgetful functor $U\colon C\to \text{Set}$ maps every object of a concrete category to its underlying set, sending morphisms to the underlying functions.
  • A free functor $F\colon \text{Set}\to C$, when it exists, maps each set to the "free object" in $C$ on that set.
    • For instance, when $C$ is $\text{Vect}_\reals$, $F$ maps an $n$-element set to a vector space with $n$ dimensions. Functions to/from the original set can be seen as projections/embeddings to/from the vector space that preserve composition of the functions. (Yeah, it's a bit weird.)
  • The (covariant) power set functor $P\colon \text{Set}\to\text{Set}$ sends each set to its power set (that is, the set of all subsets) and each function $f\colon X\to Y$ to the direct image function $f_*\colon P(X)\to P(Y)$, which maps every subset $U\subseteq X$ to its image $f_*(U)\subseteq Y$ (sketched in code below).
  • The contravariant power set functor $P\colon \text{Set}\to\text{Set}^\text{op}$ (a different $P$) still sends each set to its power set and each function $f\colon X\to Y$ to the inverse image function $f^*\colon P(Y)\to P(X)$, which maps every subset $V\subseteq Y$ to its preimage $f^*(V)\subseteq X$.
  • If we think of groups as one-object categories where all morphisms are isomorphisms (ensuring their invertibility), then a functor from one group to another is a group homomorphism. (If you didn't know what a group is, now you know!) Additionally, a functor from a group to $\text{Set}$ describes a group action on a particular set, and similarly a functor from the group to any category describes a group action on a particular object.
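Here's a small sketch of the two power set functors' action on morphisms, using `Data.Set` from the `containers` package. Since we can't enumerate arbitrary subsets in code, the inverse image here takes $V$ as a membership predicate and restricts to a finite ambient set:

```haskell
import Data.Set (Set)
import qualified Data.Set as Set

-- Direct image f_* : P(X) -> P(Y), the covariant action on f.
directImage :: Ord b => (a -> b) -> Set a -> Set b
directImage = Set.map

-- Inverse image f^* : P(Y) -> P(X), the contravariant action on f,
-- with V given as a predicate and X as a finite ambient set.
inverseImage :: (a -> b) -> (b -> Bool) -> Set a -> Set a
inverseImage f inV = Set.filter (inV . f)

main :: IO ()
main = do
  let xs = Set.fromList [0 .. 9 :: Int]
  print (directImage (`mod` 3) xs)         -- fromList [0,1,2]
  print (inverseImage (`mod` 3) (== 1) xs) -- fromList [1,4,7]
```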

Notice that you can compose functors, and that the composition has identities, namely, the identity functors. As you might guess, there is a category of small categories, called $\text{Cat}$, whose objects are small categories and morphisms are functors. (The word "small" deals with issues pertaining to size, such as Russell's paradox and a set of all sets.)

We’re on a roll here! Maps between objects constitute a category, and functors between categories constitute the category of small categories.

… are there maps between functors?

3 Natural transformations

Like how functors are maps between categories, natural transformations are maps between functors. Let $F,G\colon C\to D$ be functors. A natural transformation $\eta\colon F\Rarr G$ is a family of morphisms such that:

  • for every object $X\colon C$, we have a morphism $\eta_X\colon F(X)\to G(X)$ called the component of $\eta$ at $X$,
  • and for all $f\colon X\to Y$ in $C$, $\eta_Y\circ Ff=Gf\circ \eta_X$, that is, the following diagram commutes:

$$\begin{array}{ccc} FX & \stackrel{\eta_X}{\to} & GX \\ \scriptsize{Ff}\darr & {} & \darr\scriptsize{Gf} \\ FY & \stackrel{\eta_Y}{\to} & GY \end{array}$$

This is getting dangerously abstract — what are we actually saying here?

First of all let’s remind ourselves that the definition of a category does not actually require objects to have any internal structure. As far as category theory is concerned, they don’t. Therefore, the “properties” of an object are simply determined by the morphisms into and out of it.5

Notice how this square commutes for all morphisms in the category $C$, no matter which objects are the source and target. In a sense, $\eta\colon F\Rarr G$ is a kind of map that is independent of the particular object each component is assigned to — instead of being a map from one object to another, it's a map from one functor to another. (Remember that the components live in $D$.) Don't worry if you're confused, because we have examples!
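Before the mathematical examples, a quick Haskell aside (a sketch): a natural transformation between two `Functor`s is just a polymorphic function `forall a. f a -> g a`, and parametricity guarantees the naturality square for free.

```haskell
-- A natural transformation from the list functor to Maybe,
-- with one component at every type a.
safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x

main :: IO ()
main = do
  -- The naturality square: fmap f . safeHead == safeHead . fmap f.
  print (fmap (* 2) (safeHead [1, 2, 3 :: Int]))  -- Just 2
  print (safeHead (fmap (* 2) [1, 2, 3 :: Int]))  -- Just 2
```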

3.1 Embedding vector spaces into their double dual

Consider the category $\text{Vect}_\reals$, and recall that its morphisms are linear transformations, also known as linear maps. Let $V$ be a real vector space and consider the set of maps from it to the reals,6 $\text{Hom}(V,\reals)$ or $V\to\reals$. By defining the operations of addition and scalar multiplication on it, the hom-set becomes a vector space — the dual vector space, which we denote $V^*$.

Let $a\colon\reals$, $v\colon V$, and $f,g\colon V^*$. Then, in $V^*$,

  • addition is defined as $f+g=(v\mapsto f(v)+g(v))$,7
  • and scalar multiplication is defined as $af=(v\mapsto af(v))$.

This should hopefully convince you $V^*$ is a vector space. If you're not sure, verify it!

The elements of a dual vector space are called covectors or one-forms. If you choose an inner product (like the standard dot product), you can embed a vector space into its dual space with the map $u\mapsto v\mapsto u\cdot v$. In fact, if the dimension of $V$ is finite, then $V$ and $V^*$ are isomorphic. However, we're not always so lucky: vector spaces generally do not come with a particular inner product. Every choice of an inner product defines a different map to our dual vector space; there's no standard/canonical one.

The dual vector space is called "dual" for a reason. It turns out there is a natural embedding of $V$ into its double dual, $V^{**}$, and that this embedding is an isomorphism when $V$ has finite dimension! This map is defined as

$$\Psi_V\colon V\to ((V\to\reals)\to\reals)=v\mapsto \varphi\mapsto \varphi(v),$$

which takes every vector $v$ to the evaluation map at $v$. The evaluation map simply evaluates a function $\varphi$ at $v$, returning $\varphi(v)$. For a map to be natural, it must somehow be "choice-free" — this usually suggests the map is a component of a natural transformation. For example, to define $\Psi_V\colon V\to V^{**}$, we didn't have to make any unnatural choices, unlike when we were defining an embedding $V\to V^*$.

Now, we just have to extend $\Psi_V\colon V\to V^{**}$ to a natural transformation $\Psi\colon 1_{\text{Vect}_\reals}\Rarr -^{**}$ from the identity functor to the double dual functor, both of which are endofunctors on $\text{Vect}_\reals$. But wait — if we're upgrading the double dual to a functor, what's the double dual of a linear map? Don't worry, it's rather similar: if we have $f\colon V\to W$, then $f^{**}\colon V^{**}\to W^{**}=\theta\mapsto\varphi\mapsto \theta(\varphi\circ f)$. If you don't trust me, break out a sheet of paper and try it yourself — similarly, verify the functoriality if you want.8
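These formulas translate almost verbatim into Haskell. A minimal sketch, with scalars specialized to `Double` and linearity left unenforced (the type system can't check it):

```haskell
-- A "covector" on v, with scalars fixed to Double.
type Dual v = v -> Double

-- The component of Psi at a vector: evaluation at that vector.
psi :: v -> Dual (Dual v)
psi v = \phi -> phi v

-- The double dual of a map f, exactly as in the formula above.
doubleDual :: (v -> w) -> Dual (Dual v) -> Dual (Dual w)
doubleDual f theta = \phi -> theta (phi . f)

main :: IO ()
main = do
  -- One instance of the naturality square at x = (3,4), f = fst,
  -- tested against the covector phi = id on Double:
  let x = (3, 4) :: (Double, Double)
  print (psi (fst x) id)             -- Psi_W (f x) applied to id: 3.0
  print (doubleDual fst (psi x) id)  -- (f** . Psi_V) x applied to id: 3.0
```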

Now we’re ready. We wish to verify the naturality condition for $\Psi$:

$$\begin{array}{ccc} X & \stackrel{\Psi_X}{\to} & X^{**} \\ \scriptsize{f}\darr & {} & \darr\scriptsize{f^{**}} \\ Y & \stackrel{\Psi_Y}{\to} & Y^{**} \end{array}$$

This square tells us that $\Psi_Y\circ f=f^{**}\circ\Psi_X$. Let $x\colon X$, $y\colon Y$, $\varphi\colon X^*$, $\phi\colon Y^*$, and $\theta\colon X^{**}$. Substituting and expanding this out yields

$$\begin{aligned} &&(y\mapsto\phi\mapsto\phi(y))\circ f&=(\theta\mapsto\phi\mapsto\theta(\phi\circ f))\circ (x\mapsto\varphi\mapsto\varphi(x))\\ =&&x\mapsto\phi\mapsto\phi(f(x))&=x\mapsto\phi\mapsto(\varphi\mapsto\varphi(x))(\phi\circ f)\\ =&&x\mapsto\phi\mapsto\phi(f(x))&=x\mapsto\phi\mapsto(\phi\circ f)(x)\\ \end{aligned}$$

(If you’re not sure what I just did, this is basically lambda calculus with a slightly different syntax. I’ve written a blog post covering the basics!)

You should be able to see that the equation holds, and therefore the transformation is natural! We should remind ourselves that the above commutative diagram holds for any morphism between any objects. This compatibility demonstrates that the natural transformation $\Psi\colon 1_{\text{Vect}_\reals}\Rarr -^{**}$ represents a map that is independent of the specific objects it acts on.

In fact, if we restrict $\text{Vect}_\reals$ to $\text{FinVect}_\reals$, the category of finite-dimensional real vector spaces, $\Psi$ becomes a natural isomorphism, which is a natural transformation where every component is an isomorphism. This is a fancy restatement of what we noted earlier: finite-dimensional vector spaces are naturally isomorphic to their double duals.

3.2 The determinant

The determinant of a square matrix, if you aren’t aware, is a number that represents the “magnitude” of the matrix. For example, the determinant of a 3-by-3 matrix is equal to the (signed) volume of the unit cube under the image of the matrix, that is, after it is transformed. (You should think of matrices as linear transformations here. A matrix $M$ sends a vector $v$ to $Mv$ under its corresponding linear transformation.)

Before we get to the point, let’s go over a few remarks and definitions.

  • If the determinant has no reciprocal, the matrix is said to be singular. Singular matrices do not preserve uniqueness; two vectors may map to the same thing. Therefore, they are not invertible.
  • Nonsingular matrices are invertible, and their determinants are the reciprocals of the determinants of their inverses.
  • In fact, for any two square matrices $A,B$ of the same size, $|AB|=|A||B|$, where $|-|$ is the determinant of a matrix.
    • The previous statement follows from this one, since $AA^{-1}=A^{-1}A=I$ and $|I|=1$, where $A$ is an invertible matrix and $I$ is the identity matrix.
    • This is actually kind of hard to prove (because I don’t know how). Hopefully the “volume of a unit cube” thing should be enough intuition for you to accept this.
  • A field is a set where we can do addition, multiplication, subtraction, and division (but not by zero), like we can with the real ($\reals$) and complex ($\Complex$) numbers. A commutative ring is a generalization of a field where division is not guaranteed, such as the integers ($\Z$), Gaussian integers ($\Z[i]=\{a+bi\mid a,b\in\Z\}$), and polynomials with real coefficients ($\reals[x]$).
  • When a matrix has entries in a field $K$, or more generally, a commutative ring $K$, we say the matrix is over $K$, and that $K$ is the ground field/commutative ring.
  • The elements in a commutative ring which do have multiplicative inverses (reciprocals) are called units. The units of a commutative ring $R$ form a group $R^\times$ under multiplication, called the group of units. For example, $\Z^\times=\{1,-1\}$, $\Z[i]^\times=\{1,i,-1,-i\}$, and $\reals^\times=\reals[x]^\times=\reals\backslash\{0\}$, where the group operation is inherited from the parent commutative ring’s multiplication.
  • The set of all invertible $n$-by-$n$ matrices forms a group under matrix multiplication (which corresponds to composing linear transformations). We call this group the general linear group $\text{GL}_n(K)$, where $K$ is the ground field/commutative ring.

The third point above notes that we can multiply matrices and then take the determinant, or take the determinants and multiply the resulting numbers. This is precisely a group homomorphism $\text{det}_\reals\colon\text{GL}_n(\reals)\to\reals^\times$ when our matrices are over $\reals$. $\text{GL}_n(\reals)$ captures matrix multiplication while $\reals^\times$ captures multiplication in the ground commutative ring.

If you’ve been following along, your natural transformation sense should be tingling, because doesn’t this relationship hold for things that aren’t necessarily the reals? You’re correct — the determinant, as a “general morphism” in the category of groups $\text{Grp}$, is independent of whatever particular commutative ring the matrices are over. In fact, we can define the general linear group $\text{GL}_n(-)$ and the group of units $(-)^\times$ as functors from the category of commutative rings $\text{CRing}$ to $\text{Grp}$, and use this to define the determinant $\text{det}\colon\text{GL}_n(-)\Rarr (-)^\times$ as a natural transformation. Let’s take a look at the naturality square, which holds for any morphism $f$:

$$\begin{array}{ccc} \text{GL}_n(R) & \stackrel{\text{det}_R}{\to} & R^\times \\ \scriptsize{\text{GL}_n(f)}\darr & {} & \darr\scriptsize{f^\times} \\ \text{GL}_n(S) & \stackrel{\text{det}_S}{\to} & S^\times \end{array}$$

We have three concerns to address: what are $\text{GL}_n(-)$ and $(-)^\times$ applied to a morphism $f$ in $\text{CRing}$, and what does it intuitively mean for this square to commute? Well, the first one is rather simple: using $f$, map each entry in the matrix. The second one is simple, too: since the group of units is a subset of a ring, $f$ can map those elements.9

Like the double dual example, the square commuting simply means that $\text{det}$ is independent of the particular commutative ring the matrices are over. All the possible morphisms $f\colon R\to S$ account for all the properties of all possible objects $R$ and $S$, so the commutative square describes a kind of compatibility between $\text{det}$ and the structure of the category $\text{Grp}$. You can try verifying the naturality condition for the following maps:

  • $a\mapsto a\colon\reals\to\Complex$, the embedding of the reals into the complexes,
  • $p\mapsto p(a)\colon\reals[x]\to\reals$, the polynomial evaluation map for a fixed $a\colon\reals$,
  • and $z\mapsto z^*\colon\Complex\to\Complex$, the complex conjugation map (checked in code below).
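Here's a quick numerical check of the conjugation case in Haskell — a sketch with 2-by-2 matrices encoded as nested pairs, so $\text{GL}_2(f)$ is just mapping $f$ over the entries:

```haskell
import Data.Complex (Complex (..), conjugate)

type M a = ((a, a), (a, a))

-- The 2x2 determinant over any commutative ring (Num as a stand-in).
det2 :: Num a => M a -> a
det2 ((a, b), (c, d)) = a * d - b * c

-- The action of GL_2(-) on a ring morphism f: map f over each entry.
mapEntries :: (a -> b) -> M a -> M b
mapEntries f ((a, b), (c, d)) = ((f a, f b), (f c, f d))

main :: IO ()
main = do
  let m = ((1 :+ 2, 3 :+ 1), (1 :+ 1, 2 :+ (-1))) :: M (Complex Double)
  -- Naturality at f = conjugate: det_S . GL_2(f) == f^x . det_R.
  print (det2 (mapEntries conjugate m))  -- 2.0 :+ 1.0
  print (conjugate (det2 m))             -- 2.0 :+ 1.0
```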

Of course, the standard determinant also works for (square) singular matrices, but if we restrict it to invertible matrices, we can understand the determinant as a natural transformation in $\text{Grp}$. I would guess that using the category of monoids $\text{Mon}$ instead of $\text{Grp}$ would capture this missing behavior, but I haven’t checked carefully. (A monoid is a generalization of a group, where elements do not have to be invertible. It can be viewed as a one-object category!)

Okay, that’s enough examples.

3.3 Functor categories

The header of this section is laden with implication … but first, let’s take a look at a few more features of natural transformations.

For any functor $F\colon C\to D$, there exists a natural transformation $1_F\colon F\Rarr F$ (or $id_F$), called the identity natural transformation, whose components are all identity morphisms. The naturality square

$$\begin{array}{ccc} FX & \stackrel{(1_F)_X}{\to} & FX \\ \scriptsize{Ff}\darr & {} & \darr\scriptsize{Ff} \\ FY & \stackrel{(1_F)_Y}{\to} & FY \end{array}$$

commutes, since $(1_F)_X=1_{FX}$ and $1_{FY}\circ Ff=Ff=Ff\circ 1_{FX}$.

If we have functors $F,G,H\colon C\to D$, we also have an obvious way of composing two natural transformations $\eta\colon F\Rarr G$ and $\epsilon\colon G\Rarr H$ to get $\epsilon\eta\colon F\Rarr H$ by defining $(\epsilon\eta)_X=\epsilon_X\eta_X$. We can verify the naturality condition by gluing together both commutative squares:

$$\begin{array}{ccccccccc} FX & \stackrel{\eta_X}{\to} & GX & \stackrel{\epsilon_X}{\to} & HX & {} & FX & \stackrel{(\epsilon\eta)_X}{\to} & HX \\ \scriptsize{Ff}\darr & {} & \darr\scriptsize{Gf} & {} & \darr\scriptsize{Hf} & = & \scriptsize{Ff}\darr & {} & \darr\scriptsize{Hf} \\ FY & \stackrel{\eta_Y}{\to} & GY & \stackrel{\epsilon_Y}{\to} & HY & {} & FY & \stackrel{(\epsilon\eta)_Y}{\to} & HY \end{array}$$

and performing the following manipulation using the commutativity of the squares: $(\epsilon\eta)_Y(Ff)=\epsilon_Y\eta_Y(Ff)=\epsilon_Y(Gf)\eta_X=(Hf)\epsilon_X\eta_X=(Hf)(\epsilon\eta)_X$. (This kind of manipulation might make you feel better about using the word “commutative.”)
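In the Haskell picture from earlier, vertical composition is literally just function composition of polymorphic functions. A sketch (the `RankNTypes` extension lets us name the type of a natural transformation):

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Maybe (maybeToList)

-- A natural transformation between Functors f and g.
type Nat f g = forall a. f a -> g a

-- Vertical composition: componentwise, (eps eta)_X = eps_X . eta_X.
vertical :: Nat g h -> Nat f g -> Nat f h
vertical eps eta x = eps (eta x)

safeHead :: Nat [] Maybe
safeHead = foldr (\x _ -> Just x) Nothing

main :: IO ()
main = print (vertical maybeToList safeHead [7, 8, 9 :: Int])  -- [7]
```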

Now we’re ready for the following: if $C$ and $D$ are categories, we can define the functor category $[C,D]$, where the objects are all functors $F,G\colon C\to D$ and the morphisms are natural transformations $F\Rarr G$. It is also written $D^C$, since it is also an exponential object in $\text{Cat}$, but we’ll cover that at another time.

With this in mind, we can turn $\text{Cat}$ into a strict 2-category, which is like a category but with morphisms between its morphisms, which we call 2-morphisms. In $\text{Cat}$, small categories are objects, 1-morphisms are functors, and 2-morphisms are natural transformations. Then, $\text{Hom}_\text{Cat}(C,D)=[C,D]$.

There’s one more thing missing from the definition of a 2-category…

3.4 Horizontal composition

There’s another kind of composition of 2-morphisms, called horizontal composition. For example, we can compose natural transformations $\eta\colon F\Rarr G\colon C\to C'$ (this notation means $F,G\colon C\to C'$) and $\eta'\colon F'\Rarr G'\colon C'\to C''$ to get $\eta'\ast\eta\colon F'F\Rarr G'G\colon C\to C''$. (The ordinary kind of composition is called vertical composition because the natural transformations are usually vertically aligned when people draw them, and similarly for horizontal composition.)

Exactly how it works is a bit complicated, and it’s only relevant if you wish to learn the secrets of the universe, so feel free to skim or skip. (I highly recommend you attempt to visualize the setup in the previous paragraph, perhaps by following the Wikipedia link and looking at the diagrams.)

We’re going to define horizontal composition in terms of vertical composition by claiming that for all objects $A\colon C$, we have

$$(\eta'\ast\eta)_A=(\eta'_{GA})(F'\eta_A)=(G'\eta_A)(\eta'_{FA})$$

which, for simplicity, we might instead write using new notation:

$$\eta'\ast\eta=(\eta'G)(F'\eta)=(G'\eta)(\eta'F).$$

First, let’s clarify what this notation means. We define $(\eta'F)_A$ to be $\eta'_{FA}$. Recall that $\eta'\colon F'\Rarr G'\colon C'\to C''$. By “composing” $\eta'$ with $F$, we can “extend” this natural transformation to $\eta'F\colon F'F\Rarr G'F\colon C\to C''$. It’s a rather simple way of “horizontally extending” a natural transformation, in this case to the left (in the diagram you should be visualizing). Intuitively, $F$ allows us to think about the structure in $C$ as an image in $C'$, and therefore allows $\eta'$ to act on it, producing the natural transformation $\eta'F$.

Similarly, we define $(G'\eta)_A$ to be $G'\eta_A$. Recall that $\eta\colon F\Rarr G\colon C\to C'$. By “composing” $G'$ with $\eta$, we can “extend” this natural transformation to $G'\eta\colon G'F\Rarr G'G\colon C\to C''$. Again, it’s a rather simple way of “horizontally extending” a natural transformation, but in this case to the right. Intuitively, $\eta$ describes some natural “object-free morphism” in $C'$, indexed by objects in $C$ and with source/target pointed to by $F$ and $G$. Since $G'$ produces an image of $C'$ in $C''$, it must also mirror this natural “object-free morphism” in $C''$.

Before we’re irreparably dazed and stunned, we’re going to work out precisely what this means. Let’s (vertically) compose the two natural transformations from the previous two paragraphs, and draw a component of this composition at an object $A\colon C$, choosing this evocative shape for no particular reason:

$$\begin{array}{ccc} F'FA & \stackrel{\eta'_{FA}}{\to} & G'FA \\ \scriptsize{?}\darr & {\searrow} & \darr\scriptsize{G'\eta_A} \\ {?} & \stackrel{?}{\to} & G'GA \end{array}$$

where the diagonal arrow is $(G'\eta_A)(\eta'_{FA})$, the composition.

We can clearly see how these two “augmented” natural transformations compose. However, the parts labeled with question marks … the symmetry seems to suggest something. It’s a good guess that the bottom left object is $F'GA$. Let’s fill that in, including the missing arrows implied by the symmetry.

$$\begin{array}{ccc} F'FA & \stackrel{\eta'_{FA}}{\to} & G'FA \\ \scriptsize{F'\eta_A}\darr & {\searrow} & \darr\scriptsize{G'\eta_A} \\ F'GA & \stackrel{\eta'_{GA}}{\to} & G'GA \end{array}$$

The truly excellent part is the realization that this is precisely the naturality square for $\eta'\colon F'\Rarr G'\colon C'\to C''$ at the morphism $\eta_A\colon FA\to GA$, and therefore the diagram commutes! Importantly, this implies that $(\eta'_{GA})(F'\eta_A)=(G'\eta_A)(\eta'_{FA})$. Since we made no “artificial” choice when constructing this, we define the value of this equality to be $(\eta'\ast\eta)_A$, which unambiguously defines the horizontal composition $\eta'\ast\eta\colon F'F\Rarr G'G\colon C\to C''$!
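Back in Haskell-land, the two paths around that square are the two possible implementations of horizontal composition, and naturality says they agree. A sketch (reusing the `Nat` alias from before):

```haskell
{-# LANGUAGE RankNTypes #-}

type Nat f g = forall a. f a -> g a

safeHead :: Nat [] Maybe
safeHead = foldr (\x _ -> Just x) Nothing

reverseList :: Nat [] []
reverseList = reverse

-- The two paths around the square; naturality of eta' makes them equal.
horizontal1 :: Functor g' => Nat f' g' -> Nat f g -> f' (f a) -> g' (g a)
horizontal1 eta' eta x = fmap eta (eta' x)  -- (G'eta_A)(eta'_{FA})

horizontal2 :: Functor f' => Nat f' g' -> Nat f g -> f' (f a) -> g' (g a)
horizontal2 eta' eta x = eta' (fmap eta x)  -- (eta'_{GA})(F'eta_A)

main :: IO ()
main = do
  let xss = [[1, 2], [3, 4 :: Int]]
  print (horizontal1 reverseList safeHead xss)  -- [Just 3,Just 1]
  print (horizontal2 reverseList safeHead xss)  -- [Just 3,Just 1]
```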

3.5 The interchange law

This kind of horizontal composition of 2-morphisms by gluing them at objects (0-morphisms), as well as the conventional vertical composition of 2-morphisms by gluing them at 1-morphisms, is required for the definition of a strict 2-category. However, together they must obey the interchange law, which we will prove for $\text{Cat}$.

Choose some strict 2-category, with some objects $C,C',C''$. Let $F,G,H\colon C\to C'$, $\eta\colon F\Rarr G$, $\epsilon\colon G\Rarr H$, and $F',G',H'\colon C'\to C''$, $\eta'\colon F'\Rarr G'$, $\epsilon'\colon G'\Rarr H'$. (Visualize this if you can.) Then the interchange law states that

$$\epsilon'\eta'\ast\epsilon\eta=(\epsilon'\ast\epsilon)(\eta'\ast\eta)$$

where the vertical composition, denoted by juxtaposition, has precedence over the horizontal composition $\ast$.

In $\text{Cat}$, we can glue two of the horizontal composition squares from earlier to represent the right-hand side of the above equation. (Pretend the diagonal arrows look like $\Rarr$, since I can’t draw them that way.)

$$\begin{array}{ccccc} F'F & \stackrel{F'\eta}{\to} & F'G & \stackrel{?}{\to} & ? \\ \scriptsize{\eta'F}\darr & \searrow & \darr\scriptsize{\eta'G} & {} & \darr\scriptsize{?} \\ G'F & \stackrel{G'\eta}{\to} & G'G & \stackrel{G'\epsilon}{\to} & G'H \\ \scriptsize{?}\darr & {} & \darr\scriptsize{\epsilon'G} & \searrow & \darr\scriptsize{\epsilon'H} \\ ? & \stackrel{?}{\to} & H'G & \stackrel{H'\epsilon}{\to} & H'H \end{array}$$

The top left southeast arrow is $\eta'\ast\eta$, and the bottom right one is $\epsilon'\ast\epsilon$. This diagram is commutative, since we just glued together two commutative squares, so any path from $F'F$ to $H'H$ equals $(\epsilon'\ast\epsilon)(\eta'\ast\eta)$.

Notice a pattern? Let’s fill in the missing parts.

$$\begin{array}{ccccc} F'F & \stackrel{F'\eta}{\to} & F'G & \stackrel{F'\epsilon}{\to} & F'H \\ \scriptsize{\eta'F}\darr & \searrow & \darr\scriptsize{\eta'G} & \searrow & \darr\scriptsize{\eta'H} \\ G'F & \stackrel{G'\eta}{\to} & G'G & \stackrel{G'\epsilon}{\to} & G'H \\ \scriptsize{\epsilon'F}\darr & \searrow & \darr\scriptsize{\epsilon'G} & \searrow & \darr\scriptsize{\epsilon'H} \\ H'F & \stackrel{H'\eta}{\to} & H'G & \stackrel{H'\epsilon}{\to} & H'H \end{array}$$

You can convince yourself that the bottom left and top right squares are just commutative naturality squares from the horizontal composition example above. Then, the entire diagram is commutative!

Let’s take a look at the composable arrows along the very edges of the diagram. What is the composition of the top edge, $(F'\epsilon)(F'\eta)$? Since vertical composition is given by the pointwise composition of the components, for any $A\colon C$,

$$\begin{aligned} ((F'\epsilon)(F'\eta))_A &=(F'\epsilon)_A(F'\eta)_A\\ &=(F'\epsilon_A)(F'\eta_A)\\ &=F'(\epsilon_A\eta_A)\\ &=(F'\epsilon\eta)_A \end{aligned}$$

with the third equality given by the functoriality of $F'$, i.e. functors preserve composition of morphisms. Similarly, for the left edge,

$$\begin{aligned} ((\epsilon'F)(\eta'F))_A &=(\epsilon'F)_A(\eta'F)_A\\ &=\epsilon'_{FA}\eta'_{FA}\\ &=(\epsilon'\eta')_{FA}\\ &=(\epsilon'\eta'F)_A \end{aligned}$$

You should now feel safe composing the edges of the diagram above and the long diagonal, ignoring the other stuff, to get

$$\begin{array}{ccc} F'F & \stackrel{F'\epsilon\eta}{\to} & F'H \\ \scriptsize{\epsilon'\eta'F}\darr & {\searrow} & \darr\scriptsize{\epsilon'\eta'H} \\ H'F & \stackrel{H'\epsilon\eta}{\to} & H'H \end{array}$$

The diagonal arrow is, as we learned earlier, the right-hand side $(\epsilon'\ast\epsilon)(\eta'\ast\eta)$ of the interchange law. However, this is another horizontal composition square, specifically for $\epsilon'\eta'\ast\epsilon\eta$. Since the diagram is commutative, this proves the interchange law!10

Again, while we can easily prove this in $\text{Cat}$, the interchange law is required in the definition of any strict 2-category.

The interchange law simply means that if we have four 2-morphisms arranged in such a configuration as above, we can glue along the 1-morphisms then the 0-morphism or vice versa to get the same result.

3.6 More intuition

Hopefully you now have an intuitive understanding of natural transformations. Let’s try to think of them in different ways in an attempt to broaden and solidify this understanding.

First, let’s add another way of thinking about functors to our toolkit. We can consider a functor $F\colon C\to D$ as a collection of objects in $D$ indexed by objects in $C$, of course preserving the structure in $C$. To put it another way, we can think of $C$ as a schematic or a pattern, and $D$ as where we realize and reify the schematic. Then $F$ tells us which parts of $D$ correspond to the blueprint of $C$. In a sense, $F$ describes a “picture” of $C$ inside $D$. In various areas of math, we might call something like this an “interpretation” or a “model.”

When we say $F\colon C\to D$ indexes objects in $D$ by objects in $C$, it should remind you of sequences. For example, $a_0,a_1,a_2,\ldots,a_i,\ldots$ is a collection of objects $a_i$ (in some set $S$) which we can pick out with an index $i\colon\N$. Therefore, we can think of a sequence of objects in $S$ as a function $a\colon\N\to S$. Much as we think of a sequence as “living” where its enumerated objects do, we can think of the functor $F$ as “living” in $D$. We just have to remember that functors also capture any structure between the “indices.”

From this perspective, what are natural transformations? If $A\colon C$ and $F,G\colon C\to D$, a natural transformation $\eta\colon F\Rarr G$ is a kind of correspondence between any two objects $FA,GA\colon D$ indexed by the same object $A$, which captures the indexed relationships realized by $F$ inside of $G$. In other words, it shows one realization $F$ of the schematic $C$ inside $D$ sitting inside another realization $G$, keeping corresponding parts aligned thanks to the shared index $C$.


I think I need a break after all of those concepts, and you probably do too.

The following sections are unfinished parts of this post/series of posts, but hopefully you’ll be able to see where I’m going with this.

4 Some universal constructions

4.1 Initial and terminal objects

  • cover cones, mention universal property
  • mention “limit” and “colimit”
  • (co)limits, if they exist, are unique up to a unique isomorphism (“essentially unique”)
  • concrete examples from Set
  • list examples in Grp, Ab, Vect

4.2 Products and coproducts

  • same as above
  • cover biproduct, example in Ab, Vect

4.3 Equalizers and coequalizers

4.4 Pullbacks and pushouts

4.5 Limits and colimits

5 Monoids, monoidal categories and monoid objects

First of all, we need to understand the following concept. A monoid is:

  • a set $M$,
  • a binary operation $\cdot\colon M\times M \to M$ (sometimes called multiplication) that sends two elements of $M$ to a single element in $M$,
  • and a particular element $e$ of $M$ called the identity,

such that $\cdot$ is associative and $e$ really is an identity for it: $a\cdot(b\cdot c)=(a\cdot b)\cdot c$ and $e\cdot a=a\cdot e=a$ for all $a,b,c\colon M$.

As mentioned earlier, monoids can be viewed as categories with a single object. The elements correspond to morphisms, the operation to composition, and the identity element to the identity morphism. Therefore, monoids are a rather natural construction in category theory.

Again, groups are monoids where every morphism is an isomorphism, that is, the morphisms have inverses.
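For the programmers: Haskell’s standard library encodes exactly this structure, splitting the operation and the identity across two classes. A sketch:

```haskell
import Data.Monoid (Sum (..))

-- class Semigroup m        where (<>)   :: m -> m -> m  -- the operation
-- class Semigroup m => Monoid m where mempty :: m       -- the identity e
-- Associativity and the identity laws are left for instances to uphold.

main :: IO ()
main = do
  print (getSum (Sum 3 <> Sum 4 <> mempty))   -- 7: numbers under addition
  print ([1, 2] <> [3] <> (mempty :: [Int]))  -- [1,2,3]: lists under (++)
```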

6 Cartesian closed categories and the internal hom

7 Monads

8 Comonads

9 The Writer

9.1 Writer monad

9.2 Writer comonad

10 The Reader

10.1 Reader monad

10.2 Reader comonad


  1. The burrito is a common but painfully inept analogy for monads. To use an analogy for the burrito analogy, the burrito analogy is like the rubber-sheet analogy for general relativity: somewhat intuitive, but it falls apart the moment you think about it. (What’s pulling stuff down?) ↩︎

  2. Then tear out the still-beating heart (problem), grasped in your bloodied hand (keyboard), whereupon you place it on the blood altar (category theory) in a gesture to the gods of mathematics and computer science (Euler, Gauss, Grothendieck, Turing, Lovelace, and Knuth). ↩︎

  3. For some categories, this may not be a set, either because you replace the hom-sets with objects from another category, enriching the category, or because the category is large and you instead have a hom-class, where a class is simply a collection of things without all the baggage of the set theory axioms. ↩︎

  4. To be precise, a concretizable category is one that can be made concrete with the choice of a faithful functor $U\colon C\to\text{Set}$, called the forgetful functor, that “forgets” extra structure and reveals the underlying set. A concretizable category may be made concrete in many ways (with the choice of various forgetful functors), but there’s usually a nice way to do it. What’s a functor? A map between categories — I’m getting to it! ↩︎

  5. To be even more precise, it can be either all the morphisms into an object or out of it. This is formalized by the Yoneda lemma, which Wikipedia calls “arguably the most important result in category theory.” I’m still trying to fully comprehend it. ↩︎

  6. The reals can be considered a one-dimensional vector space over itself. ↩︎

  7. The notation $x\mapsto y$ denotes what is essentially a lambda function. It accepts an argument, binds it to the symbol $x$, and returns the expression $y$ with the bound value substituted for $x$. By convention, it associates to the right: $a\mapsto (b\mapsto c)=a\mapsto b\mapsto c$. ↩︎

  8. Seriously, just try labeling each symbol with a type and see what happens! ↩︎

  9. Their inverses are also mapped, and since the multiplicative unit 11 is preserved by ff, they stay inverses, guaranteeing that units are mapped to units. However, non-units may be mapped to units. ↩︎

  10. I actually came up with the proof in this section by myself while trying to prove the interchange law. I’m pretty sure this is the default, obvious proof, but I’m still proud of it. ↩︎