The Git of Theseus
Saturday, March 10, 2018

Introduction

In case you haven’t heard of the Ship of Theseus, here’s a quick summary: a ship belonging to the hero Theseus sails out of port with some sailors and supplies. As it journeys around the world, it is repaired, its planks swapped for fresh ones. Food is consumed and replaced with new rations. Even crew members leave, and new ones join the voyage. By the time it returns home, not a single original plank or crew member remains. So the question is: is it still the same ship?

The Problem of Identity

To me, this question seems to have two obvious but opposite answers. If you’re looking at it literally, then no, it’s no longer the same entity. Everything about it is different! However, I would still call it the “Ship of Theseus”, and treat it as if it were the same ship. So in that respect, it is the same entity.

Perhaps we can make the question clearer. One may argue that a ship does not change in a meaningful way during its life. Of course, it may get minor upgrades, but it remains fundamentally the same entity. However, the same cannot be said of humans. People undergo drastic changes throughout their lives. When you’re only a year old, you may not know how to walk. By the time you retire, you will have mastered a trade, cultivated numerous skills, and developed a personal insight into how the world works. The infant and the retiree are definitely not the same, but we still refer to them using the same label. To add further confusion, the label itself may change, such as when a person changes their name. Yet even if both the person’s physical constitution and label change, we still treat them as if they were the same person.

Git

Perhaps identity comes from a shared history. We can’t say that two arbitrary people are the same unless an unbroken series of changes connects one to the other, from the past to the present. Likewise, the ship that left port may have changed, but it did so in a series of steps that connects the beginning of the journey to its end.

Let’s take a break from that for a second and consider Git, a source control system designed by Linus Torvalds. To represent modifications to tracked files, every change, no matter how small, is stored as a commit, which is identified by a SHA-1 hash, usually shown in abbreviated form, such as e83c516. So you essentially have a long series of changes, each based on the previous one. The histories can be much more complicated than a person’s linear life experience: two entirely separate branches of development can be based on the same commit. But, like any person or seaworthy vessel, a project is different at each point in time, that is, at each commit.
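To make the idea of a chain of commits a little more concrete, here is a minimal sketch in Python. It is not Git’s real object format (an actual commit also records a tree, an author, timestamps and so on); the Commit class, its fields, and the helper functions are all made up for illustration. The only point is that each identifier depends on the parent’s identifier, so two snapshots can look nothing alike and still be connected by an unbroken history.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Commit:
    """A toy commit: a message, some content, and an optional parent."""
    message: str
    content: str
    parent: Optional["Commit"] = None

    @property
    def commit_id(self) -> str:
        # Hypothetical scheme: hash the parent's id together with our own data.
        # Real Git hashes a structured commit object instead.
        parent_id = self.parent.commit_id if self.parent else ""
        payload = f"{parent_id}\n{self.message}\n{self.content}".encode("utf-8")
        return hashlib.sha1(payload).hexdigest()


def history(commit: Optional[Commit]) -> List[str]:
    """Walk back through the parents, collecting ids from newest to oldest."""
    ids = []
    while commit is not None:
        ids.append(commit.commit_id)
        commit = commit.parent
    return ids


def shares_history(a: Commit, b: Commit) -> bool:
    """True if the two commits have at least one ancestor (or themselves) in common."""
    return bool(set(history(a)) & set(history(b)))


launch = Commit("launch", "original planks, original crew")
repaired = Commit("replace planks", "fresh planks, original crew", parent=launch)
returned = Commit("replace crew", "fresh planks, new crew", parent=repaired)
stranger = Commit("launch", "some other ship entirely")

print(launch.commit_id[:7], returned.commit_id[:7])
print(shares_history(launch, returned))    # connected by a chain of changes
print(shares_history(returned, stranger))  # no common ancestor
```

In this toy model, the ship that returns has a different identifier from the ship that launched, yet the two are linked through every intermediate commit, while the unrelated ship is not.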

How do humans usually use Git? They develop a project, incrementally making sets of changes, which they then merge into the main branch. But as certain features are added or old functions are removed, this can change the essence of their creation. It may have the same name, but like the Ship of Theseus, some modification or sets of changes may result in some aspect of its “essence” being changed. A sailor on the aforementioned ship may feel that the environment has changed considerably under new leadership, for example. Likewise, a particular set of changes may require a different means of using the program, or cause it to look significantly different from a previous version.

Labelling

Which brings us to version numbers. Generally, software is numbered so that its version increases in a way that is meaningful to humans, for example by following semantic versioning. But what people think of a particular version is usually not very relevant. People may not even know which version of a particular program is installed on their computer. Or a significant change could land in between versions. What is important about versioning is not the numbers themselves, but rather the shared language they provide to people discussing a particular entity.

Why is labelling important to the concept of identity? Because we use labels all the time to distinguish entities from each other. Many cultures celebrate birthdays, which are little more than a monotonically increasing “version number” of sorts. Age is used as a meter stick to understand how a person has grown throughout their life, and we often make assumptions about people based on it. This is despite the fact that the particular day on which we increment the number is not necessarily special. That is, events with a significant impact on one’s life are no more likely to occur on one’s birthday than on any other day. And we often find that if a more specific timeframe is necessary, we use it: “It happened two months before I turned 24”.

But ultimately these “version” labels are only meaningful because they allow us to separate an entity’s “being” into several overlapping groups, permitting one to choose whichever category is useful for a particular task. For instance, in all likelihood, there were no big changes in your physical appearance or mental maturity between when you were 17½ and 18. But this label is very important legally, as in the first situation you are a minor, unable to vote, agree to contracts or purchase cigarettes. Likewise, imagine you are more charitable and generous now than in the previous year. You may even feel that you are a “new person”. But when it comes to taxes, or employment, or whether you are married to somebody, you are the same person both before and after. The same holds even in mundane cases, such as when a piece of software will not run because your copy of somelibrary is only version 2.1, but 2.3 or later is demanded. So even though somelibrary 2.1 and somelibrary 2.3 are recognized as different in this context, we still call both somelibrary in others.
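The somelibrary case can be sketched in a few lines of Python. Everything here is hypothetical: the Release class, the helper names, and the assumption that a version is just dot-separated integers (real version schemes can be messier). It shows that “is this the same thing?” and “is this new enough?” are two different questions asked of the same label.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Release:
    """A hypothetical labelled release of a library."""
    name: str
    version: str


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.10' into (2, 10) so that versions compare numerically, not as text."""
    return tuple(int(part) for part in text.split("."))


def same_project(a: Release, b: Release) -> bool:
    """Identity in the loose, everyday sense: do both carry the same name?"""
    return a.name == b.name


def satisfies(installed: Release, minimum: str) -> bool:
    """Identity in the strict sense a program cares about: is it new enough?"""
    return parse_version(installed.version) >= parse_version(minimum)


old = Release("somelibrary", "2.1")
new = Release("somelibrary", "2.3")

print(same_project(old, new))   # True: both are "somelibrary"
print(satisfies(old, "2.3"))    # False: 2.1 is too old
print(satisfies(new, "2.3"))    # True
```

Comparing tuples of integers rather than raw strings matters once a version like 2.10 appears, since as text “2.10” would sort before “2.3”.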

Conclusion

The classical definition of identity is “the relation each thing bears only to itself”. However, this can quickly fall apart if you try to use it literally. Myself now and myself ten seconds ago have many things in common, but the placements and trajectories of the many molecules and structures in my body are different, and therefore, by this definition, we are not the same person. It seems that the only thing an entity can bear this relation to is itself at that exact point in time.

So whether it’s a different commit hash, or a different configuration of the atoms that make up humans or ships or planetary bodies, what we are doing when we seek to determine identity is not looking for a perfect relation between two entities, but rather extracting the properties we find useful and relevant. Perhaps the concept of “identity” is just as much a human invention as Git or maritime voyages.