Data Is Debt Too
It’s obvious to most software engineers that code can be (or is always) tech debt, but it took me some time to understand how much data1 can be debt too. In fact, data is far more cumbersome to manage than code.
Software engineers deal in code and data: code is the logic that does something useful with data. Some engineers maintain software whose data is relatively opaque (say, nginx or git), but most SaaS or internal corporate software engineers are tightly bound to the data that makes their applications work. They are responsible for its structure and integrity over time.
Pound for pound, data is harder to work with than code:
- Code is easy to change, but data is not. Changing code is as easy as recompiling. Changing data requires
- As a result of the above, data lives far longer than code. Probably, your business will never willing give up data, and the responsibility of keeping your data useful (i.e. making sure code can still do useful things with it) falls entirely on the engineer.
- Code is easier to refactor than data. You can start from zero and recreate functionality with code, but with data you have to migrate what you already have. You probably can’t do a “data rewrite”.
- Code bugs are ephemeral, data bugs are forever. A bug exists at a moment in time - the software doesn’t behave the way the user expects, but often it is fleeting or the user can work around it. But incorrect or invalid data stays that way forever.
- Code has a clear history, data rarely does. Thanks to version control systems, the history of code is never in doubt. But depending on the architecture of an application, the history of data may be difficult or impossible to glean, which makes it challenging to understand how the data exists in its current form.
These points are all different ways of saying the same thing, but it’s worth hammering home the permanence of data relative to the ephemerality of code. But all that being said, the permanence of data makes it more powerful than code. Create data, ingest data, maintain data - but treat it with respect. Creating a new application with existing data is simply less costly than creating or ingesting new data.
-
For this essay, we’ll exclude user data for which privacy is a concern from the conversation. While that sort of data should be (and increasingly is) considered a liability as much as it is an asset, privacy is not the focus of this essay. ↩︎