Data Is Debt Too

It’s obvious to most software engineers that code can be tech debt (some would say it always is), but it took me some time to understand how much data¹ can be debt too. In fact, data is far more cumbersome to manage than code.

Software engineers deal in code and data: code is the logic that does something useful with data. Some engineers maintain software whose data is relatively opaque to them (say, nginx or git), but most engineers building SaaS or internal corporate software are tightly bound to the data that makes their applications work. They are responsible for its structure and integrity over time.

Pound for pound, data is harder to work with than code:

These points are all different ways of saying the same thing, but it’s worth hammering home the permanence of data relative to the ephemerality of code. That said, the same permanence is what makes data more powerful than code. Create data, ingest data, and maintain data, but treat it with respect. Building a new application on existing data is simply less costly than creating or ingesting new data.

  1. For this essay, we’ll exclude from the conversation any user data for which privacy is a concern. While that sort of data should be (and increasingly is) considered as much a liability as an asset, privacy is not the focus of this essay.