The Weight of Small Decisions
On the accumulation of architectural choices in ML systems — and how the sum of individually reasonable decisions can produce unreasonable outcomes.
There is a kind of technical debt that doesn’t show up on the balance sheet. It accumulates not through bad decisions but through individually reasonable ones, each made under local constraints, each defensible in isolation, each adding a small weight to a structure that is slowly becoming something other than what was intended.
I have been thinking about this in the context of ML systems — not the large architectural decisions, which are usually made deliberately and documented, but the small ones. The decision to use a particular tokenisation scheme because it was already installed. The decision to log predictions to a database table that was available rather than one that was designed for the purpose. The decision to use a slightly different definition of “positive example” in the evaluation set than in the training set, because the evaluation set was assembled later and the exact definition had drifted in the interim.
None of these is a mistake, exactly. Each makes sense given the constraints and knowledge available at the time. But the accumulated structure — the ML system as it actually exists after eighteen months of development — is the product of hundreds of such decisions, and it is not the product anyone would have designed intentionally.
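To make the last of those concrete: here is a minimal sketch, with entirely invented field names and thresholds, of how a drifted definition of “positive example” can quietly change what an evaluation measures.

```python
# Hypothetical example of label-definition drift between training and evaluation.
# Assume a fraud model where "positive" once meant "charged back within 30 days",
# but the evaluation set, assembled months later, used a 60-day window.

def train_label(txn: dict) -> bool:
    # Definition in force when the training data was assembled.
    return txn["chargeback_days"] is not None and txn["chargeback_days"] <= 30

def eval_label(txn: dict) -> bool:
    # Definition in force when the evaluation set was assembled.
    return txn["chargeback_days"] is not None and txn["chargeback_days"] <= 60

transactions = [
    {"id": 1, "chargeback_days": 12},    # positive under both definitions
    {"id": 2, "chargeback_days": 45},    # positive only under the evaluation definition
    {"id": 3, "chargeback_days": None},  # negative under both
]

# The model was trained to predict train_label but is scored against eval_label,
# so every case like id 2 registers as a miss the model was never asked to catch.
disagreements = [t["id"] for t in transactions if train_label(t) != eval_label(t)]
print(disagreements)  # [2]
```

Neither definition is wrong on its own; the weight comes from the two being silently different.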
The Interpretive Burden
What makes this particularly interesting to me is the way it distributes interpretive burden. Every ML system is simultaneously a technical object and a record of decisions. Understanding what a system is doing, and why, requires understanding not just the architecture and the weights but the history: the sequence of choices that produced the current state. In complex systems, this history is rarely fully documented, and even when it is, the documentation doesn’t capture the full context of each decision.
The problem is compounded by the fact that small decisions frequently interact. The tokenisation scheme affects what the model learns about morphology. The evaluation set definition affects which failure modes are visible and which are not. The logging format affects which analyses are easy and which require additional work. The interactions are often invisible until something fails.
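A small, hypothetical illustration of one such interaction: when a metric is normalised per token, the tokenisation decision and the evaluation decision end up coupled, even though neither was made with the other in mind.

```python
# Two hypothetical tokenisation schemes applied to the same sentence.

def tokenise_whitespace(text: str) -> list[str]:
    return text.split()

def tokenise_subword(text: str) -> list[str]:
    # Crude stand-in for a subword scheme: split off a common suffix.
    tokens = []
    for word in text.split():
        if word.endswith("ing") and len(word) > 5:
            tokens.extend([word[:-3], "##ing"])
        else:
            tokens.append(word)
    return tokens

sentence = "the model is overfitting on the training set"

# Any metric averaged per token (loss, token accuracy, error rate) changes its
# denominator, and therefore its meaning, when the tokeniser changes,
# even if the model and the data are otherwise identical.
print(len(tokenise_whitespace(sentence)))  # 8
print(len(tokenise_subword(sentence)))     # 10
```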
A Practice of Accumulation
I don’t think there is a solution to this, exactly. Design for explicitness helps: making small decisions visibly rather than tacitly. Documentation of intent, not just implementation, helps. Periodic audits of accumulated drift help, as the sketch below tries to suggest.
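A minimal sketch, with invented names and dates throughout: small decisions recorded alongside the intent behind them, and a check that flags the ones whose constraints have lapsed.

```python
# Hypothetical in-repo decision log: documentation of intent, not just implementation.

from dataclasses import dataclass
from datetime import date

@dataclass
class Decision:
    summary: str                    # what was decided
    intent: str                     # why, and under which constraint
    revisit_by: date | None = None  # when the constraint should be re-examined

DECISIONS = [
    Decision(
        summary="Log predictions to the shared `events` table",
        intent="Purpose-built table not provisioned yet; revisit after the infra migration",
        revisit_by=date(2025, 6, 1),
    ),
    Decision(
        summary="Positive example = chargeback within 30 days",
        intent="Matches the operational definition in use when the training set was built",
    ),
]

def audit(today: date) -> list[str]:
    # Periodic audit of accumulated drift: surface decisions whose revisit date has passed.
    return [d.summary for d in DECISIONS if d.revisit_by and d.revisit_by < today]

print(audit(date.today()))  # flags the logging decision once its revisit date is behind us
```

The point is not the tooling; it is that the intent survives long enough to be read later.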
But some of the weight is irreducible. Complex systems are historical objects, and their behaviour at any given moment is shaped by decisions made under constraints that no longer apply, in contexts that no longer exist. The accumulated weight of small decisions is, in the end, what the system is. Learning to read that weight — to understand what it says about the choices made and the constraints faced — feels like one of the more important and underappreciated skills in applied ML work.