Applying Don't Repeat Yourself
Some developers believe DRY is solely about removing code duplication. I'll show why that's mistaken and give you a way to apply the principle that is more effective and will also save you work.
Practical everyday advice
If you're looking at two or more pieces of code and wondering if you should refactor them in the name of DRY, ask yourself the following question before doing anything: should these pieces of code be able to change independently?
If the answer is "yes," do nothing. If the answer is "I don't know," do nothing, because there isn't enough information. Wait and see how the code evolves to gather more information. The only time to consider making changes is when you've answered "no," and in my experience these times tend to stand out.
Getting in the habit of asking yourself this question first will save you a lot of work.
Going deeper: code duplication vs knowledge duplication
To understand DRY properly, it's important to see the distinction between code duplication and knowledge duplication. Picture a Venn diagram of two overlapping circles, one for code duplication and one for knowledge duplication.
We can see three cases.
Case one is duplicate code that doesn't duplicate knowledge. This happens when two pieces of unrelated code just happen to look the same. The shorter two pieces of code are, the more likely they are to look alike. Newer code tends to be shorter, so the newer a piece of code is, the more likely it is to appear to be a duplicate of something else. When it comes to DRY, this characteristic is one to pay attention to: refactoring too early happens because we're looking at things at the beginning of their evolution, when they're more likely to look the same by coincidence.
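As a small illustration of case one, here is a minimal sketch. Both functions and their limits are hypothetical, invented for this example: one encodes a rule about usernames, the other a rule about coupon codes. The bodies are identical today, but the two rules have different reasons to change.

```javascript
// Hypothetical rule: usernames are capped at 20 characters (say, a UI layout constraint).
function isValidUsername(name) {
  return name.length > 0 && name.length <= 20;
}

// Hypothetical rule: coupon codes are capped at 20 characters (say, a database column limit).
function isValidCouponCode(code) {
  return code.length > 0 && code.length <= 20;
}

console.log(isValidUsername("alice"));      // true
console.log(isValidCouponCode("SPRING24")); // true
```

If the coupon limit were raised to 40 characters, only isValidCouponCode should change. Merging the two into one shared helper would tie together rules that merely coincide.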
Case two is duplicate knowledge that isn't duplicate code. The easiest way to picture this is by imagining two different implementations or representations of the same functionality. I'll show a small real-world example of this later on.
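For a quick feel of case two, consider the sketch below. The SKU format (three uppercase letters followed by four digits) and both function names are assumptions made up for illustration. The two functions share no code, yet they encode exactly the same rule.

```javascript
// Representation 1: the hypothetical SKU rule written as a regular expression.
const SKU_PATTERN = /^[A-Z]{3}[0-9]{4}$/;

function isValidSkuInCatalog(sku) {
  return SKU_PATTERN.test(sku);
}

// Representation 2: the same rule written as explicit character checks.
function isValidSkuInWarehouse(sku) {
  if (sku.length !== 7) {
    return false;
  }
  for (let i = 0; i < sku.length; i++) {
    const ch = sku[i];
    const isUpper = ch >= "A" && ch <= "Z";
    const isDigit = ch >= "0" && ch <= "9";
    // The first three positions must be letters, the remaining four digits.
    if (i < 3 ? !isUpper : !isDigit) {
      return false;
    }
  }
  return true;
}

console.log(isValidSkuInCatalog("ABC1234"), isValidSkuInWarehouse("ABC1234")); // true true
console.log(isValidSkuInCatalog("AB12345"), isValidSkuInWarehouse("AB12345")); // false false
```

A duplicate-code detector would find nothing here, but if the SKU format changed, both places would have to be found and updated together. Nothing in the code links them.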
Case three is where duplicated code also duplicates knowledge. This is the overlapping section in the diagram above.
DRY is concerned with cases two and three – the duplication of knowledge – and yet, more often than not, I see people applying it to case one to remove code duplication. Not convinced? Time for some history.
The origin of DRY
DRY was introduced to the world by Andy Hunt and Dave Thomas in their book The Pragmatic Programmer, released in 1999. In Chapter 2 of the book, there is a section titled The Evils of Duplication containing the following passages defining DRY:
As programmers, we collect, organize, maintain, and harness knowledge. We document knowledge in specifications, we make it come alive in running code, and we use it to provide the checks needed during testing.
...
We feel that the only way to develop software reliably, and to make our developments easier to understand and maintain, is to follow what we call the DRY principle:
Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.
Why do we call it DRY?
DRY—Don't Repeat Yourself
Including the passages I've left out to save space, the word knowledge occurs seven times. In contrast, the word code only appears once. The authors go on to say:
You'll find the DRY principle popping up time and time again throughout this book, often in contexts that have nothing to do with coding. We feel that it is one of the most important tools in the Pragmatic Programmer's tool box.
That paragraph directly contradicts the interpretation of DRY claiming the principle's whole purpose is to remove duplicate code.
A general example
Suppose we have the following:
function A() {
  task1();
  task2();
  task3();
}

function B() {
  task1();
  task2();
  task3();
}
What would you do here? Does looking at A and B make you feel they're related somehow? Why? We can see they each perform three tasks, so shouldn't we do something like this?
function A() {
  runCommonTasks();
}

function B() {
  runCommonTasks();
}

function runCommonTasks() {
  task1();
  task2();
  task3();
}
Or perhaps we could do this instead:
function AB() {
  task1();
  task2();
  task3();
}
All callers that invoked either A or B would need to be modified to call AB, which on its own is potentially a lot of work. If we're lucky, then A and B were the same thing all along, so our change is fine. But if we've guessed incorrectly, then over time we might see this sort of logic show up:
function AB(isA) {
  task1();
  task2();
  task3();
  // A flag like this is the tell-tale sign: AB now has to know which caller it serves.
  if (isA) {
    // do something
  }
}
When we encounter this, it indicates A and B were never the same thing and AB is not an appropriate abstraction to use. Undoing this change in this example is straightforward since we're only dealing with about 10 lines of code. But that tends not to be the situation with modern code bases, where even startups can accumulate millions of lines of production code. Reversing a change like this in reality can be expensive to do, and hard to get buy-in for. This is the part that is difficult to convey in an article since it's hard to show how a code base evolves over time and what effects result from that evolution.
Ultimately, the problem is we don't have enough information to determine whether A and B are the same. We need to get more information by seeing how they evolve over time. By prematurely introducing runCommonTasks or combining A and B into AB, all we've done is glue A and B together, despite the fact they may need to change independently. We can argue that either of these changes in isolation isn't too harmful. For runCommonTasks, I'd say that's largely true. But the AB change does tend to be harmful when applied too early like this.
Every time code duplication is removed just for the sake of it, it's almost always replaced with (at least) one more coupling. This is why wielding DRY solely to remove code duplication can be insidious: it seems pretty innocuous at the time. But the couplings add up. The more coupled a system is, the harder it becomes to change it overall. Even a small team of developers can add a large number of unnecessary couplings in a relatively short amount of time from misapplying DRY.
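To see one such coupling in action, here is a minimal sketch building on the runCommonTasks version of the example. The task bodies and task4 are hypothetical: the tasks just record their names, and task4 stands in for a new step that only A needs.

```javascript
// Record which tasks run so the effect of a change is visible.
const log = [];
const task1 = () => log.push("task1");
const task2 = () => log.push("task2");
const task3 = () => log.push("task3");
const task4 = () => log.push("task4"); // hypothetical new step that only A needs

// The shared helper, edited on behalf of A.
function runCommonTasks() {
  task1();
  task2();
  task3();
  task4();
}

function A() {
  runCommonTasks();
}

function B() {
  runCommonTasks();
}

// Nobody touched B, yet its behavior changed.
B();
console.log(log.join(", ")); // task1, task2, task3, task4
```

The edit was made with only A in mind, but B picked it up too. Keeping B unaffected now means either splitting the helper apart again or adding a flag, which is the same path the AB example went down.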
So what would I do about A and B? In case it wasn't apparent, I deliberately named them A and B so we can't infer anything about them. For that reason, there isn't enough information to tell if they should be able to change independently or not, so I'd do nothing. Let them evolve over time and see what direction they each take.
If you're not sure, it's perfectly fine to leave code like this alone. Don't feel compelled to make a change.
A real-world example
A company I worked at produces real-time auction software. One large piece of work involved two teams being tasked with building out functionality for controlling whether a user was allowed to bid on a given auction. This functionality touches on all major parts of the system: the UI, bid placement, real-time updates, notifications (for staying informed of upcoming auctions a user might be interested in), and so on.
At some point, I came across these two enums in the C# backend:
public enum BidderApprovalStatus
{
    Accepted,
    Pending,
    Rejected
}

public enum ApprovedBidderStatus
{
    Confirmed,
    Undecided,
    Declined
}
What interested me was how someone else had come across this before I did and had opted to create a mapping service between the two types, rather than raising an issue to eliminate one of them. That service also had an interface and a separate module for wiring it up to the dependency injection container. And, of course, there was the problem that some parts of the backend used one enum and other parts used the other one – this was the reason the mapper showed up, because it was cheaper to add that than to fix the root cause.
Putting the unfortunate inter-team communication to one side, notice how these enums represent the same knowledge. Furthermore, notice how there is technically no code duplication here at all – the type names differ, and the three options also differ. If we incorrectly assume DRY is only concerned with removing code duplication, then we wouldn't apply the principle in this situation. But it's obvious only one of these enums should exist.
The removal of one of the enums led to the removal of the mapper and everything else that came with it. This effect is typical of what happens when applying DRY to remove duplicate knowledge.
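To make that weight concrete, here is a rough sketch of the kind of mapper that two such types invite. It is not the actual service, which was C# and came with an interface and dependency injection wiring. It is written in JavaScript to match the earlier example, the function name is hypothetical, and the pairing of values is my assumption based on their order and meaning.

```javascript
// JavaScript stand-ins for the two C# enums shown earlier.
const BidderApprovalStatus = Object.freeze({
  Accepted: "Accepted",
  Pending: "Pending",
  Rejected: "Rejected",
});

const ApprovedBidderStatus = Object.freeze({
  Confirmed: "Confirmed",
  Undecided: "Undecided",
  Declined: "Declined",
});

// Assumed pairing: values are matched by position and meaning.
const STATUS_PAIRS = new Map([
  [BidderApprovalStatus.Accepted, ApprovedBidderStatus.Confirmed],
  [BidderApprovalStatus.Pending, ApprovedBidderStatus.Undecided],
  [BidderApprovalStatus.Rejected, ApprovedBidderStatus.Declined],
]);

// Hypothetical mapper: translates one representation into the other.
function toApprovedBidderStatus(status) {
  const mapped = STATUS_PAIRS.get(status);
  if (mapped === undefined) {
    throw new Error(`Unknown BidderApprovalStatus: ${status}`);
  }
  return mapped;
}

console.log(toApprovedBidderStatus(BidderApprovalStatus.Pending)); // Undecided
```

Everything below the first enum exists only because the same three states were written down twice. With a single enum, the second type, the lookup table, and the function all have nothing left to do.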
Conclusion
The beating heart of DRY for me is this: when applied well, it doesn't add weight to a system (couplings), it takes weight away (redundant representations and everything that sprang up around them). It exists to make systems lighter and easier to change. Removing duplicate code isn't the goal. Removing duplicate knowledge is.