jducoeur | The First Law of Programming, Part 1: Duplicate Code is Evil

So my rant yesterday about good and bad programmers did leave me musing about an important corollary question: how do you make good programmers? The answer is obviously complex, but here's a starting point: teach them The First Law of Programming, which is:

Duplication Is Evil

Really, that's it -- one nice simple sentence, with huge ramifications.

The odd part is how ill-taught this rule is. Most programming courses teach it as an afterthought, if at all, which is strange because it motivates so much of the structure of programming. I mean, the evolution of computer languages has mostly been about finding higher and higher-level ways to eliminate duplication in code, and many language features are all about ways to remove duplication. For example:

If you find the same expression being used in multiple places in the program -- even if it is just one complex line -- it most often makes sense to lift that out into its own parameterized function or method.

If you have the same basic functional pattern being used repeatedly -- that is, when you can comfortably say, "This is just doing the same thing as that except for X" -- then you probably want to lift out a higher-order function, encapsulating X as a functional parameter or in a closure.

If you have multiple classes that are doing essentially the same things, except *to* different types -- for instance, a List of integers vs. a List of strings vs. a List of Customers -- then you almost certainly want a Generic class.

If you have multiple classes that are trying to do the same tidbit of functionality, then you probably want a trait or a mixin. (Or if you are trapped in single-inheritance land, at least change the way you're aggregating those functions.)

And so on. While not every programming-language feature is about removing duplication, many are, and for good reason.
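To make those four cases concrete, here is a minimal sketch in Python. Every name in it (`with_tax`, `total_where`, `Stack`, `DescribeMixin`, `Customer`, `Order`) is made up for illustration, and it is only one of many ways to express each idea, not a canonical pattern:

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterable, List, TypeVar

T = TypeVar("T")


# 1. A repeated expression, lifted into one parameterized function.
def with_tax(amount: float, rate: float) -> float:
    return round(amount * (1 + rate), 2)


# 2. "The same thing as that, except for X": X becomes a function parameter.
def total_where(prices: Iterable[float], keep: Callable[[float], bool]) -> float:
    return sum(p for p in prices if keep(p))


# 3. The same container logic applied to different types: one generic class.
class Stack(Generic[T]):
    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()


# 4. A tidbit of behavior shared by otherwise unrelated classes: a mixin.
class DescribeMixin:
    def describe(self) -> str:
        return f"{type(self).__name__}({vars(self)})"


@dataclass
class Customer(DescribeMixin):
    name: str


@dataclass
class Order(DescribeMixin):
    order_id: int
```

With these in place, `total_where(prices, lambda p: p >= 10)` and `total_where(prices, lambda p: p < 10)` share one loop instead of two near-identical ones, and `Customer` and `Order` both get `describe()` without either of them copying it.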

Mind, I am not advocating removing duplication for the usual squishy reasons like "reuse". (Itself a source of many sins, because it misses the fact that *sometimes*, it really is much cheaper, easier and more reliable to reinvent the wheel.) The real reason is much simpler: Duplication Causes Bugs. Period. And I don't mean occasionally: in my experience, *most* serious programming bugs trace back to duplication in one way or another. Sometimes it is because duplicated code makes the code bulkier and harder to reason about. Frequently, it is because you copied this code into four places, tweaked it in one of them, and forgot to tweak it in the others. Most often, the duplication is simply a symptom of the fact that you don't really understand the abstractions in your code.
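The "tweaked it in one place and forgot the others" failure is easy to show in miniature. This is a hypothetical sketch in Python: the discount rule, the thresholds and all the function names are invented for the example.

```python
# The same discount rule was pasted into two functions.
def invoice_total_duplicated(subtotal: float) -> float:
    # This copy was later tweaked: the threshold dropped from 100 to 50.
    discount = 0.10 if subtotal >= 50 else 0.0
    return round(subtotal * (1 - discount), 2)


def receipt_total_duplicated(subtotal: float) -> float:
    # This copy was forgotten and still uses the old threshold of 100.
    discount = 0.10 if subtotal >= 100 else 0.0
    return round(subtotal * (1 - discount), 2)


# With the rule in exactly one place, the two outputs cannot drift apart.
def discount_for(subtotal: float) -> float:
    return 0.10 if subtotal >= 50 else 0.0


def total_due(subtotal: float) -> float:
    return round(subtotal * (1 - discount_for(subtotal)), 2)
```

For a subtotal of 80, the duplicated versions now disagree (one applies the discount, the other does not), and nothing in the code tells you which one is right. Once both the invoice and the receipt call `total_due`, there is only one rule left to tweak.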

So if you are learning programming, I commend to you this rule. Whenever you notice *any* kind of duplication, ask why, and really dig into whether those duplicates should be combined. While it is technically possible to carry it too far, it's really pretty difficult to do so -- the exceptions are at least a bit unusual. And continually saying to yourself, "Surely there must be *some* way to remove this duplication" will force you to think in ways that will teach you a huge amount about why modern programming languages work the way they do, and why you want to use those fancy constructs.

To give you a leg up, I'll point you specifically to the little-known bible of programming: Refactoring: Improving the Design of Existing Code, by Martin Fowler. Fowler can be a bit of a loon (albeit glorious fun to listen to), but he's a brilliant loon and one of the more insightful thinkers about the art of programming. This book, in particular, is the one I hand to *every* intermediate-level engineer. It starts with a fairly modest section on how to think about the structure of code, and then spends the rest of the book on an encyclopedia of "code smells" and how to fix them. Tom Leonard insisted that I read it back when I was working for him, a dozen or so years ago, and of all the things I learned from Tom, this was probably the single most valuable. It isn't quite perfect -- it is very Java-centric, so it misses lots of functional-programming options available in more modern languages, and it is very focused on fixing existing code. But it'll teach you a lot about how to *think* about code properly.

As the title says, this is just part 1. When I have some time (possibly later today, but we'll see), I'll get into Part 2: Duplicate Data is Evil...

"Too far" may be rare, but "badly" is also a pitfall of the overzealous.

True. Fortunately, Fowler's book also lists this sort of thing among its code smells, and recommends ways to fix it. (I'm not really getting into cohesion as a principle here, but it's implicitly crucial in the Refactoring book.)

I'd call that a point in its favor

Oh, sure. I suppose my point is better expressed this way: while the book is focused on fixing existing code, much of it is highly applicable to writing new stuff as well...

That was strongly in mind just last night, while signing 120+ pages of paperwork for a refinance.

Things for me to look forward to. (My closing is scheduled for Tuesday. Pain in the ass, but the savings will be very nice.)

My closing was scheduled for Oct. 17, but one business day earlier they discovered a problem I had told them about two months earlier, so they rescheduled it for Oct. 27. On Oct. 27, two hours before closing, I was told that they hadn't found a solution to the aforementioned problem, and closing was therefore canceled.

But to get back to your point... the previous time I tried to refinance, the deal fell through precisely because of Evil Duplication. Two mortgage company employees each had half of my dossier of paperwork, and each was waiting for me to send in the other half before they could proceed. If the responsibility had been in a single point of control, this would have been discovered and fixed much sooner.

I will admit to being slightly surprised that you're refinancing this soon after the purchase, though -- significant interest rate drop since you bought?

Not large, but not trivial: 3/8 of a percent lower, paying very little cash to do so. Naive payback time (ignoring mortgage interest deduction, future value of money, etc) is about 2-3 years, and we're planning on being here at least 10, if not 15-20, so the math makes sense.

(It could have been even better to pay points and get an even more ridiculously good rate, but we're hurting for liquidity after the renovations, and - strangely - at the time we refinanced, points wouldn't have lowered the rate a huge amount, making the marginal benefit slim.)
