jducoeur: (Default)
[personal profile] jducoeur
Another day, another networking event -- I'm slowly getting used to going to all these Boston Tech Meetups and such, to meet people, talk up Querki and start to understand how one gets an investment.

Along the way, I'm chatting with lots of folks, and a remarkably large fraction lead off with, "Well, I've always been doing X, but I want to learn to code". (Last night's was a fellow who does financial compliance work for one of the large funds.) These folks are usually self-taught, and tend to be very self-deprecating about the fact that they didn't go to school, so they don't *really* understand programming. A couple of the other programmers and I got chatting about that, and about the fact that, yes, the best way to learn to program is by doing. A degree in CS is helpful, but mostly in that it teaches you some of the underlying theory for programming *well*; the nuts and bolts change so often that the details you learn in school will only be useful for a limited time anyway. Somewhere in there, I asserted that you could probably list all of the most-useful bits of theory and practice in one brief talk anyway.

So, here's a challenge: help me figure out what those are. What are the key engineering principles that *every* programmer should know, that probably aren't obvious to a newbie and which aren't necessarily going to be taught in an online "How to Java" class?


I'll start out with a few offhand:

Refactoring: great code doesn't usually come from a Beautiful Crystalline Vision that some programmer dreams up -- it comes from writing some code, getting it working, and then rearranging it to make the code *better* while it's still working. That's "refactoring": the art of making the code cleaner without changing what it's doing. It's a good habit to get into, especially because it takes practice. (Granted, listing all the major refactoring techniques is a good-sized talk itself; I highly recommend Fowler's book on the subject.)
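One classic refactoring from Fowler's catalog is "replace nested conditional with guard clauses". A minimal sketch (the shipping-cost rule here is invented purely for illustration) — note that the rewritten version does exactly what the original does, just more readably:

```python
# Before: the logic is buried in nested if/else branches.
def shipping_cost_before(order):
    if order["total"] > 0:
        if order["express"]:
            cost = 20
        else:
            if order["total"] >= 100:
                cost = 0
            else:
                cost = 5
    else:
        cost = 0
    return cost

# After: same behavior, flattened into guard clauses.
def shipping_cost(order):
    if order["total"] <= 0:
        return 0
    if order["express"]:
        return 20
    return 0 if order["total"] >= 100 else 5
```

The point is that the "after" version was reached by small, behavior-preserving steps from working code, not dreamed up from scratch.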

The DRY (Don't Repeat Yourself) Principle: which I usually describe as "Duplication is the source of all evil". Any time you are duplicating code, you're making it much more likely that you'll get bugs when things change. Much of refactoring is about merging things to eliminate duplication. Similarly, duplicate data is prone to getting out of sync and causing problems, so you should usually try to point to the same data when it's convenient to do so.
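DRY in miniature might look like this (the discount rule is an invented example): when two functions each re-state the same rule, one of them eventually drifts out of date. Centralizing the rule removes that whole class of bug:

```python
# Single source of truth: the rule lives in exactly one place.
DISCOUNT_RATE = 0.10

def discounted(price):
    return round(price * (1 - DISCOUNT_RATE), 2)

def invoice_total(prices):
    # Reuses the one definition instead of re-stating "price * 0.9" here.
    return round(sum(discounted(p) for p in prices), 2)
```

If the discount changes, there is exactly one line to edit, and the two functions cannot disagree.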

Efficiency is good, but algorithmic complexity is what matters: this is what's often called "Big-O" notation in computer science. How fast things run *does* matter, but mostly at the order-of-magnitude level. Whether this approach takes twice as long as that one probably doesn't matter unless you're doing it a bazillion times per second. What *does* tend to matter, given a list of size n, is whether you're going through it just once -- O(n) in the notation -- or whether each time through you're going through the whole list again -- O(n^2) in the notation, that is, "n-squared". (You'd be surprised how easy it is to wind up with algorithms that are n^2 or even n^3 -- those can actually get slow.) Or, if you have two lists of sizes m and n, does your approach take O(n+m) time, or O(n*m)? It's worth practicing these order-of-magnitude evaluations and building an intuition for them. That said...
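A sketch of the O(n*m) vs O(n+m) case from the paragraph above — finding which items in one list appear in another:

```python
def common_quadratic(wanted, have):
    # `in` on a list scans the whole list each time:
    # n lookups, each an O(m) scan, so O(n*m) overall.
    return [x for x in wanted if x in have]

def common_linear(wanted, have):
    # Build a set once (O(m)); each lookup is then O(1) on average,
    # so O(n + m) overall. Same answer, very different growth.
    have_set = set(have)
    return [x for x in wanted if x in have_set]
```

For ten-element lists the difference is invisible; for million-element lists, the first version can take minutes while the second takes a fraction of a second.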

Big stuff swamps small stuff: in one community the other day, I pointed out an approach to solving a problem that involved creating an extra object for each HTTP call. One of the folks in the discussion asked whether that inefficiency would matter, and I had to point out that you're already handling an HTTP call -- at *best*, the overhead of that handler is at least 1000 times that extra object creation, quite likely 10000 times more, so this is a drop in the bucket. So keep scale in mind, and don't sweat the small stuff. If you know your list is never going to have more than ten entries, even O(n^3) probably doesn't matter much.


What else? Can we craft a reasonably brief Rosetta Stone that summarizes the *common* stuff that every programmer should know, so they know what to look for? What are the principles that are true regardless of programming language, which aren't necessarily taught by the average JavaScript bootcamp? DRY is the heart and soul of good programming IMO -- are there other principles of similar importance?

(no subject)

Date: 2016-05-11 12:38 pm (UTC)
dsrtao: dsr as a LEGO minifig (Default)
From: [personal profile] dsrtao

Every program is an attempt to capture and automate a decision-making process. If you don't fully understand the specific process you are working on, nothing else matters.

(no subject)

Date: 2016-05-11 01:14 pm (UTC)
From: [identity profile] goldsquare.livejournal.com
My initial take on this -- and your comments reinforce it -- is that it is critical to get it "right enough now, better later".

Not refactoring. But design and then code in such a way that you are humble about your assumptions. Don't implement anything "in case I might need it", but implement so that your base ideas can expand.

My second thought is that a good understanding of "software as contract" when building an API is essential, although most early programmers are building programs, not APIs.

One of my former co-workers at Sun gave me a definition of what it means for one system to be more powerful than another: if you could implement A by using B, but not implement B using A, then B is more powerful. Design for eventual power.

With my professional software testing hat on, "design for error". Generally speaking, more of people's code is built to detect or respond to error than to do the actual task - and if not, the result is often too brittle to use.

(no subject)

Date: 2016-05-11 01:09 pm (UTC)
From: [identity profile] goldsquare.livejournal.com
This is very true... and it leads to a point I'll eventually make below.

(no subject)

Date: 2016-05-11 01:39 pm (UTC)
dsrtao: dsr as a LEGO minifig (Default)
From: [personal profile] dsrtao
The difference between a trivial project and a serious project is that on every serious project, maintenance and improvements take much, much longer than the initial development phase.

Corollary: anything "clever" or "elegant" needs more commentary, not less.

(no subject)

Date: 2016-05-11 02:54 pm (UTC)
From: [identity profile] metahacker.livejournal.com
SOLID, to add to the list you have above.

Concurrency: different paradigms, and when one is better than another.

More philosophically:

That programming is, at essence, struggling with the finite ability of the human brain to understand things. Most other principles fall out from this.

The need to balance between subsets of readability, maintainability, compactness, extensibility, execution speed, scalability, safety, data safety, latency, testability, etc. Often multiple axes can be improved simultaneously, but sometimes not, and sometimes it's very hard to figure out how.

The bane of premature or naive optimization of these. Often, coders are taught to always optimize for one (e.g. algorithmic complexity) even when that doesn't matter (e.g. because the domain is too small for O(n lg n) to be worse than O(n lg lg n), and what matters much more is either the big data structure storing that thing, or the density of the code to maintain it).

The inability of a realistic algorithm to cover all of a combinatorically-complicated world. CS-as-taught shows you only toy problems, and pretends that you can prove correctness. (And you can -- in limited cases.)

The difference between essential and inessential complexity, especially in the face of the above.

The different demands of code depending on its purpose: data manipulation, user interaction, data management, etc. Too often coders believe the code they write is the only type (net admins think all code is about routing, front-end folks think all code is about dispatching events, etc.) and that therefore their paradigm is the only valid one.

That programs are necessarily created in service of humans. In the end, some human will need to see benefit, or you don't get paid.

(no subject)

Date: 2016-05-11 02:55 pm (UTC)
laurion: (Default)
From: [personal profile] laurion
I'd also point to some even more fundamental things that they'll be exposed to once they start learning, but that should be stressed as places to pay attention:

Comment/document. It is possible to go overboard here, but at least have a written sense of what each function is doing for you.

Recursion. Or more generally, strategies for smartly breaking a problem down into smaller bits. Honestly, this is more to do with understanding your logic more than anything else.

Testing. Ways of looking into what is happening. Dump lots of stuff to stdout now, clean it up when everything is working (and perhaps commented).

(no subject)

Date: 2016-05-11 04:06 pm (UTC)
mneme: (Default)
From: [personal profile] mneme
Re recursion: Recursion itself is just a technique, of course, but yes, absolutely. I'd express this as "all programming is a method of taking a difficult problem and redefining it into a set of smaller, easier problems; keep going until the smaller, easier problems are things the computer already knows how to solve, and once you've done that, you're done."
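That decomposition idea fits in a few lines. A toy sketch: totaling up a nested structure, where each recursive call shrinks the problem until it's something the computer already knows how to answer directly:

```python
def total_size(item):
    # Base case: a plain number is something the computer
    # already knows how to handle -- just return it.
    if isinstance(item, int):
        return item
    # Recursive case: a list is just the sum of its smaller parts.
    return sum(total_size(sub) for sub in item)
```

The same shape -- base case plus "solve the smaller pieces and combine" -- underlies most recursive and divide-and-conquer code.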

(no subject)

Date: 2016-05-11 04:04 pm (UTC)
mneme: (Default)
From: [personal profile] mneme
Hmm.

Start somewhere: People will talk about top-down or bottom-up; they'll talk about test-driven development, about comments-first development; it doesn't matter, as long as you end up with everything you need eventually. The most important thing in order to end up with working code is to have code -- and the most important thing there is to start. Write whatever is easiest; when you hit a stopping or changing point, go on to the next thing, whether that's tests or comments or the next piece of code. The hardest place to code from is often a blank file. (This ties nicely into refactoring, since the code might not be super-efficient even after it functions, and that's ok.)

Computer time is cheaper than programmer time: Efficiency can and often will matter, but for far too many things it's cheaper -- in terms of effort, and in money if money is involved -- to write things in ways that are easier and faster for the programmer than to do things that are easier or more efficient on the iron. Sure, you might need that extra bit of speed, but chances are, you won't.

You Won't Need It: That brings us to the great dictum of XP (usually phrased "You Aren't Gonna Need It"): try to favor working code over coding abstractions that you don't need yet, pretty much always. Sure, if an abstraction is easy and the natural place to go next, you can code it in advance of direct necessity. But it's much easier to take working code and refactor it toward the more perfect abstraction than to build the abstraction and have it complicate and obfuscate your code long after it becomes obvious that you'll never use it.

ETA: Fail fast. In general, you don't want to code to check assumptions and cover every possible case at every point; that way lies madness. Instead, rule out the impossible and incorrect cases as early as possible in a given branch, then write the rest of the code assuming the inputs are appropriately correct. That way you're dealing with the bad cases in as few places as possible. (Exceptions and exception handling are a special case of fail fast, as using them avoids having to have lots of code everywhere checking for and passing up errors).
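The fail-fast pattern above is essentially guard clauses: reject the bad cases at the top, then write the rest of the function assuming valid input. A minimal sketch:

```python
def average_positive(values):
    # Rule out the impossible/incorrect cases up front...
    if not values:
        raise ValueError("values must be non-empty")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    # ...and from here on, no defensive checks are needed:
    # the inputs are known to be correct.
    return sum(values) / len(values)
```

Every caller gets a loud, early error at the point of misuse, instead of a mysterious wrong answer (or crash) deep inside later code.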
Edited Date: 2016-05-11 04:09 pm (UTC)

(no subject)

Date: 2016-05-11 04:47 pm (UTC)
From: [identity profile] metageek.livejournal.com
Computer time is cheaper than programmer time
And you should have good tools for telling you when it isn't.

(no subject)

Date: 2016-05-12 03:52 pm (UTC)
mneme: (Default)
From: [personal profile] mneme
*nod* This aspect of "You won't need it" is tricky, partially because it depends heavily on what your "project" actually is. If what you're fundamentally designing is a language, then so you have it. But if you get tangled in the weeds, you'll never be able to spend enough time on gardening.

(no subject)

Date: 2016-05-11 04:52 pm (UTC)
From: [identity profile] metageek.livejournal.com
APIs are UIs. A programmer designing an API should be familiar with, say, The Design Of Everyday Things (maybe there are better books for this by now?), and design the API to be a comprehensible tool.

(no subject)

Date: 2016-05-12 10:35 am (UTC)
From: [identity profile] hudebnik.livejournal.com
Another way to word it: User-friendly interfaces are important. The vast majority of code isn't intended to be used by a human, but by other code, so it needs to be friendly to that user.
Edited Date: 2016-05-12 11:04 am (UTC)

(no subject)

Date: 2016-05-12 10:50 am (UTC)
From: [identity profile] hudebnik.livejournal.com
Test-driven development is awfully useful, even at the absolute beginning level. A large fraction of absolute-beginner errors amount to "failure to understand the requirements," and writing tests is a good way to uncover such misunderstandings before you've wasted a bunch of time writing code. It also helps to uncover unfriendly APIs -- if I keep stumbling over the order of arguments when I write tests for this function, other callers probably will too.

Tests need to be FAIR: fast, automatic, independent, and reliable. Fast and automatic, because if tests are too time-consuming or too much of a pain to run, you won’t actually run them, and they’ll do you no good. Independent, because if side effects of one test can change the outcome of other tests, it enlarges the debugging space exponentially (and yes, I mean that literally). Reliable, because if a test sometimes fails for reasons having nothing to do with the program being tested, people lose faith in it and it does you no good.

It's a whole lot easier (both simplicity of code and independence of tests) to write tests for functions whose inputs and outputs are explicit -- ideally, parameters and return values -- than for those that depend on global state, hidden state, or I/O, and/or produce results in global state, hidden state, or I/O. Which leads to the corollary "segregate interesting processing from I/O."
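A tiny sketch of that corollary (the function names are illustrative): keep the interesting processing pure, with explicit parameters and return values, and make the I/O a thin wrapper around it:

```python
def summarize(lines):
    """Pure core: count non-blank lines and their total characters.
    No global state, no I/O -- trivially testable."""
    non_blank = [ln for ln in lines if ln.strip()]
    return len(non_blank), sum(len(ln) for ln in non_blank)

def summarize_file(path):
    """Thin I/O shell around the pure core."""
    with open(path) as f:
        return summarize(f.read().splitlines())
```

Tests exercise `summarize` directly with plain lists; only a test or two needs to touch the filesystem through `summarize_file`.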
Edited Date: 2016-05-12 11:03 am (UTC)

mocks, etc.

Date: 2016-05-13 03:22 am (UTC)
From: [identity profile] hudebnik.livejournal.com
I'm embarrassed to admit that I had never heard of "mocking" until maybe three years ago, and didn't "get" it until well after I started at Google. I think of it as "how do you write tests for your code's interaction with slow or unreliable external systems?" You could try testing your program's resilience against network failures by pulling the plug on the network router while the program is running, but that's just about the opposite of FAIR. So instead you build something that LOOKS like a network router, as far as your program is concerned, but it's actually software and can be pre-programmed to fail in specified ways at specified times.

Which means, in turn, that your software has to be parameterized by the LooksLikeANetworkRouter interface: in normal operation you give it a real network router, while for certain kinds of testing you give it a MockNetworkRouter object that can be told how and when to fail.

For another example, my project has lots of code that uses real-world timestamps. It's notoriously difficult to write unit tests involving real-world timestamps, because the real world insists on moving forward in time from one test run to the next. If you rely on a particular section of code taking at least or at most a specified period of time, you've lost the "R for reliable"; if you rely on Sleep(num_microseconds) calls, you've lost the "F for fast". The solution is to parameterize the program with an AbstractClock, one implementation of which is the real system clock, and another implementation is a MockClock that can be read, set, advanced by various amounts, etc.
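That clock parameterization fits in a few lines. A sketch (class and method names here are illustrative, not from any particular library):

```python
import time

class SystemClock:
    """Production implementation: the real wall clock."""
    def now(self):
        return time.time()

class MockClock:
    """Test implementation: time moves only when the test says so."""
    def __init__(self, start=0.0):
        self._t = start
    def now(self):
        return self._t
    def advance(self, seconds):
        self._t += seconds

def elapsed(clock, start_time):
    # Code under test depends only on the clock *interface*,
    # never on which implementation it was handed.
    return clock.now() - start_time
```

A test can then assert `elapsed(...)` is exactly 5.0 after `clock.advance(5.0)`, with no real sleeping and no flakiness.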

Which brings us to another rule of thumb that I thought somebody had already mentioned in this thread but I don't see now: any object you get from outside your code (whether passed in as a parameter, returned by a factory method, etc.) should be used according to its interface, not its implementation. For an extreme example, the "factorial" function really should have a parameter of type LooksLikeANaturalNumber, as long as that interface has IsZero, Predecessor, and Multiply functions.

And which reminds me of another rule of testing: for every test, ask yourself where a bug would have to be in order for this test to expose it. If you already have tests that would expose a bug in this specific place, you probably don't need another; conversely, if you have large sections of code in which a bug could hide without any of your tests exposing it, you need more tests. Mocks are good for detecting bugs in the parts of your code that directly interface with the external system that you're mocking.
Edited Date: 2016-05-13 03:29 am (UTC)

Resiliency trumps simplistic notions of function

Date: 2016-05-13 11:34 am (UTC)
From: [identity profile] fernando salazar (from livejournal.com)
Back in the day, we cared about minimizing use of resources and maximizing functionality. But in today's world, RESILIENCY trumps all -- if you don't got it, no one will use your supposedly great features.

Example: Not too long ago my team was devising a new server-side message logging function. I gave the developers direction to save off messages to intermediate SAN storage -- a pretty reliable thing -- and then have a separate background task commit the messages to the destination DB. The reply: oh no, we can't have people wait to have their messages appear in the log. This was foolish: if the DB was down for maintenance, the entire system would be unusable!

That thinking comes from the old, "enterprise software" way of thinking, where typically everything goes down for maintenance all at one time. A modern cloud system however, needs to maintain as much uptime for as many features as possible, and further needs to assume any dependent system component might be unavailable at any time. By using an intermediate storage approach, the messaging app could still function, even if the persistent store was offline. The actual delay anyone would realize in seeing logged messages would be minuscule anyway, like 5 seconds at worst.
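The intermediate-storage idea can be sketched in a few lines. This is a toy illustration, not the actual system described above: an in-memory queue stands in for the SAN buffer, and a list stands in for the destination DB:

```python
import queue

class BufferedLogger:
    def __init__(self, db):
        self.buffer = queue.Queue()  # stands in for reliable local storage
        self.db = db                 # may be unavailable at any time

    def log(self, message):
        # Always fast: the caller never waits on the DB.
        self.buffer.put(message)

    def flush(self):
        # Run periodically in the background. If the DB write fails,
        # unflushed messages simply stay buffered for the next attempt.
        while True:
            try:
                msg = self.buffer.get_nowait()
            except queue.Empty:
                break
            self.db.append(msg)
```

Callers see logging succeed immediately; a DB outage only delays when messages become visible, it doesn't take the whole feature down.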

(no subject)

Date: 2016-05-14 08:09 pm (UTC)
From: [identity profile] momentsmusicaux.livejournal.com
At least two people will work on your code: you, and you in 6 months' time. And you in 6 months' time won't remember anything about how it works. So write documentation for everything, and be generous with inline comments.

(no subject)

Date: 2016-05-15 07:09 am (UTC)
From: [identity profile] momentsmusicaux.livejournal.com
Not by me originally, but I don't recall where I first read it.

(no subject)

Date: 2016-05-15 08:22 am (UTC)
From: [identity profile] momentsmusicaux.livejournal.com
Avoid complex expressions, by breaking them down and assigning variables.

So instead of:

if ($foo == 'bar' || (count($biz) > 2 && $biz[2] == 'bax'))

break down to:

$second_biz_is_bax = count($biz) > 2 && $biz[2] == 'bax';
if ($foo == 'bar' || $second_biz_is_bax) {}

The main reason is it's easier to read. I like to state this as 'I'm not a computer -- the computer is a computer' -- having to parse long expressions like that to understand their purpose is a waste of human brain power.

Also, it's much less effort to debug when you're trying to work out why the condition is failing, as you can dump the output of $second_biz_is_bax without copy-pasting actual code. (It may be that people with fancy debuggers can get that anyway, but I don't have a fancy debugger...)
