Software Engineering Guiding Principles - Part 1

15 Jun 2016 - 4,757 words

This post previously appeared on my blog, 'Absolutely No Machete Juggling'.

Have Strong Opinions, Weakly Held

Don’t Be A Jerk

The Team Unqualified to Refactor is Unqualified to Rewrite

Exceptions

Choose Boring Technology

Inventing Languages

Will You Understand This at 3AM?

Deliver Working Software Early and Often

Plans are The Opposite of Working Software

Part 2…

I find that I repeat myself often at work. There are a handful of things I say so often when discussing decisions that I’ve occasionally been called out for sounding like a broken record.

But the reason I keep repeating these phrases is that I think they inform a great deal of my decision-making. They are, in effect, my guiding principles when developing software professionally.

I thought it might be fun to write a few of these things down because I think that they’re worth sharing - I feel like these principles have steered me in the right direction time and time again. Obviously, there are exceptions to these and there are times when they should be ignored (after all, not being a zealot is one of the principles), but I think they will generally take an engineer down the right path.

Have Strong Opinions, Weakly Held

I think the phrase I’ve heard more than any other in my life is “tell us how you really feel!”, which I guess is people’s way of telling me I’ve made them uncomfortable by expressing an opinion too aggressively. It’s true, I can be very strongly opinionated, and I’ve gotten into more than my fair share of, oh, let’s call them “passionate discussions” in the workplace. I’m never insulting or personal, but I have strong opinions on how to do things.

That said, I think it’s important to always be open to having my mind changed. If anything, I think I’m TOO easy to convince; often it takes only one strong counterpoint to completely demolish an opinion I’ve held firmly for years. My opinions are informed by years and years of experience, but that experience doesn’t apply in every situation, so it’s important to be willing to adjust in light of new information or facts.

Apparently this phrase, “strong opinions, weakly held”, was popularized by Stanford Professor Bob Sutton, who credits it to futurist Paul Saffo. Really, I think it’s a good way to approach every opinion. I’ve switched between polar opposite positions on a number of issues, including political and philosophical issues that I won’t get into on this blog, but I think I do a good job of suspending the convictions born of experience to make way for alternative arguments. I never assume I’m objectively right just because I care.

Unless you hunger for battle, don't be a zealot

It’s important that what makes an opinion weakly held is a strong, rational, logical argument for the alternative position, and nothing less. I won’t back down on something I think is important because of how passionately another person disagrees, or how upset they get at meeting opposition. This is what makes the opinion strong: I genuinely care about believing the largest possible number of true and correct things, so the only way to dislodge a strong opinion is with true and correct things that counter it.

Don’t Be A Jerk

I cringed when I watched Season 3, Episode 6 of my favorite show, Silicon Valley, as the main character Richard felt so strongly about tabs over spaces that he alienated everyone in his life over it. These debates are so incredibly pointless to me that I don’t understand how people waste so much time caring about them. Strong opinions are not the same as zealotry; zealotry is company and team poison. Strong opinions only matter if the things they’re about matter. Having extremely strong opinions about tabs vs spaces, or emacs vs vim, makes you borderline unhireable to me, because bringing zealots onto your team violates the No Asshole Rule (though for the record, spaces and vim, #sorrynotsorry).

Additionally, it’s fine to have strong opinions, but if you find yourself belittling or mocking other people in order to stand by them, they probably aren’t that strong. Your positions on technical matters should stand on their own weight, without needing to knock people down. Don’t be one of those people who walk around acting like a jerk and then justify it by saying they have strong opinions. The best engineers I’ve worked with have consistently been skilled not only at forming well-reasoned strong opinions, but at communicating those opinions respectfully to others.

Being a technical wizard doesn’t give someone the right to be a pompous ass to everyone else. I’m a strong advocate of firing people who are insufferable at a personal level for being a poor cultural fit, regardless of how much they know about this or that technology. It’s better to have a hole in your team than an asshole.

I started this list with this one in particular because it’s important. The rest of this list is, essentially, a list of strongly held opinions I maintain. But it’s important that even these opinions, having reached Guiding Principle level, are subject to change in the light of strong counterarguments, or subject to suspension in light of unique circumstances.

The Team Unqualified to Refactor is Unqualified to Rewrite

I strongly, strongly believe that a full-on code rewrite is nearly always the wrong thing to do. Either you pull everyone off the current iteration of the product to do the rewrite, which means your main product languishes, or you pull some people off to do the rewrite, meaning the rewrite team is forever playing catch-up with the ever-growing main product.

From a simple project management standpoint, this is a disaster. Want to know how long the rewrite will take? Well, in the former case, you’re working with a team that’s dealing with new technology and new development, so there’s no way to apply any previously recorded team velocity as a prediction of future performance. Moreover, you don’t actually have any sense of the scope of the project, because the requirements are basically “everything the app does now”, which will include weird corner cases that have long since been forgotten. So you have an unknown scope and an unknown team velocity, and you’re trying to make a prediction of when this work will be completed? So development is going to stop on the main product line for an indeterminate amount of time. And this is the BEST case scenario, the one where everyone can focus on doing the rewrite.

In the latter case, it’s even more unpredictable - you still have the unknown scope issue, but it’s worse because you also have to include, in the scope, getting to parity with whatever else is built while the rewrite is being worked on. If the rewrite would take 3 months, you have 3 months’ worth of new features on the main product to catch up to. If it would take 6 months, you have 6 months of features to catch up on. And since you don’t know how long it will take just to reach current parity, you can’t predict how far in the hole you’re going to be when it’s “done”, which adds ANOTHER layer of unknown time into the mix. Maybe adding those 6 months of features takes you 5 months, so when you’re done you’ve got another 5 months to catch up on. That 5 months of work takes you 3 months to complete, so you have another 3. You’re basically asymptotically approaching done. And remember, the velocity of the “main product” team will be affected by the loss of the people who peel off to do the rewrite, so you have little sense of the velocity of not one, but both teams. If you know your car’s speed, you can predict when it will pass a landmark - but you can’t possibly know when it will pass another moving car if you don’t also know that car’s speed perfectly. If you know neither car’s speed, you’re utterly done for.

Moreover, from an engineering standpoint this is a terrible idea. Everyone likes doing greenfield work because it’s new and exciting, but you have to ask, why do the engineers want to avoid maintaining and refactoring the existing product? Is the codebase such a spaghetti mess that it’s too difficult to add anything, so the team wants to try again from scratch? Who the hell do you think made that dumpster fire in the first place? Why on earth would that same team suddenly do it right the second time around? Especially when under the pressure of “we have to get caught up” and the time-pressure of the company’s primary software products being frozen or at least slowed while the team develops it? It’s even MORE likely that corners will be cut and quality will suffer, not less likely.

Refactoring the codebase is almost always the right way to go. Take the awful parts that you want to rewrite and slowly but surely refactor them into the clean codebase you want. It might take overall longer to be “done” with the effort, but the entire time it’s happening the main product is still in active development without the “two cars racing” situation. Refactoring code is, though slower, also easier to do than rewriting it from scratch, because you’re able to do it in small steps with (hopefully) the support of a huge test suite to ensure you don’t break anything. Since refactoring is easier than rewriting, any team that says “it’s too hard” to the idea of refactoring the existing codebase instead of rewriting it is inherently not good enough to do the rewrite. The end result will actually be worse.
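A common way to get that safety net before touching legacy code is a characterization test: pin down what the existing function does across a range of inputs, then check that the refactored version agrees on every one of them. In this sketch the shipping-cost functions and their pricing rules are hypothetical; the point is the small, verifiable step from the tangled version to the readable one.

```c
/* Hypothetical legacy function: it works, but the rules are buried. */
int legacy_shipping_cost_cents(int w)
{
    int c = 0;
    if (w > 0) { if (w < 500) c = 499; else if (w < 2000) c = 899;
        else c = 899 + ((w - 2000 + 999) / 1000) * 250; }
    return c;
}

enum {
    SMALL_PARCEL_LIMIT_GRAMS = 500,
    MEDIUM_PARCEL_LIMIT_GRAMS = 2000,
    SMALL_PARCEL_CENTS = 499,
    MEDIUM_PARCEL_CENTS = 899,
    CENTS_PER_EXTRA_KG = 250,
    GRAMS_PER_KG = 1000
};

/* Refactored version: intended to behave identically, with names for every rule. */
int shipping_cost_cents(int weight_grams)
{
    if (weight_grams <= 0)
        return 0;
    if (weight_grams < SMALL_PARCEL_LIMIT_GRAMS)
        return SMALL_PARCEL_CENTS;
    if (weight_grams < MEDIUM_PARCEL_LIMIT_GRAMS)
        return MEDIUM_PARCEL_CENTS;
    int extra_grams = weight_grams - MEDIUM_PARCEL_LIMIT_GRAMS;
    int extra_kg_rounded_up = (extra_grams + GRAMS_PER_KG - 1) / GRAMS_PER_KG;
    return MEDIUM_PARCEL_CENTS + extra_kg_rounded_up * CENTS_PER_EXTRA_KG;
}

/* Characterization check: returns the first weight in [from, to] where the
   two versions disagree, or -1 if they agree everywhere in that range. */
int first_behavior_change(int from, int to)
{
    for (int w = from; w <= to; w++)
        if (legacy_shipping_cost_cents(w) != shipping_cost_cents(w))
            return w;
    return -1;
}
```

Run the check after every small step of the refactoring; the moment it returns something other than -1, you know exactly which input you broke.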

Exceptions

There are a couple of noteworthy exceptions to this. The first is when the reason for the rewrite is a complete change in technology, specifically the language of implementation. If you’re working with Java and want to move to Scala or Clojure, the team should be able to refactor piece by piece, since all three compile to JVM bytecode and can interoperate. However, if the team needs to move from a dead technology such as ColdFusion to something else like .NET, a full rewrite is the only way to go. This may also apply when a prototyping technology was used to develop the first iteration of a product, only for the team to discover that there’s no way to make the system scale, as in Twitter’s abandonment of Rails in favor of Scala. Not every company has the resources to develop a new PHP runtime just to avoid rewriting their codebase in something other than PHP; sometimes you have to bite the bullet and pick different technology.

Another exception is when you find yourself in an “over the wall” situation. Perhaps a team of contractors or consultants or offshore engineers was hired to develop the first iteration of a project, and then the codebase was tossed over the wall to another team to maintain. In this case, the new team may in fact be qualified to either refactor OR rewrite the codebase, and may simply decide the codebase as-is is too much of a mess to bother with and do a rewrite. In this instance, I would still encourage exploring every possible opportunity to refactor first, but believe me when I say I’ve been on the receiving end of these codebase bombs enough to fully appreciate that sometimes you just need to rewrite the whole thing.

One more exception: if your “product” is mostly just a collection of microservices and you’re talking about rewriting some of them, that’s another story. In the land of microservices, rewriting a service essentially is refactoring, and presumably you have a suite of integration-style tests against each microservice, so a rewrite can be done relatively quickly and relatively safely. Even if you want to rewrite all of the services, you’re able to do it one at a time; this is one of the big advantages of microservice architectures.

Choose Boring Technology

I really can’t say this any better than Dan McKinley’s original post Choose Boring Technology. In it, McKinley argues that every team or company should start out with three innovation tokens. You can spend these tokens whenever and however you please, but they don’t replenish quickly. Every time you pick an exciting or buzzwordy or cutting edge technology instead of an old standard, you spend a token.

Relational databases are boring. Java is boring. jQuery is boring. Apache is boring. Linux is boring. Tomcat is boring. Choose something “cool” instead of something boring, and you’ve spent an innovation token. Boring technology is boring because it’s known, not because it’s bad. Its failure modes are understood, and it probably has a host of libraries and support tools that make it easier to live with in the long term.

There’s nothing wrong with Java, tons of scalable applications have been built on Java, and “it’s boring” isn’t a good enough reason to choose something else. If your team truly feels like Scala or Clojure or Erlang or whatever is the right tool for the job, by all means use it, but that’s one innovation token spent. Pick MongoDB over MySQL or Oracle and you’ve got one left. Any time you COULD use technology you’re already using (“our other codebase is .NET”) but decide to pick something new instead, you spend a token.

Boring technology is easy to pick up, easy to research, easy to debug, and frankly easy to staff for. I’m sure the engineering team is happy to pad their resumes with cool buzzwords while simultaneously making themselves irreplaceable, but is that really the best thing for the product and the company? When boring technology fails you, there are stacks of books and internet forums available to assist you - there’s nothing worse than the excitement of searching for your error message and finding that someone else has had the EXACT same problem before, only to be followed by the crushing blow of zero replies.

I’ve worked plenty of jobs where the team was building plain old Java Web Applications using Spring, backed by MySQL or Oracle databases. You know what? Those products worked just fine. Did the teams have the most fun in the world writing that code? No, probably not, but we got the job done and the products performed quite well (and were easy to fix when they didn’t). Whenever a buddy of mine watches engineers pick and choose cool technologies out of the pool of the latest-and-greatest, he’s fond of reminding us that he worked on a 911 call routing application written in Java with a MySQL database, and it ran just fine, saving tons of lives.

At my current gig, we decided to build a 150,000-line codebase using Scala. Scala seemed like the right tool for the job, given the particular constraints we had about scalability and throughput in the system. I like Scala a lot, and there’s no doubt that we’ve made tremendous productivity gains by utilizing features exclusive to Scala, but if I’m truly honest with myself, did we actually come out with a net productivity gain? When you factor in time lost trying to understand confusing code, time lost by the compiler doing a twenty-pass compilation (holy shit), and time lost by having to manually perform refactorings that our IDEs couldn’t automate due to weak tooling support, I’m not actually sure we came out ahead. Especially given Java 8’s functional programming features, I’m not sure I’d bother picking Scala over Java 8 today, as much fun as I have working with it. It’s not about how much fun I have.

Ultimately, it’s really not about me or how much I enjoy working with particular tools and technologies. My job isn’t to have a blast; hell, it’s not even really to “write code”. My job is to solve business problems, and it so happens that the tool I’m most competent with for that is code. It’s important to stay up to speed on the latest and greatest technologies so that you as an engineer know when it’s time to spend an innovation token, but honestly I think most of that effort should be relegated to conference attendance, reading, and personal GitHub accounts. Don’t make company decisions based on how many buzzwords you can add to your resume.

Inventing Languages

I’d like to add that “writing your own programming language” should be worth four innovation tokens all on its own. If you develop an in-house programming language, you’d better have a staggeringly good reason. Good programming languages are hard to write, and unless you have a number of Computer Science PhDs with specializations in Programming Language Design and Implementation on the team, chances are all you’re actually doing is writing an overly complex DSL. The kind of thing whose compiler/transpiler/transliterator fails with “syntax error somewhere” in the event of a mistyped character, rather than a helpful diagnostic and a line number.
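The gap between “syntax error somewhere” and a useful diagnostic is mostly bookkeeping: track the line and column while scanning, and report them. Here is a minimal sketch for a hypothetical in-house DSL, assuming (purely for illustration) that only letters, digits, whitespace, and a handful of punctuation characters are legal:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical DSL: the accepted punctuation set is an assumption. */
static int is_allowed(char ch)
{
    return isalnum((unsigned char)ch) || isspace((unsigned char)ch) ||
           (ch != '\0' && strchr("=;(){}_,.\"", ch) != NULL);
}

/* Returns 0 if every character is allowed. Otherwise writes a message such
   as "line 2, column 3: unexpected character '@'" into msg and returns 1. */
int check_characters(const char *source, char *msg, size_t msg_size)
{
    int line = 1, column = 1;
    for (const char *p = source; *p != '\0'; p++) {
        if (!is_allowed(*p)) {
            snprintf(msg, msg_size, "line %d, column %d: unexpected character '%c'",
                     line, column, *p);
            return 1;
        }
        if (*p == '\n') { /* a newline starts the next line at column 1 */
            line++;
            column = 1;
        } else {
            column++;
        }
    }
    return 0;
}
```

For the input "x = 1;\ny @ 2;\n", check_characters returns 1 and reports "line 2, column 3: unexpected character '@'", which is the kind of message a tired engineer can actually act on.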

Don’t create your own programming language. Your language will be weak, your tools will be poor, and language support within other tools will be nonexistent. You probably aren’t going to properly staff the design and support of the language you’ve created. Unless you have an entire team of people devoted exclusively to maintaining that language and writing Eclipse plugins for it or whatnot, your technical debt is so crater-like that you can’t even tell you’re standing in a hole because it extends past the horizon. Whatever huge productivity gains you think your new language is offering your team, they’ll be canceled out and then some.

99 times out of 100, a new language isn’t what you want to build, but a library or a framework is. By all means, develop those in house if need be (but staff their development). Unless you’re developing a language as part of your core business, like Apple developing Swift, don’t do it.

Will You Understand This at 3AM?

John Carmack is frequently cited as an example of an eccentric genius, the kind of guy who is way ahead of his time. I have to admit, I’m also in awe of a great deal of what he’s done with code. Take this fast inverse square root function from the Quake III Arena source code, which is often credited to him:

float Q_rsqrt( float number )
{
	long i;
	float x2, y;
	const float threehalfs = 1.5F;

	x2 = number * 0.5F;
	y  = number;
	i  = * ( long * ) &y;                       // evil floating point bit level hacking
	i  = 0x5f3759df - ( i >> 1 );               // what the fuck? 
	y  = * ( float * ) &i;
	y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//	y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

	return y;
}

But notice line 10, “i = 0x5f3759df - ( i >> 1 );”? It’s easy to find, because it’s elucidated with the helpful “what the fuck?” comment. There’s no doubt that this code is extremely clever, and it’s beyond question that it’s extremely fast. It also requires an entire 2000-word Wikipedia article to understand.

In fact, Carmack himself wasn’t even the creator of this bit of wizardry. When asked, he suggested it might have come from Terje Mathisen, an assembly programmer who had contributed code to id Software previously, but the trail appears to lead back through several other developers, each of whom had gotten it from someone else. This is why the “what the fuck?” comment is right there - nobody understood it. And yet there it was, pasted into the Quake III engine code because it seemed to work and it was fast. Obviously this worked out for id, and Quake III is awesome, but it probably wasn’t the wisest idea to stake their company’s product on code that nobody understood.

Was it clever? Absolutely. But clever is the enemy of clear.
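For contrast, here is a sketch of the same trick written for the 3AM reader. It assumes IEEE 754 single-precision floats and a 32-bit uint32_t, and it copies bits with memcpy instead of the original pointer casts: those casts break C’s strict-aliasing rules, and since long is 64 bits on many modern platforms, they read and write the wrong number of bytes there. The helper names are hypothetical. Good names don’t make 0x5f3759df self-explanatory, but each step now says what it is doing, and the arithmetic matches the original’s single Newton iteration.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit floats");

/* Constant from the Quake III source. A float's bits read as an integer are
   roughly a scaled log2 of its value, so halving them and subtracting from
   this constant approximates the bits of 1/sqrt(x). */
static const uint32_t RSQRT_MAGIC_CONSTANT = 0x5f3759df;

static uint32_t float_to_bits(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits); /* well-defined type punning */
    return bits;
}

static float bits_to_float(uint32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* One Newton-Raphson step refining the guess y for 1/sqrt(x). */
static float newton_step(float x, float y)
{
    return y * (1.5f - 0.5f * x * y * y);
}

/* Approximates 1/sqrt(x) for positive, finite x. */
float fast_inverse_sqrt(float x)
{
    uint32_t initial_guess_bits = RSQRT_MAGIC_CONSTANT - (float_to_bits(x) >> 1);
    float initial_guess = bits_to_float(initial_guess_bits);
    return newton_step(x, initial_guess);
}
```

fast_inverse_sqrt(4.0f) comes out at roughly 0.499 against an exact answer of 0.5; a single Newton step keeps the relative error under about 0.2%.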

I try never to write comments in my code. Comments should not be used to explain how something works; that should be apparent from the code itself. And if that means adding a few temporary variables so that their names can be helpful (or inspected while debugging), or having some comically long method names, so be it. People often say that comments can be used to explain “why” something works instead, but frankly I find that a few unit tests for the code in question do a better job of explaining the why than a comment ever could - at the very least, take the comment you’d write explaining why and make it the name of the test. Code is for what, tests are for why. Comments are for jokes.
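Here is a small sketch of “code is for what, tests are for why”, using the Gregorian leap-year rule as a hypothetical example. Temporary variables carry the names a comment would otherwise hold, and each test’s name records the reason behind a rule that looks odd at first glance.

```c
#include <assert.h>
#include <stdbool.h>

bool is_leap_year(int year)
{
    bool divisible_by_4 = year % 4 == 0;
    bool is_century = year % 100 == 0;
    bool divisible_by_400 = year % 400 == 0;
    return divisible_by_4 && (!is_century || divisible_by_400);
}

/* The "why" lives in the test names rather than in comments on the code. */
void test_century_years_are_not_leap_years(void)
{
    assert(!is_leap_year(1900));
}

void test_centuries_divisible_by_400_are_still_leap_years(void)
{
    assert(is_leap_year(2000));
}

void test_ordinary_years_divisible_by_4_are_leap_years(void)
{
    assert(is_leap_year(2016));
    assert(!is_leap_year(2015));
}
```

When one of these fails at 3AM, the failing test’s name tells you which rule broke before you’ve read a single line of the implementation.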

Obviously it’s difficult not to be proud of yourself when you’ve gotten some long method down to a one-liner (even if it is one incredibly long line) or invented some massively clever solution to a problem. And indeed, sometimes these clever tricks really are necessary to get the required performance out of a system (as in the Quake III inverse square root example). That’s why I’ve found this heuristic so handy (hat tip to Demian Neidetcher):

If your cell phone rings at 3AM because this code causes a production outage a year from now, will you be able to understand and reason about the code well enough to fix it?

Imagine that your job is basically on the line here: you’re on a conference call with your boss, your boss’s boss, your boss’s boss’s boss, and the CTO. Hell, maybe the CEO is on too, talking about the millions of dollars in lost revenue every minute the product is offline. Your heart is racing from being startled awake, and your eyes can barely focus enough to read your laptop screen. Do you really want this to be what comes into focus in the middle of the night?

(n: Int) => (2 to n) |> (
	r => r.foldLeft(r.toSet)((ps, x) => 
		if (ps(x)) ps -- (x * x to n by x) else ps)
)

Yes it’s clever, yes it’s fast, congratulations on how smart you are. But your company code repository isn’t the place to show off your l33t coding ski11z; do that shit in your personal GitHub account. You’re not being paid to fluff your e-peen; you’re being paid to solve the company’s business problems, and that means writing something that can be understood by the other people they hired. Code’s primary purpose is to be read by other human beings (and only incidentally for machines to execute), otherwise we’d all be writing directly in machine language. So if your own future self won’t understand the code just because you’re tired, what chance does the dumbest person on your team have of understanding it? Stop showing off; your job (and maybe even your employer’s future) may someday depend on it.
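The Scala snippet is a Sieve of Eratosthenes. For comparison, here is a sketch of the same algorithm written to be read half-asleep. It is longer, and it returns a lookup table rather than a set (a representation chosen here for simplicity), but every step has a name:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Returns a malloc'd table where is_prime[k] is true iff k is prime, for
   0 <= k <= n. The caller frees it. Returns NULL on bad input or OOM. */
bool *sieve_of_eratosthenes(int n)
{
    if (n < 0)
        return NULL;
    bool *is_prime = malloc(((size_t)n + 1) * sizeof *is_prime);
    if (is_prime == NULL)
        return NULL;

    for (int k = 0; k <= n; k++)
        is_prime[k] = k >= 2; /* 0 and 1 are not prime */

    for (int candidate = 2; (long long)candidate * candidate <= n; candidate++) {
        if (!is_prime[candidate])
            continue; /* already crossed out as a multiple of a smaller prime */
        /* Smaller multiples were crossed out by smaller primes already. */
        for (long long multiple = (long long)candidate * candidate;
             multiple <= n; multiple += candidate)
            is_prime[multiple] = false;
    }
    return is_prime;
}

int count_primes_up_to(int n)
{
    bool *is_prime = sieve_of_eratosthenes(n);
    if (is_prime == NULL)
        return -1;
    int count = 0;
    for (int k = 0; k <= n; k++)
        if (is_prime[k])
            count++;
    free(is_prime);
    return count;
}
```

count_primes_up_to(30) returns 10 (2, 3, 5, 7, 11, 13, 17, 19, 23, 29).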

Deliver Working Software Early and Often

I realize this is just a rewording of a standard part of the Agile Manifesto, and I could just as easily say “Be Agile!” here. But I think the truth is that Agile has come to mean a lot of different things to a lot of different people, and the term is now so overloaded and hijacked that it’s effectively useless as a phrase.

I like most of the ideas of the Agile Manifesto, but I think the most important thing to take away from it is the unparalleled value of getting working software into the hands of users as quickly and frequently as possible. I absolutely detest when features are held back so that they can be released in a “big bang” to really wow and excite users (hey Product Owners, your users really don’t care as much as you think, you’re just building a thing they’re forced to use to accomplish something). As long as a feature actually works end to end, get it into the hands of users and solicit feedback right away; every day you keep working code behind a gate is a day you give your competitors to steal users away from you. It’s also a day that you are effectively lying to your users - the most important people to your software - about what your product is capable of doing.

I despise long-running feature branches in version control as well; almost any time you want to make a branch, I think it’s better to make a feature flag that people (specifically, product owners) can turn on and off at will. Long-running branches are incredibly susceptible to the 90/90 rule: the first 90% of the work takes 90% of the time, and the remaining 10% takes the other 90%. And if two subteams wind up creating simultaneous long-running branches off the same mainline trunk, pack it in, you’re done for.
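A minimal sketch of the feature-flag alternative: the unfinished code is merged to mainline right away but stays dark until someone flips its flag. The flag names and the in-memory table are hypothetical; a real system would typically load flags from configuration or a flag service so product owners can toggle them without a deploy.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag table; in practice this would come from configuration. */
struct feature_flag {
    const char *name;
    bool enabled;
};

static struct feature_flag flags[] = {
    {"new_checkout_flow", false},
    {"bulk_export", true},
};

static const size_t flag_count = sizeof flags / sizeof flags[0];

bool feature_enabled(const char *name)
{
    for (size_t i = 0; i < flag_count; i++)
        if (strcmp(flags[i].name, name) == 0)
            return flags[i].enabled;
    return false; /* unknown flags default to off */
}

/* Returns false if no flag with that name exists. */
bool set_feature(const char *name, bool enabled)
{
    for (size_t i = 0; i < flag_count; i++) {
        if (strcmp(flags[i].name, name) == 0) {
            flags[i].enabled = enabled;
            return true;
        }
    }
    return false;
}

const char *checkout_page(void)
{
    if (feature_enabled("new_checkout_flow"))
        return "new checkout"; /* already on mainline, dark until enabled */
    return "old checkout";
}
```

Both code paths live on mainline and get merged continuously, so there is no branch to rot; turning the feature on is a data change rather than a merge.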

Every “big bang” release I’ve been a (reluctant) part of has ended in some form of failure. People think that the software is mostly done, and then the effort spins its wheels at the end trying to “harden” the release and remove bugs. Or the software is finally delivered, only for the team to discover that 80% of the users are only using 20% of the features, meaning that a more targeted, earlier release of those top 20% of features would have been a far better use of engineering time and resources. The other 80% is now just cruft in the codebase that nobody uses, making it more difficult to add features later on.
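If you have per-feature usage counts, finding the features that matter is a small calculation: sort by usage and count how many features you need before you cover, say, 80% of all use. This is a sketch with a hypothetical function name; the 80% threshold is just the rule of thumb from the paragraph above, not a law.

```c
#include <stdlib.h>

static int compare_descending(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x < y) - (x > y); /* larger counts sort first */
}

/* Sorts usage_counts in place (most used first) and returns how many of the
   most-used features are needed to account for at least `percent` of all use. */
int features_needed_for_coverage(int *usage_counts, int feature_count, int percent)
{
    long long total = 0;
    for (int i = 0; i < feature_count; i++)
        total += usage_counts[i];
    if (total == 0)
        return 0;

    qsort(usage_counts, (size_t)feature_count, sizeof *usage_counts, compare_descending);

    long long covered = 0;
    for (int i = 0; i < feature_count; i++) {
        covered += usage_counts[i];
        if (covered * 100 >= total * percent) /* integer math avoids rounding */
            return i + 1;
    }
    return feature_count;
}
```

With hypothetical counts of {5, 500, 20, 300, 10, 40, 15, 60, 30, 20} (1,000 uses across 10 features), features_needed_for_coverage(counts, 10, 80) returns 2: the two most-used features already account for 80% of usage.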

Plans are The Opposite of Working Software

I think a corollary to this rule is: don’t sell your users on non-working software. I really hate the tendency for “marketing” to need delivery dates on software features so that they can start selling the features now, a situation I’ve seen at company after company. Don’t try to sell users on features you plan on delivering, even if you’re nearly certain about when those features will be done (but, hint, you’re probably less certain than you think). That’s selling vaporware; anything can change between now and then, causing those features to be shelved or to not work properly. Instead, deliver working software early and often, and let the marketing folks sell users on the features that are actually done, because more stuff will actually be done when the team isn’t wasting tons of time coming up with estimates (read: lies).

Just start referring to “estimates” as lies.

“how long will that take?”
“well, if I had to lie, a week?”
— Trek Glowacki (@trek) August 25, 2015

Obviously there are occasions when people need some sense of how long something will take, most notably when the company is deciding between two different features to implement and is performing an analysis based on their cost (though in my experience, this rarely happens and usually both features are requested anyway). But for the most part, using a roadmap or a plan to inform the company on how to sell their products is a mistake - give engineers the time to implement features properly, and then, when the features are done, sell people on them. And remember, good software sells itself.

Part 2…

I split this list into two posts for really no good reason aside from length. If you want more, check out Part 2.