Last year I started reading Meditations by Marcus Aurelius. While I was reading it, I was struck by how many of the entries were just simple reminders to himself. Don't get mad at people unnecessarily. Remember that you are just one of many. Don't get distracted.
He was making the same mistakes over and over, just like I was.
I thought that writing down what I learned would help it stick. Every time I wanted to add a new tidbit, I would review all of them. It worked much better than I thought it would: it brought issues to the forefront of my mind that I wasn't considering. This was especially helpful when I was in a rush or under stress. Staying calm during a production outage and thinking of the lesson I learned, "First do no harm," kept me from making dumb mistakes.
Now that I'm leaving my job for another opportunity, I decided to collect them and share them so that they might be useful to a wider audience. They follow verbatim. Quotes are from Meditations.
Failover that you haven’t tested isn’t a failover
Failover needs testing and monitoring. Backups need testing and monitoring. Include these in your work estimates.
Make it hard to use wrong
Libraries and APIs should be easy to use correctly and difficult to use incorrectly.
Write the documentation first. As you develop, change the documentation before changing the code. Focus on your user: they are the only reason you're doing this.
For code that’s going to fail, make it fail earlier rather than later. Failing at object construction time is better than failing while the object is being used. Failing at compile time is better still.
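Here is a minimal sketch of failing at construction time. `RetryPolicy` and its fields are hypothetical names made up for illustration. A bad value is rejected when the object is created, long before any code tries to use it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical settings object; names are illustrative only."""

    max_attempts: int
    backoff_seconds: float

    def __post_init__(self) -> None:
        # Validate once, at construction, so an invalid policy can never
        # reach the code that actually performs the retries.
        if self.max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        if self.backoff_seconds < 0:
            raise ValueError("backoff_seconds must not be negative")
```

With this in place, `RetryPolicy(0, 1.0)` raises `ValueError` immediately instead of surfacing as a confusing failure deep inside a retry loop.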
Make it monitorable
If you can’t monitor it, how do you know if it even exists? Take the time to get monitoring right.
Monitor it from a customer’s perspective
Write a program which continually uses your APIs or user interfaces. Make it send emails to everyone when it doesn’t work as expected. This can catch subtle errors and monitoring gaps.
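A rough sketch of such a probe might look like the following. The `check` and `alert` callables are placeholders: in practice `check` would call your real API the way a customer does, and `alert` would send the email. Both are assumptions here, not any particular monitoring tool.

```python
import time
from typing import Callable


def run_probe(check: Callable[[], bool],
              alert: Callable[[str], None],
              attempts: int,
              interval_seconds: float = 0.0) -> int:
    """Exercise the service `attempts` times; return the number of failures.

    `check` should behave like a customer and return True on success.
    `alert` is called with a message for every failed attempt.
    """
    failures = 0
    for _ in range(attempts):
        try:
            ok = check()
        except Exception as exc:
            # An exception is a failure from the customer's point of view too.
            ok = False
            alert(f"probe raised {exc!r}")
        else:
            if not ok:
                alert("probe returned an unexpected result")
        if not ok:
            failures += 1
        time.sleep(interval_seconds)
    return failures
```

Keeping the check and the alert injectable makes the probe itself easy to test, which matters because a broken monitor is worse than no monitor.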
Test your hypotheses
If you think you improved performance, you need to have enough monitoring and diagnostics in place to show that you did. Otherwise you didn’t improve anything.
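As a small-scale illustration, here is one way to measure a suspected improvement instead of assuming it, using Python's standard `timeit` module. The two string-building functions are made-up stand-ins for "before" and "after". Real systems need production metrics, not just a microbenchmark.

```python
import timeit
from typing import Callable


def build_with_concat(n: int) -> str:
    # "Before": repeated string concatenation.
    s = ""
    for i in range(n):
        s += str(i)
    return s


def build_with_join(n: int) -> str:
    # "After": the version we believe is faster.
    return "".join(str(i) for i in range(n))


def measure(fn: Callable[[int], str], n: int = 1000,
            repeat: int = 5, number: int = 20) -> float:
    """Return the best observed time per call, in seconds."""
    timings = timeit.repeat(lambda: fn(n), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    # Check that both versions agree before comparing their speed.
    assert build_with_concat(1000) == build_with_join(1000)
    print("concat:", measure(build_with_concat))
    print("join:  ", measure(build_with_join))
```

The numbers will vary by machine and interpreter, which is the point: the claim is only as good as the measurement behind it.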
Make it easy to operate
Making products easy to operate and fix in production is a necessary and oft-unspoken feature of every product. There should be tools which can fix issues, and these tools should be tested like every other product.
Think about what to log
Logging exactly what you need to debug without logging too much is tough. Take some time to get it at least close to correct.
Make it easy to deploy
The goal should be continuous deployment. Everything that gets in the way should be automated.
Don’t hide logic
Keep blocks of code that are related near one another. Moving them to different places makes the logic hard to follow.
Beware expiration times
Expiration times are the devil. If you see one, it must go on some calendar somewhere. Think before you set an expiration time to five years. What if you set it to a month from now? There are only two valid values for expiration times: finite and on a calendar, or eternity.
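One way to put expirations "on a calendar" is to keep them in a single inventory and check it regularly. This is a sketch under that assumption. The inventory format and the names in it are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List


def expiring_soon(expirations: Dict[str, date],
                  today: date,
                  warn_days: int = 30) -> List[str]:
    """Return the names of items that expire within `warn_days` of `today`.

    Items that have already expired are included as well.
    """
    cutoff = today + timedelta(days=warn_days)
    return sorted(name for name, expires in expirations.items()
                  if expires <= cutoff)


# Hypothetical inventory of things that expire.
inventory = {
    "api-tls-cert": date(2030, 1, 20),
    "signing-key": date(2034, 6, 1),
    "partner-token": date(2029, 12, 25),
}
due = expiring_soon(inventory, today=date(2030, 1, 1))
# due == ["api-tls-cert", "partner-token"]
```

Run from a scheduled job that emails the result, a check like this turns a forgotten five-year expiration into a reminder a month ahead.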
Negative test everything access-control related
If you have a signature authentication scheme, you must have tests that cover both accepting valid signatures and rejecting bad ones. You really should have both unit tests and integration tests for this. The same goes for any access controls.
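As an illustration, here is a small HMAC-based scheme with one positive test and several negative ones. The `sign` and `verify` helpers are a sketch for this post, not any particular product's scheme. They use only Python's standard `hmac` and `hashlib` modules.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> str:
    """Return a hex-encoded HMAC-SHA256 signature of `message`."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, signature: str) -> bool:
    """Return True only if `signature` matches `message` under `key`."""
    expected = sign(key, message)
    # compare_digest takes time independent of where the inputs differ.
    return hmac.compare_digest(expected.encode(), signature.encode())


def test_accepts_valid_signature() -> None:
    assert verify(b"key", b"payload", sign(b"key", b"payload"))


def test_rejects_tampered_message() -> None:
    assert not verify(b"key", b"payload!", sign(b"key", b"payload"))


def test_rejects_wrong_key() -> None:
    assert not verify(b"other", b"payload", sign(b"key", b"payload"))


def test_rejects_empty_and_truncated_signature() -> None:
    good = sign(b"key", b"payload")
    assert not verify(b"key", b"payload", "")
    assert not verify(b"key", b"payload", good[:-1])
```

The negative tests are the ones that catch a `verify` that accidentally returns True for everything, which a positive-only test suite would happily pass.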
First do no harm
When there is an outage or a problem in production, before you take any steps to correct it, remember not to make the problem worse. Often decisions are made in a panic without fully considering the implications.
Analyze theories before acting on them
It looks like machines in data center A are corrupting files and machines in data center B are working fine. Solution if you stop your analysis there: stop using data center A. Actual problem: network latency triggering a bug. Analyze your theory more and you’ll find the real issue.
Logs need to be searchable
Put all the logs in some centralized, searchable location. This is critical for debugging production issues fast.
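Searching is easier when each log line is structured. Below is a sketch of a JSON formatter built on Python's standard `logging` module. The field names are my own choice, and shipping the lines to a central store is left to whatever log collector you use.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def make_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

With one JSON object per line, a central search tool can filter on `level` or `logger` directly instead of relying on regular expressions over free text.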
Don't blame others
A good doctor isn’t surprised when his patients have fevers, or a helmsman when the wind blows against him.
Bugs happen. It's important not to focus on who made the mistake, but why our process failed. Resist people who wish to blame anyone.
No one reads long emails
So make ‘em concise.
Distance makes communication hard
Most problems we had stemmed from some lack of teamwork or communication. Phone calls and frequent messages or emails help. Regular meetings, not so much.
Questions not statements
If they’ve made a mistake, correct them gently and show them where they went wrong. If you can’t do that, then the blame lies with you.
When criticizing another’s work, ask questions rather than make statements. This helps people think through their own work and assess it objectively.
The Principle of Charity
Assume people’s work is well intentioned from the start. No one sets out to do poor work or make decisions that seem bad in hindsight.
Turn criticism into better products
Beautiful things of any kind are beautiful in themselves and sufficient to themselves. Praise is extraneous.
When someone criticizes a product or feature you wrote, see if that criticism can be turned into a bug that can be fixed or an improvement to be added later. Challenge people who criticize often to always include a way to address their criticism. Don't be offended.
Keep overhead positive
For every hour of overhead (meetings, process, travelling), there should be at least that many hours of saved time. Travelling for two days is worth it if it saves 40 hours' worth of work from being needlessly done.
Convincing others is delicate work
Start with the most important point only. Once they begin to see that the current situation might not be perfect, introduce more ideas slowly. Act unemotionally. Allow them to think through the idea on their own. Watch 12 Angry Men again.
Vertical visibility is a double-edged sword
It’s great when higher-ups notice your great work. It’s bad when they’re informed of every little bug and freak out about it.
Take unsolicited advice from experience with a grain of salt
Some people will try to give you unsolicited advice. If their main argument is that they’ve been doing this for N years, which is X years more than you, you should be skeptical.
Have infodump sessions
Communicating is hard. If you want to learn about a system, put it in the infodump spreadsheet and pick a person who knows about it. If enough people vote for it (the bar is low: 2 or 3 votes), the session will be given. No slides. Hands-on.
Don’t feign surprise
Leave other people’s mistakes where they lie.
You know that feeling of surprise when someone did something stupid, or didn't know something? People learn things every day. Acting surprised is rude. So don’t do it.
Watch where your company is going
Where are you headed? Where are they headed? If you're not aligned, it's time to look elsewhere. Ask yourself this every few months.
The Red Queen Hypothesis
It’s not enough to merely improve. Everyone is improving constantly, so linearly improving your own offerings merely keeps you in the same place. You must go above and beyond current technology to make any real progress.
Take time to think
If every moment of every day is spent adding features, removing bugs, or coding, it’s hard to innovate. Make sure there’s time in the development cycle for lower intensity work to allow reflection and creative thinking.
Solve problems, don’t just create features
The goal of software engineering is to solve people’s problems. Adding features for completeness wastes time. Beware product feature checkboxes and customers who don’t want to buy because you don’t have feature X even though they don’t need it.