2015-11-25

An Operator's Decalogue



System administration is not an exact science. In fact, it is no science at all. It is an art, and a dark one at that. As a result, philosophy shapes the approach to running a well-lubricated computer environment. There are certainly points on which operators can agree. I have attempted to compile a list of the philosophical points that I hold.

Here are the ten commandments:


I. Thou shalt back up


This is the first point for a reason. The underlying principle here is that the unforeseen is by definition unknown. You don't know when a disk will die, only that it eventually will. 
Data you don't back up is data you don't mind losing. 
If the above sentence is not true, make it true. Remember to move the data off-site, else it's not a backup.
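
As a rough illustration of the point, here is a minimal Python sketch that copies a file and refuses to call it a backup until the copy's checksum matches the original. The function names `sha256_of` and `back_up` and the destination directory are my own assumptions, not a recommendation of any particular tool, and a local directory stands in for what should really be an off-site target.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destination_dir: Path) -> Path:
    """Copy source into destination_dir and verify the copy by checksum.

    destination_dir is a hypothetical stand-in for an off-site location,
    for example a mounted remote filesystem.
    """
    destination_dir.mkdir(parents=True, exist_ok=True)
    copy = destination_dir / source.name
    shutil.copy2(source, copy)  # copies contents and file metadata
    if sha256_of(source) != sha256_of(copy):
        raise IOError(f"backup of {source} does not match the original")
    return copy
```

A copy that has never been verified or restored is only a hope, which is why the sketch checks the result instead of trusting the copy call.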


II. Thou shalt maintain redundancy


If this seems like a reiteration of the first point, good. There is an underlying similarity, but don't confuse the two. 
Data you care about should be in more than one place. 
You can achieve redundancy with replication, hardware RAID, software RAID (Btrfs, ZFS) or object storage solutions such as Ceph. These do not replace backups, but they make it easier to get from a broken state back to a normal state.


III. Thou shalt consult thy neighbour


Making a change to the state of things requires an assessment of the consequences. A minute or two spent on a quick web search, a look over the documentation or a word with another operator can save hours. 
Don't assume you know everything. You don't.


IV. Thou shalt share thy knowledge


Writing documentation can be a pain, but you should see it as your duty to spread your knowledge. Whether it involves writing documentation, writing tickets, talking to your colleagues, blogging, filing bug reports, sending e-mails or writing forum posts, make it known when you find idiosyncrasies, trivia and errors. 
Make a note every time you are surprised. 
Surprise is a discrepancy between the expected and the actual. Others may share your expectation.


V. Thou shalt not do a computer's work


Keeping it simple and stupid involves making one solution for many problems. 
Automation can weed out biases and save labour. 
Human error can't be eliminated completely, but needless human error can. 
Learning a little scripting or even programming can go a long way. 
Processor time is cheaper than human time.


VI. Thou shalt respect thy user


PEBKAC occurs more often in the operator's chair than in the user's. 
Understand your users' use of computers and you can more accurately make the normal state a state which suits your users' needs.
Happy users make happy administrators.


VII. Thou shalt fix that which is not broken


Stagnant computer environments are dying computer environments. Embrace change, but beware the consequences. Gather knowledge, identify areas of improvement and get to work. Understand the tolerance for fault, and stay a little under it, but don't aim for zero. 
Stability is important, but improvement is unstable.


VIII. Thou shalt fix that which thou hast broken


Change is fun, but you're not finished until you have done the cleaning up. 
Have pride in what you present to your users. 


IX. Thou shalt not make changes before the Sabbath


The downtime will be the expected time e times phi [e × ϕ] if you are an optimist, 
times pi [e × π] if you are a pessimist.
Don't make a change right before going home or on holiday.
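
For anyone who wants the rule of thumb as arithmetic, here is a small Python sketch. The function name `downtime_estimate` and the choice of hours as the unit are my own assumptions; the multipliers are simply the golden ratio and pi.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, about 1.618


def downtime_estimate(expected_hours: float) -> tuple[float, float]:
    """Return (optimist, pessimist) downtime for an expected duration."""
    return expected_hours * PHI, expected_hours * math.pi
```

A change you expect to take two hours comes out at roughly 3.2 hours for the optimist and roughly 6.3 hours for the pessimist, which is a long time to spend at work on a Friday evening.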



X. Insert thine own commandment here


Think for yourself.
Don't accept received knowledge without having understood it. You are the key to making the normal state the best it can be.

