sysadmin tip: reboot more often
Wednesday, April 14th, 2010
While we all strive for maximal uptime, one thing that often goes by the wayside is the ability to reboot the machine without wondering whether it will come back up correctly.
You do not want to find out that the machine has the wrong boot settings when it gets powered off unexpectedly and then doesn’t come back up.
“Do a reboot” is a good checklist item when adding a new service to a machine. You’ll want to ensure that the service comes up correctly, without intervention, after a reboot. Of course, you’re installing the service during a maintenance window, so rebooting is not an issue, right?
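One way to make the “do a reboot” checklist item concrete is a small script, run once the machine is back up, that confirms each service is accepting connections again. This is a minimal sketch in Python, assuming every service can be checked by opening a TCP connection to it; the names, hosts, and ports in `SERVICES` are hypothetical placeholders, and a real check might also query an HTTP health endpoint or the init system.

```python
import socket
import time

# Hypothetical list of services to verify after a reboot: (name, host, port).
# Replace these with whatever the machine is actually supposed to run.
SERVICES = [
    ("ssh", "127.0.0.1", 22),
    ("web", "127.0.0.1", 80),
]


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_services(services, deadline_s: float = 120.0, interval_s: float = 5.0):
    """Poll each service until it answers or the deadline passes.

    Services may still be starting right after boot, so we retry instead of
    failing on the first refused connection. Returns the names that never came up.
    """
    pending = list(services)
    end = time.monotonic() + deadline_s
    while pending:
        # Keep only the services that are still not answering.
        pending = [s for s in pending if not port_is_open(s[1], s[2])]
        if not pending or time.monotonic() >= end:
            break
        time.sleep(interval_s)
    return [name for name, _, _ in pending]


def main() -> int:
    """Report any service that did not come back; non-zero exit on failure."""
    failed = wait_for_services(SERVICES)
    if failed:
        print("NOT UP after reboot: " + ", ".join(failed))
        return 1
    print("all services came back up")
    return 0
```

After the reboot, calling `raise SystemExit(main())` (by hand over SSH, or from something like a cron `@reboot` entry) lists any service that failed to return and exits non-zero. The polling loop with a deadline is deliberate: right after boot some daemons are still starting, so a single immediate check would produce false alarms.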
In my day job, we learned this lesson the “medium hard” way when we had to move datacenters. The specialized movers that were hired had a full power cycle on their checklist of things to do to the machines before unracking them. You really want to make sure the machine comes up correctly before moving it to a different datacenter/network where you may have different unrelated problems.
If you ask around, you’ll hear lots of horror stories about machines that had been up and running with great uptimes (1 year! 2 years!) that no one was willing to touch lest some undocumented thing break on reboot.
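To spot those machines before they become horror stories, you could flag any host that has gone too long without a reboot. This is a minimal sketch in Python, assuming a Linux host, where the first field of `/proc/uptime` is the number of seconds since boot; the 90-day `MAX_DAYS_WITHOUT_REBOOT` threshold is a hypothetical value, not a recommendation from the original post.

```python
from pathlib import Path

# Hypothetical threshold: flag a host if it has not been rebooted in this many days.
MAX_DAYS_WITHOUT_REBOOT = 90


def parse_uptime(text: str) -> float:
    """Parse the contents of Linux /proc/uptime; the first field is seconds since boot."""
    return float(text.split()[0])


def uptime_days(path: str = "/proc/uptime") -> float:
    """Return days since boot on a Linux host by reading /proc/uptime."""
    return parse_uptime(Path(path).read_text()) / 86400.0


def reboot_overdue(days_up: float, max_days: float = MAX_DAYS_WITHOUT_REBOOT) -> bool:
    """True if the machine has gone longer than max_days without a reboot."""
    return days_up > max_days
```

For example, `reboot_overdue(uptime_days())` returns `True` on a Linux box that has been up for more than 90 days, which is a hint that nobody has recently confirmed it can survive a reboot.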
Today, with many services living in small virtual machines (or VPSs), a reboot only takes a few seconds, so it’s much easier to do.
Add this “best practice” to your sysadmin toolkit. For more details on related topics, see “The Practice of System and Network Administration”.