Basic sysadmin troubleshooting, part 1

There are a bunch of things that I look at almost any time I log into a machine.

  • date
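    Is the output of this command what you expect? Time synchronization issues are often the cause of odd problems, such as Kerberos authentication failures or log timestamps that don’t line up between machines. If the time is wrong, check your ntpd config and the output of ‘ntpq -p’.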
  • w
    The first line shows the uptime, the number of logged-in users, and the load averages; below it is a line for each logged-in user. Is the uptime roughly what you expect? If the machine has been up for less time than you expect, figure out why it rebooted. (And if the reboot surprised you, consider upgrading your sysadmin philosophy toward change management.) If there are users you don’t expect to be logged in…
  • vmstat 2
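    This prints a line of statistics every 2 seconds until you interrupt it; the first line is an average since boot, so watch the lines after it. Check swap I/O (si/so), regular block I/O (bi/bo), whether any processes are blocked (b), whether memory usage (free, buff, cache) changes drastically over a short time, and CPU usage (us, sy, id, wa). collectl can give you more detail, but it usually has to be installed first.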
  • dmesg|tail
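    This shows the most recent kernel messages. Are there any unusual ones here, such as disk I/O errors or the OOM killer terminating processes? What will you do about them?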
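
If you want the same checks in one pass right after logging in, here is a minimal sketch as a POSIX shell script. It assumes a Linux host; the names first_look, have and VMSTAT_COUNT are made up for this example. It replaces the endless ‘vmstat 2’ with a fixed number of samples so the script finishes, and it reports missing tools (w and vmstat come from procps, which minimal systems may lack) instead of failing.

```shell
#!/bin/sh
# first_look (hypothetical name): run the "every login" checks in one pass.
# Assumptions: a Linux host; w and vmstat may be missing on minimal systems;
# dmesg may be refused to non-root users when kernel.dmesg_restrict=1.
# Missing tools are reported, not fatal.

# Number of vmstat samples at a 2-second delay. The first sample is an
# average since boot, so the later ones show what is happening right now.
VMSTAT_COUNT=${VMSTAT_COUNT:-5}

# have CMD: succeed if CMD is found on the PATH.
have() {
    command -v "$1" >/dev/null 2>&1
}

first_look() {
    echo '=== date ==='
    date
    # Only ask ntpd for its peers if the ntpq client is installed.
    if have ntpq; then
        ntpq -p 2>&1
    fi

    echo '=== w ==='
    if have w; then w; else echo '(w not installed)'; fi

    echo "=== vmstat 2 $VMSTAT_COUNT ==="
    if have vmstat; then vmstat 2 "$VMSTAT_COUNT"; else echo '(vmstat not installed)'; fi

    echo '=== dmesg | tail ==='
    # stderr goes through the pipe too, so errors such as
    # "Operation not permitted" show up inline.
    if have dmesg; then dmesg 2>&1 | tail; else echo '(dmesg not installed)'; fi
}

first_look
```

The script only gathers information; deciding what is unexpected is still up to you, so read its output with the questions from the list above in mind.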
