Archive for the ‘Uncategorized’ Category

useful uses of OpenSSH

Monday, August 2nd, 2010

Here are two tricks I’ve used recently. OpenSSH really is the Swiss Army knife of utilities.

Both of these tricks require a machine that has sshd running and that is on the network you want. Luckily, I have SSH access to many machines around the world.

I went outside the country and wanted to listen to Pandora, but Pandora doesn’t allow streaming outside the US. Want to have all of your web traffic go through a machine in the US? Just use an ssh “dynamic proxy”. I use OpenSSH in conjunction with Chromium. So simply “ssh -D local_port user@host_in_the_US”. E.g. if you use 1080 for local_port, you’ll have a SOCKS5 proxy available on localhost:1080, and all traffic sent through that proxy will go to host_in_the_US and from there to the Internet. Then I also do ‘chromium-browser --proxy-server="socks5://localhost:1080"’ and bam, streaming Pandora (or access to any other webapp that whitelists IPs).
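
Here is roughly how that looks as a couple of shell functions. Treat it as a sketch: the function names are made up, user@host_in_the_US is a placeholder, and the -f and -N flags (go to the background, don't run a remote command) are my additions on top of the plain “ssh -D” described above.

```shell
# Placeholders: substitute your own account and a free local port.
PROXY_HOST="user@host_in_the_US"
PROXY_PORT=1080

start_socks_proxy() {
    # -D: dynamic (SOCKS) forwarding on the given local port
    # -N: do not run a remote command; -f: go to background after authentication
    ssh -f -N -D "$PROXY_PORT" "$PROXY_HOST"
}

browse_via_proxy() {
    # Point Chromium at the local SOCKS5 proxy opened by start_socks_proxy.
    chromium-browser --proxy-server="socks5://localhost:$PROXY_PORT"
}
```

Run start_socks_proxy first, then browse_via_proxy. The proxy stays up until you kill the backgrounded ssh process.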

The second use is a reverse SSH tunnel, which allows a connection to a machine that’s otherwise not accessible from the Internet, e.g. a machine on someone’s private network that can connect out but is firewalled off from the outside. Again, you need a host on the Internet that has sshd and that the private machine can connect out to. So: hostA is on a private network. hostB is on the Internet. hostA can connect to hostB, but hostB can’t connect to hostA (because of NAT or firewall or whatever). From hostA, do “ssh -R 10022:localhost:22 user@hostB”. Then port 10022 on hostB will forward traffic to port 22 on hostA (or another host on the private network if you use something other than “localhost”). Then, you can ssh to hostB and run “ssh user@localhost -p 10022”, which will actually connect you to hostA. These instructions were adapted from here.
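
The two halves of the reverse tunnel, written out as shell functions. This is only a sketch: the function names are hypothetical, user@hostB and the user name are placeholders, and I've added -N so the first ssh only holds the tunnel open instead of starting a shell.

```shell
# Placeholders: hostB is the Internet-facing machine running sshd.
RELAY="user@hostB"
TUNNEL_PORT=10022

open_reverse_tunnel() {
    # Run this on hostA (the machine behind NAT/firewall).
    # -R: port TUNNEL_PORT on hostB forwards to port 22 on hostA's localhost
    # -N: no remote command, just keep the forwarding alive
    ssh -N -R "$TUNNEL_PORT:localhost:22" "$RELAY"
}

connect_through_tunnel() {
    # Run this on hostB, after logging in there.
    # $1 = account name on hostA (defaults to the placeholder "user")
    ssh -p "$TUNNEL_PORT" "${1:-user}@localhost"
}
```

The tunnel only exists while the ssh started by open_reverse_tunnel is running, so leave that session open on hostA.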

solving real-world problems with OpenAMD

Thursday, July 8th, 2010

This is OpenAMD: http://amd.hope.net/frequently-asked-questions-faq/ It’s a cool “hacker” project, but I think it can actually be used to solve a number of real-world problems. Small, common problems that attendees of a conference have.

Problem #1: quickly and easily looking up information about the person next to you, in parallel with talking to them. Ideally silently and covertly, but the HUD is not yet available :)
Implementation #1: use the location API, figure out which is you, figure out which other has minimum distance, display all info about that one. Perhaps do this in real-time, so you should get an updated view when a new person walks up to you. And you can probably do this on your smartphone or on your laptop.
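
A rough sketch of the “minimum distance” step in Implementation #1. It assumes you have already pulled a snapshot of positions out of the location API and flattened it to plain “uid x y” lines. That line format, the sample uids and coordinates, and the nearest_to helper are all made up for illustration; the real API output will look different.

```shell
# Hypothetical snapshot: one "uid x y" line per tag.
positions='alice 1.0 1.0
bob 4.0 5.0
carol 1.5 1.2
dave 9.0 9.0'

nearest_to() {
    # $1 = your own uid; reads "uid x y" lines on stdin,
    # prints the uid of the closest other tag (squared Euclidean distance).
    awk -v me="$1" '
        NF >= 3 { x[$1] = $2; y[$1] = $3 }
        END {
            if (!(me in x)) exit 1      # our own tag is not in the snapshot
            best = ""
            for (u in x) {
                if (u == me) continue
                dx = x[u] - x[me]; dy = y[u] - y[me]
                d = dx * dx + dy * dy
                if (best == "" || d < bestd) { best = u; bestd = d }
            }
            if (best != "") print best
        }'
}

printf '%s\n' "$positions" | nearest_to alice   # prints: carol
```

To get the real-time view, you would re-run this every few seconds on a fresh snapshot and look up the info for whichever uid comes back.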

Problem #2: find a particular person
Implementation #2: this one should be pretty easy, you just need their uid, and you query their current coordinates. However, the grid coordinates might not be so human-readable, so a directional arrow would be nice. For even more icing on the cake, display their historical path, so you can guess which direction they’re moving.

Problem #3: finding a person by real name or other attribute (zip code?) This is an extension of #2. Use case: you know your friend Dave is there, but you’re not sure where he is. How do you find his tag’s uid? Hopefully you can query just by name or handle to get the uid.
Implementation #3: sounds like you’ll have to query a separate database of metadata to see if you can identify the uid of the person based on the information that you have

iperf heuristics

Thursday, July 1st, 2010

Iperf is an excellent tool to test your network performance. But it’s also a good troubleshooting tool: run iperf and see if the results “feel” right.

There are two problems I’ve been able to diagnose using iperf. One was a performance problem with a NIC driver. Take two new machines connected by a gigE switch, and run iperf between them. You should get very close to wire speed. If you don’t, it’s worth investigating. In my case, I was getting ~650 Mbits instead of the expected ~980 (among other problems). After fixing the problem (in this case, upgrading NIC driver), iperf showed close to wire speed as expected.

Another time, there was what turned out to be a problem with a switch. It was a gigE switch, and single iperf streams were fine, ~950 Mbits between two machines. However, having one machine as an iperf server and starting up several iperf clients showed much lower aggregate throughput. Another thing I noticed was that the throughput was not evenly split between nodes. After a lot of other troubleshooting (kernel settings, NIC drivers, switch config), replacing the switch with a different one and re-running iperf showed an even distribution of performance, e.g. ~100 Mbits for each of 10 clients, with an aggregate of ~1 Gbit at the server.

I use something like ‘iperf -s -w 512K -l 56k’ on the server and ‘iperf -c head -i 3 -t 120 -P 16’ on each client, where “head” is the hostname of the server.
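
The same two commands with the flags spelled out, wrapped in shell functions. The function names are mine, and the flag meanings are as I understand them for classic iperf (version 2); check ‘man iperf’ on your system if you're on a different version.

```shell
# "head" is the server's hostname in my setup; substitute your own.
IPERF_SERVER="head"

iperf_server() {
    # -s: run as server
    # -w 512K: socket buffer (TCP window) size
    # -l 56k: length of the buffer to read/write
    iperf -s -w 512K -l 56k
}

iperf_client() {
    # $1 = server to test against (defaults to IPERF_SERVER)
    # -i 3: print a report every 3 seconds
    # -t 120: run for 120 seconds
    # -P 16: 16 parallel client streams
    iperf -c "${1:-$IPERF_SERVER}" -i 3 -t 120 -P 16
}
```

Start iperf_server on one machine, then run iperf_client on each of the others and compare the per-client numbers with the aggregate.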

useful uses of ‘dig’

Sunday, June 13th, 2010

The ‘dig’ command is a tool that allows you to query DNS. Here are some ways that I use it that are not covered in the man page.

By default, ‘dig’ will use the DNS servers configured in your system resolver (/etc/resolv.conf on Linux) but you can specify any DNS server. Useful ones are some public ones: 8.8.4.4 and 8.8.8.8 are provided by Google. OpenDNS provides 208.67.222.222 and 208.67.220.220 (but beware they don’t return NXDOMAIN). There’s also 4.2.2.1 (not sure who provides it, but it’s easy to remember).

So if your home ISP DNS server does “DNS hijacking” and returns the IP of one of their web servers instead of NXDOMAIN, you can double-check the result with a quick dig command.

It’s also useful for checking how the propagation of a DNS entry is going. Ask the authoritative name server for the entry, then one of these public caching servers, then your ISP.

The two most common flags I use for dig are “+short” and “-x”, for terse output and a reverse lookup, respectively.
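
A small sketch that ties these together: ask several resolvers for the same name and print the answers side by side. The function names are made up, and the resolver list is just the public servers mentioned above; add your ISP's resolver and the authoritative name server for the zone you're checking.

```shell
# Public resolvers mentioned in this post.
RESOLVERS="8.8.8.8 8.8.4.4 208.67.222.222 4.2.2.1"

compare_resolvers() {
    # $1 = name to look up; prints "resolver<TAB>answers" for each resolver.
    # An empty answer column usually means NXDOMAIN or no record of that type.
    for server in $RESOLVERS; do
        printf '%s\t%s\n' "$server" "$(dig +short "$1" "@$server" | tr '\n' ' ')"
    done
}

reverse_lookup() {
    # $1 = IP address; -x asks for the PTR record, +short keeps the output terse.
    dig +short -x "$1"
}
```

For example, “compare_resolvers www.example.com” shows at a glance whether all the caches agree, which is handy both for spotting DNS hijacking and for watching a change propagate.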

You can get the ‘dig’ command on Debian/Ubuntu by installing the ‘dnsutils’ package. On RH, it’s in ‘bind-utils’.

sysadmin tip: reboot more often

Wednesday, April 14th, 2010

While we all strive for maximal uptime, one thing that often goes by the wayside is the ability to reboot the machine without wondering whether it will come back up correctly.

You do not want to find out that the machine has the wrong boot settings when it gets powered off unexpectedly and then doesn’t come back up.

“do a reboot” is a good checklist item when adding a new service to a machine. You’ll want to ensure that the service comes up correctly without intervention after a reboot. Of course, you’re installing the service during a maintenance period, so rebooting is not an issue, right?

In my day job, we learned this lesson the “medium hard” way when we had to move datacenters. The specialized movers that were hired had a full power cycle on their checklist of things to do to the machines before unracking them. You really want to make sure the machine comes up correctly before moving it to a different datacenter/network where you may have different unrelated problems.

If you ask around, you’ll hear lots of horror stories about machines that had been up and running with great uptimes (1 year! 2 years!) that no one was willing to touch lest some undocumented thing breaks on reboot.

Today, with many services living in small virtual machines (or VPSs), a reboot only takes a few seconds, so it’s much easier to do.

Add this “best practice” to your sysadmin toolkit. For more details on related topics, see “The Practice of System and Network Administration”.

Caitlyn Martin is a troll!

Monday, April 12th, 2010

There are a lot of writers who write about controversial topics not to add anything of value to the debate, but merely to stir up the flames. Today, that typically means attracting a lot of page views and lots of comments.

How Canonical Can Do Ubuntu Right: It Isn’t a Technical Problem by Caitlyn Martin is a perfect example.

We won’t focus on the fact that the sensationalist headline does not match what she says in the article, which is in fact mostly about technical problems she had with Ubuntu.

It’s the “trolling” sentences that are the signature of this type of article.

  1. “Other distributions which target the desktop and the wider consumer market do a much better job from a technical standpoint. They produce a better product.”
     Which “other distributions”? How are they better?

  2. “Even considering all of that I still feel that the downloaded Ubuntu offerings more often than not have been substandard when compared to other distributions.”
     Which “other distributions”? How is Ubuntu “substandard”?

And finally, after rambling about several unrelated topics, the conclusion: “At this point I recommend Mandriva 2010 for newcomers to Linux. No, it is not bug free. No distribution is. Mandriva’s developers are simply more responsive to bug reports and get issues fixed, usually in a timely manner. In addition, while Mandriva has had a few less than stellar releases they have, more often than not, done a pretty good job of getting things out that work. As always, your mileage may vary.”

My technical mind translates that to “Mandriva worked better for me than Ubuntu on the one box I tried it on.”

I’m not writing this to say that Ubuntu is awesome. I’ve had my share of problems with Ubuntu. I’m writing this to say that Caitlyn’s article is awful. She doesn’t say anything new and she makes vague complaints that only trick other people into trying to counter them. Well, I’m not feeding this troll.

Why run CentOS?

Thursday, April 1st, 2010

Karanbir Singh (one of the CentOS maintainers) asks “Why do you run CentOS?”

I’d say our indirect philosophical reasons are:

  • We want to use Free Software
  • We want to use the best tool for the job
  • We want to hire smart people

For our application, we can use almost all Free Software, except this one commercial package (IBM’s GPFS), which is not Free, but is the best tool for the job. IBM only supports GPFS on SuSE or Red Hat, so we chose Red Hat, mainly because it is more common. It is also much easier to find qualified people who are familiar with the RHisms of Linux. While I’m a Debian guy at heart, it’s easy to adjust between RH/Debian.

So that’s why we’re on CentOS.

digital archives and the “bit rot” problem

Tuesday, February 23rd, 2010

Today, I went to a presentation given by Vint Cerf and Robert Kahn. One of the problems they presented as still unsolved was the problem of retaining information in a readable format in the long term. Vint made a pretty funny joke about trying to open a PPT file from 1997 in the year 3000. Even using Windows 3000 with Office 2998, the file may not necessarily be readable.

This is a problem that many people have experienced first-hand. Have any old 5 1/4″ floppies lying around? Think you can still read them? And assuming you can, can you then read the file formats, which may be proprietary to software that no longer exists?

Lo and behold, a couple of hours after the talk, there’s a Slashdot story on the very same topic, pointing to an American Scientist article titled “Avoiding a Digital Dark Age”.

My thoughts on the matter: there are three separate layers here (the media longevity, the media format, and the file format), and each needs to be designed with the same longevity goal in mind.

I will give an example here of how one software product handles this problem. That software product is Bacula. It uses an open, documented format for its file contents. So you can print out the specification on paper, if you like, and then sit down and re-implement the code and be able to read its files. It also uses the same format on different media, be it tape or disk. AFAIK, this design decision was made after seeing the evolution of ‘tar’ and GNU tar. Even with the same name, there are some versions of tar that produce incompatible files.

So the key is to use an open, documented format. Furthermore, it needs to be truly Free Software, not just open source but encumbered by a patent, for example.

basic sysadmin troubleshooting part 2

Friday, January 29th, 2010

  • top
    You’ll probably want to learn some basic top commands. E.g. hit “1” to see the CPU break-out. Hit “z” to highlight processes that are in state “R”. Hit “O, n” to sort by memory usage. Hit “u” to type in a particular user name. Hit “c” to see full command lines. See also: “15 Practical Linux Top Command Examples”.
  • ps
    You’ll probably want to learn some basic ps switches. “ps -ef” gives you a listing with users and commands. “ps auxf” also adds some CPU and memory information for the processes and shows them as a “forest”. “ps -efL” also shows you threads for multi-threaded processes. “man ps” will tell you way more than you want to know. Another more useful example: ps -eo pmem,pcpu,rss,vsize,args|sort -k 1 -r -n|head
  • iostat
    Iostat will show you some information about the I/O subsystem. I like “iostat -k 5”; it’ll show you updates in kilobytes and 5s increments. The very first screen will show averages since boot, subsequent screens only information over the last 5 seconds. Add “-x” to see information about queue lengths and average request size as well as “service time”, i.e. latency of I/O processing.
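
If you find yourself typing that ps pipeline a lot, it's easy to wrap in a function. This is just a sketch: top_mem is a name I made up, and dropping the header line before sorting is my own tweak so that it doesn't end up mixed in with the data.

```shell
top_mem() {
    # $1 = number of processes to show (default 10)
    # Columns: %MEM, %CPU, RSS (KB), VSZ (KB), full command line.
    ps -eo pmem,pcpu,rss,vsize,args |
        awk 'NR > 1' |          # drop the header line
        sort -k 1 -r -n |       # biggest memory users first
        head -n "${1:-10}"
}
```

For example, “top_mem 5” shows the five biggest memory users.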

basic sysadmin troubleshooting part 1

Monday, January 25th, 2010

There are a bunch of things that I look at almost any time I log into a machine.

  • date
    Is the output of this command what you expect? Time synchronization issues are often the cause of odd problems. If the time is wrong, check your ntpd config and output of ‘ntpq -p’.
  • w
    This will show you the uptime, first. Is that value roughly what you expect? It will show you load and users. If the machine has been up for less time than you expect, figure out why it rebooted. (And consider upgrading your sysadmin philosophy towards change management). If there are users you don’t expect to be logged in…
  • vmstat 2
    Check the swap I/O, regular I/O, check if any processes are blocked, check if memory usage changes drastically over a short time, check the CPU usage. collectl can give you more detail, but it needs installation.
  • dmesg|tail
    Are there any unusual messages here? What will you do about them?
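
The whole checklist fits in one small function if you'd rather not type it every time. This is only a sketch: first_look is a made-up name, and I've capped vmstat at 5 samples so the function returns on its own instead of running until you hit Ctrl-C.

```shell
first_look() {
    date            # is the clock right? if not, check ntpd and 'ntpq -p'
    w               # uptime, load, and who is logged in
    vmstat 2 5      # 5 samples, 2 seconds apart; the first line is averages since boot
    dmesg | tail    # the most recent kernel messages
}
```

On some systems dmesg needs root, so you may have to run this with sudo.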