The Explanation
2005-09-09
Posted by: badanov

The short version of our five-day outage: a bad router.

When we first acquired this DSL connection in August 2004, SBC told us we had to buy a router. When the day came to install the link, we tried to set the router up just like the few routers we had set up before. Setting up a router is supposed to be a straightforward task, nothing terribly complicated about it. It didn't work then, and in fact the only thing that did work was one IP, which we used and developed on from that time until last week.

The week before last, we were working on a web-mail interface for our company website and the work was going poorly. After ruling out every other possibility that could prevent the web-mail interface from working, we came to the conclusion that it was essentially a DNS problem, and the only way to get mail past SBC's servers was to set up our own.

We had been reading and studying BIND for about a year, so we believed we were ready to bring our own DNS server on-line.

The first event was that our trusty, rusty Win98SE machine refused to log in to the router, even though it was on the same network segment. In fact, for three solid days, the Win98SE machine was pretty much worthless. We did get the DNS machine running to the point where we could use the text-only browser, Lynx, to log in to the web interface on the router. All the usual procedures for setting the router up so that it would actually act as a router failed. By this time we had spoken to SBC Tier 1 support about three times, and on the third call the support person refused to send us to Tier 2 support without a Windows machine running and logged in to the router. We explained to the fella that we could log in with a Unix text-based browser, but it was no use trying to convince the guy. No Windows, no Tier 2 support.

By this time, it was Monday. Three days without Internet in or out. Finally we got the Win98SE machine to log on to the router, but we just couldn't get the router to behave. We were finally sent to Tier 2. We talked to a very nice fella who walked us through all the procedures for getting Internet, which we finally did. Unfortunately, the router refused to route anything to our servers, so even with Internet, we still had the same basic problem.

Tuesday night, after the day job, we were finally directed straight to Tier 2 and talked to a fella named Sparky. For real, the man's name was Sparky. After working with us for about 90 minutes over the phone, he realized we had a really bad problem, and he offered to give us a new block of IPs. If that didn't work, we would have to open our place to SBC technicians to work the problem here.

After getting the new IPs, still nothing, so Sparky mentioned we might have a bad router. I mentioned to him that we had an old DSL modem from a previous DSL provider, and asked whether SBC couldn't just route all five IPs through that.

“Are you sure you can route five IPs on your own?”

“That's the way I wanted to work DSL in the first place.”

Removing the router did the trick. By 0200 Wednesday morning we were essentially live on the Internet, all fat and happy. We did have an outage of about two hours Thursday night, but that was a combination of screwy server firewall rules and a bottle of nice white wine.

We sure hope we didn't put both our readers to too much inconvenience, but that's our story and we're sticking to it.
