Server smackdown: A networking newbie steps up

May 23, 2012 | 7 mins

IT rookie performs disaster recovery with servers in the hosting facility, then with his career in the CEO's office after a tense all-nighter

This humbling tale comes from a balmy summer evening in 2007 during my first real corporate IT job.

At the time, getting hands-on corporate IT experience was very important to me, so I was willing to work for peanuts. I’d found a position that allowed me to learn on the job as an assistant under the IT manager. Gradually, I would be transitioned into the network admin role as my knowledge grew.


I soon found out that the CEO was tough to work for. Turnover was high, and the IT manager quit about a month after I started. They asked me to assume networking duties while they searched for a replacement. The replacement never materialized, and they asked me to take on the job permanently.

About three months later, one of the production SQL servers at our hosting facility had a bad hard drive that needed to be swapped. The hard drive was one of the two drives that made up the RAID 1 array for the operating system. I phoned the manufacturer and had a technician dispatched to take care of it.

But after the technician left, the RAID 1 array rebuild process failed. I called the manufacturer back, and a low-level tech reading from a script recommended I attempt to remotely reinitialize the drive again. No dice.

The next troubleshooting phase involved this ugly process:

  • Walk up to the server and power it off.
  • Pull the drive.
  • Power the server back on and allow it to see that the drive is not there.
  • Power it back off.
  • Put the drive back in.
  • Power the server back on.
  • Go into the RAID setup menu and reinitialize the RAID 1 array using the two drives.

The technician explained that the RAID controller should then recognize the existing array on the drive and rebuild it accordingly.

The hosting facility where the server lived was an hour away. I had other projects I needed to do on servers there, so I scheduled a visit to the facility for the following evening. However, unexpected events came up at the office, and I got a later start than I’d planned.

10 p.m.: I arrived at the secured facility and started my other miscellaneous projects. I was able to get them done in a couple of hours. My eyes grew heavy, but none of that! There was still work to do.

1:27 a.m.: I proceeded with the fix to the RAID issue on the SQL server. I followed the process the technician had outlined — one slow, bleary-eyed step at a time.

At last I reached the final step and, within the RAID menu, selected the option to reinitialize the RAID array. Then came the critical question that was absolutely necessary to answer correctly: “What sort of RAID array do you wish to create?” (Cue tense background music.)

In slow motion, my hand hit the down arrow key, highlighted “RAID 0,” and hit Enter.

“Are you sure? All data on these drives will be lost.” This message gave me pause for a moment, but the lady on the phone had said this process would be OK. So I pressed the Y key and hit Enter again. The deed was done. “Array initializing …”

This looked like it would be a long process, so I went to the break room for a much-needed cup of coffee.

2:10 a.m.: I returned to check on the progress of the RAID array and arrived just in time to see the progress bar tick to 100 percent. Cool. I casually powered the server off and back on again. Then it happened.

My blood went cold as the words “Operating system not found” appeared at the top of the screen. I stared for a minute, unsure what to do.

Then it occurred to me: I should have selected RAID 1, not RAID 0! Curse words could be heard over the whine of the air conditioners.

2:14 a.m.: Remembering that we contracted with a hosting company to do full data backups of our servers, I put in a desperate call for a rush restore.

“Yeah, you see, you have a contract for operating system backups, but you never contracted for ‘bare metal’ recoveries. Unless you have a functional server, there is nothing we can do for you.”

More cursing on my part.

2:21 a.m.: I dumped the contents of my backpack out on the table and found the Windows 2000 Server SP4 disk. Shoving it in the drive, I rebooted. After the obligatory setup questions, the operating system started loading.

Where was my SQL Server 2000 disk, though? More cursing as I realized it was in the other folder back at the office!

2:53 a.m.: I jumped in my car, hit the gas, and made the one-hour trip in about 35 minutes. Within minutes I’d retrieved the entire CD folder and thrown it in the car. Warp 9 back to the hosting facility.

4:07 a.m.: When I got back, the Windows 2000 install process had gone OK and the server had a basic unconfigured copy of the operating system on it. Next, all of the IIS components, updates, and so on had to be installed. Finally, I was ready to install SQL Server 2000.

Fortunately, the databases lived on the RAID 5 array, which was not blown away. All I had to do was reattach them to the server. I did a test connect from another PC, and the server was OK.

6:01 a.m.: I exited the hosting facility building just in time to see the sun rising over the smoggy horizon. It had been a bad night, but the drama was not over yet. During my slow rush-hour drive back to the office, I had time to contemplate my coming professional demise.

7:27 a.m.: I moped into the office front lobby, a dead man walking. I had killed a production server, it was totally my fault, and I probably deserved to be fired for it.

We were yet again low on staff, so at the time I reported to the CEO. Walking into his office, I shut the door and explained everything that had happened. However, he was only half-listening and probably didn’t understand most of what I was telling him anyway. He cut me off midsentence and said, “But it’s working now though, right? That’s all I care about. Now go home and get some sleep.”

I will not lie — I walked out of the CEO’s office smiling, choking back tears at my good fortune.

The server still had a RAID 0 array with no redundancy, but at that point I was afraid to touch the server again to change it back. It wasn’t long afterward that the client whose data was stored on the server went to another company for its data hosting. Apparently, the dysfunction of the company was obvious enough in other ways that the client no longer wanted to do business with us. The aging server was backed up one last time and decommissioned.
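For anyone who wants the difference between those two menu choices spelled out, here is a minimal Python sketch of what each means for a two-drive array. The names `ArrayProfile` and `two_drive_profile` and the 73 GB drive size are illustrative assumptions, not details from that night.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayProfile:
    level: str
    usable_gb: int
    drive_failures_survived: int


def two_drive_profile(level: str, drive_gb: int) -> ArrayProfile:
    """Describe a two-drive array built from identical drives (hypothetical helper)."""
    if level == "RAID 0":
        # Striping: data is split across both drives, so capacity adds up,
        # but losing either drive loses the whole array.
        return ArrayProfile(level, 2 * drive_gb, 0)
    if level == "RAID 1":
        # Mirroring: both drives hold the same data, so capacity is that of
        # one drive, and the array survives the loss of one drive.
        return ArrayProfile(level, drive_gb, 1)
    raise ValueError(f"unsupported level for this sketch: {level}")


if __name__ == "__main__":
    # 73 GB is an assumed drive size, used only for illustration.
    for level in ("RAID 0", "RAID 1"):
        print(two_drive_profile(level, drive_gb=73))
```

Either choice creates a working array, which is why nothing looked wrong until the reboot: creating a new array of any level wipes what was there, and RAID 0 additionally leaves no mirror to fall back on afterward.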

Within a couple of months, I had found a desktop support position with another organization and moved on myself, a little wiser than when I’d arrived.

Do you have a tech story to share? Send it to offtherecord@infoworld.com. If we publish it, you’ll receive a $50 American Express gift cheque.

This story, “Server smackdown: A networking newbie steps up,” was originally published at InfoWorld.com. Read more crazy-but-true stories in the anonymous Off the Record blog at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.

infoworld_anonymous

Since 2005, IT pros have shared anonymous tech stories of blunders, blowhard bosses, users, tech challenges, and other memorable experiences. Send your story to offtherecord@infoworld.com, and if we publish it in the Off the Record blog we'll send you a $50 American Express gift card -- and, of course, keep you anonymous. (Note that by submitting a story to InfoWorld, you give InfoWorld Media Group, its affiliates, and licensees the right to republish this material in any medium in any language. You retain the copyright to your work and may also publish it without restriction.)
