Minor details and misunderstandings expose the cracks in a techie's carefully calibrated backup plan

Murphy's Law can hit even the best IT shops, despite all your efforts to avoid it. You may have carefully instructed your crew on how to execute a task properly and gone back to verify your directions were being followed, but it pays to revisit the process on occasion and confirm that the system is still working. One experience early in my career hammered home this lesson and made me grateful for today's much more streamlined backup processes.

The company I worked for almost 20 years ago manufactured and distributed consumer goods, and it relied on tape backups to protect its 15 remote locations against catastrophic server failures. If a failure happened, we'd be able to restore the previous night's image and would only need to redo the work done so far that day.

This was not an ideal setup, but it was the best we could offer given the slowness of the tape process and our limited staff. Requests to management to hire additional IT employees or improve the backup process had fallen on deaf ears.

It takes a lot of steps

The process worked well overall. Our backups ran at night, when they wouldn't interfere with other office duties. Each location had a two-week set of 10 tapes, labeled for each workday (such as Week 1: Monday). Each location also had an employee assigned to take the previous day's ejected tape out of the drive, place it in its rotation slot, and insert the tape for that day's backup. That person was also responsible for mailing the backup tape from the last working day of the month to us at HQ, then adding a blank tape to the rotation to replace the one they had sent.

We received tapes from all locations every month, so we'd have offsite storage of data for seven years in case of audits, and we could reproduce a location's data on our spare servers more rapidly than on the servers at the remote site if the need arose.

For the daily backups, we set it up so that at 4 p.m. each remote server would run a tape inventory and notify us if the drive was empty. This gave us time to contact someone at the location and get the correct tape inserted into the drive. (Half days before holidays were a real pain, but that's another story.)

The next step occurred at 10 p.m., when the tape drive would erase the tape, reformat it, and report that step's success or failure by email. At 11 p.m. the backup would start and take two to four hours to complete, depending on the location's database size. Once it finished, the software would eject the tape and email us whether the backup had succeeded or failed. I would get up early and check the reports from home (no smartphones then). If there was a failure at a location, I would contact a yard foreman and walk them through replacing or reinserting the tape so that I could clear the error and still have most backups completed before the office staff arrived.

It was a decent plan given our circumstances, and years went by with few problems. Then I uncovered a major issue.

How long has this glitch been going on?

The end of the month was a busy time not only for our accounting office, but for me as well.
Our AR/AP/GL/INV software required a lengthy process of compacting data and reallocating space for the month's closing. The whole process was administered and run from HQ after hours, so I would put the daily backups on hold until the closing was completed.

The last working day of the month would find me alone in the office, burning the midnight oil. I had to slog through each location's information separately and would be logged into six PCs connected to six different locations at a time. These connections were made over dial-up phone lines with 56k modems.

Always expecting the worst, my first action would be to make a backup copy of each location's end-of-month data in a secure, allocated space on that site's server hard drive. I had started doing this because, on rare occasions, the closing process would lock up and require a complete restore.

We were in the midst of an internal audit in September and needed to restore the database from a certain remote location as of the prior April in order to put together a report. I maintained an old server, off the network, expressly for such times, so I thought I'd be fine. I brought the server up and mounted the tape.

The tape software required me to first inventory the tape to get its ID, then run a catalog operation on it to list its backups and their dates. Looking through the tape catalog, I expected to see a backup dated the last working day of April. Imagine my surprise to see only backups from the middle of April. I grabbed the tape for March, and it too held only a backup from the middle of that month.

I was stumped. I'd never gotten an error message and was perplexed as to what could have happened. I thought through the process again, and slowly it dawned on me what had probably been happening. I surmised that the employee at that location would remove the end-of-month tape and, rather than mailing it, would put it back in the rotation before inserting that day's tape into the drive. Then they'd grab another tape out of the rotation to send to us, replacing that one with a blank tape. Their system ensured we received a backup, but it was for the wrong part of the month's work. I had no end-of-month backups from this location.

I sat there for a moment, in shock and panic. I wondered how long this had gone on and whether it was happening at any other locations. With an IT staff that was already shorthanded and working 50 to 60 hours each week, tape verification had been done by reading the log from the tape software, not by waiting until the tape arrived at HQ and mounting it to check that it was the correct tape. I wondered how I could recover from this one.

Time for Plan B

I switched back to the problem at hand, relieved to remember my end-of-month "safety" procedure of always making a hard drive backup of each location before starting the database compacting and repair. Each location's tape backups covered the complete hard drive, including my extra backup. I grabbed the May tape, which was also from the middle of the month, and drilled down to the secure area. Sure enough, there was the extra end-of-month April backup. I was saved!

I talked to that employee the next day, and my conjectures were right: the person said they thought it didn't matter which tape from the rotation they sent to us. You can be sure that we went through the steps once more and that I double-checked that location's tapes for some time afterward.

There was a nice surprise from the situation.
This event finally gave me the ammo to hire another employee and establish additional backup verification steps. I guess sometimes you need a close call with a worst-case scenario to get what you need.

Send your own IT tale of managing IT, personal bloopers, supporting users, or dealing with bureaucratic nonsense to offtherecord@infoworld.com. If we publish it, we'll send you a $50 American Express gift cheque.

This story, "How many backups are enough? One more than you planned for," was originally published at InfoWorld.com.