
Twisters, hurricanes, floods (oh my)



Story by Matt Villano

SEPTEMBER 03, 2003 ( CIO ) - The evening of Sunday, May 4, 2003, at Aeneas Internet and Telephone began as any previous Sunday evening had. The Jackson, Tenn.-based company that serves about 10,000 Internet and 2,500 telephone customers was closed for the weekend, awaiting the return of its 17 employees the next morning. Just before midnight, however, all hell broke loose. An F-4 category twister touched down just outside of town, then tore through Jackson's downtown area, leveling houses, historical sites and municipal buildings alike. The tornado ripped straight through Aeneas's one-story building, leaving only a pile of rubble.

Meanwhile, Aeneas CIO and Operations Manager Josh Hart, who'd heard about multiple tornadoes in the area that day, was home, 52 miles away in Martin, Tenn., huddling in his bathroom with his family. As soon as he was able, he flipped on the TV for news footage of the devastation. What he saw looked like "a war zone," bricks and concrete everywhere and piles upon piles of rubble.

At 2 a.m., with those images in the background, Hart's cell phone rang--it was Aeneas Network Administrator Jason Warren calling from what he likened to Ground Zero to report that everything in Jackson was lost. Another call came in from CEO Jonathan Harlan.

"I'm listening to [Warren] tell me what it's like, and he says, 'It doesn't even look like there was an office here,'" remembers Hart, 25. "The tornado destroyed our computers, our desks, everything. I couldn't believe what he was telling me."

Aeneas lost nearly $1 million in hardware and software that night and suffered an estimated 72 hours of downtime. But just as Aeneas in Virgil's Aeneid endured the worst the gods had to offer, so too did this Aeneas. This one, however, was wise enough to have created a contingency plan--one that minimized the damage and kept the company afloat during its darkest hour.

The company is not alone. After a nationwide scramble to prepare for high-impact, low-probability events similar to the attacks of Sept. 11, CIOs have realized that their organizations are far more likely to succumb to another type of event--one that has a high probability of occurring and, curiously enough, is probably simpler to predict: the weather. For example, in June, while the Atlantic seaboard was bracing for the start of hurricane season, Arizona was busy battling forest fires. And in Harris County, Texas, in 2001, a tropical storm and resulting flood taught one IT executive the importance of flexibility.

Both Aeneas's Hart and Steven W. Jennings, Harris County's executive director of central technology, share their experiences here in an effort to provide best practices and battle-tested secrets about which preparations work best. According to Carol Kelly, vice president of government strategies for Meta Group, these are lessons from which everyone can learn. "When disaster strikes, you want to be ready with a plan of action and an approach of how to deal," she says. "You might be ready for the next terrorist attack, but if you're not ready for the next nor'easter, your plans won't amount to much."

Big plans for a small company

Aeneas launched its contingency plan when it was founded in 1996; since then, CIO Hart has enhanced the strategy gradually almost every year. In early 2002, as the ISP neared 10,000 Internet customers, he and his network administrator, Warren, thought up the company's most comprehensive approach yet. While they determined that the likelihood of a terrorist attack on the western Tennessee town of Jackson, population 59,600, was slim to none, they concluded that because of the municipality's location in the central U.S.'s infamous Tornado Alley, the plan should respond to the next most likely cause of disaster--twisters. What ensued was a three-pronged plan that hinged upon colocation, distribution and backups.

  • First, by employing Border Gateway Protocol (BGP) programming on a high-class circuit shared with an ISP 90 miles down Interstate 40 in Memphis, Aeneas would colocate its IP addresses in real time and reroute data traffic offsite during any local disruption. With this system, servers would automatically reroute Internet service operations the moment a disruption occurred. In theory, at least, that would guarantee continuity of operations across the board.

  • Next, the company distributed its voice traffic dynamically, paving the way to switch its T1 connections from one fiber node in the BellSouth network to another in the event of a sudden telecommunications infrastructure failure. This system was designed to preserve continuity much like the BGP system.

  • Finally, the company's network administration team engineered applications that stored customer records and other data on tape as well as on backup hard drives. Though the tape and hard drives were stored onsite at the Jackson location, Hart and Warren figured onsite backup was better than none.
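
The third item keeps every backup copy at the same site as the primary data. A simple audit can flag that kind of single-site exposure. What follows is a minimal sketch in Python, assuming a hypothetical inventory format: the `BackupCopy` record, the `ip_routing` entry and the `single_site_risks` helper are illustrative names, not anything Aeneas is reported to have used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    dataset: str  # what is backed up, e.g. "customer_records"
    medium: str   # "tape", "hard_drive", ...
    site: str     # physical location of this copy


def single_site_risks(copies, primary_site):
    """Return datasets whose every backup copy sits at the primary site."""
    sites_by_dataset = {}
    for copy in copies:
        sites_by_dataset.setdefault(copy.dataset, set()).add(copy.site)
    return sorted(
        name
        for name, sites in sites_by_dataset.items()
        if sites <= {primary_site}
    )


# Hypothetical inventory loosely mirroring the setup described above.
inventory = [
    BackupCopy("customer_records", "tape", "Jackson"),
    BackupCopy("customer_records", "hard_drive", "Jackson"),
    BackupCopy("ip_routing", "colocated_server", "Memphis"),
]

at_risk = single_site_risks(inventory, primary_site="Jackson")
print(at_risk)
```

With this sample inventory, the customer records are the only dataset flagged, because both of their copies live in Jackson.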

This strategy wasn't put to the test until tornado season this year, when hardware, software and pieces of the local infrastructure were destroyed on May 4. Business customers on T1 lines lost their connections as soon as the tornado struck. ISP traffic also went down immediately and took 36 hours to restore. The fiber node switch to recover voice traffic took a bit more time, as Aeneas programmers worked around the clock with technicians from BellSouth to migrate the T1 connections from the old node to the new, finalizing the switch nearly three days after the twister hit.

"When you have hundreds of T1 lines that need to be moved from one node to the next, there's a lot of reengineering that needs to take place," says Hart. "We thought we were prepared, but I'm not sure we ever considered just how difficult this would be."

Bumps in the disaster recovery road

Beyond the challenges inherent in rerouting traffic, the remediation effort hit two other snags. The first revolved around colocation; because the colocation arrangement with the Memphis ISP was still being set up at the time of the tornado, the Memphis site didn't yet have sufficient servers. To remedy the situation, Aeneas staff members--and family and friends--drove to Memphis with additional equipment to handle the load. The company had some of this equipment on hand--what it didn't have, Hart and Warren purchased online and had overnighted to their homes. All told, colocation was down for about a day and a half.

The larger and more formidable of the two setbacks involved the company's tape and hard-drive backups. It was clear from the beginning that most of the company's paper-based customer records had fallen victim to Mother Nature, but four days after the tornado, Hart and Warren discovered that the electronic tape and hard-drive backups had failed as well. Hart finally uncovered the tape and hard drives May 8. When he pulled the tape from the rubble, it was so badly damaged that he hardly recognized it. Hart passed the hard drives on to a number of local data recovery specialists to see if they could retrieve anything. One by one, each came up empty.

Finally, as a last resort, Hart plucked the hard drives from four different nonfunctioning computers and turned them over to Kroll Ontrack, a data recovery company in Minneapolis. Miraculously, the vendor discovered a recent copy of the customer records database on all four computers and was able to recover all of the customer data and return it to Aeneas, delaying printing of its May bills only minimally.

Large organization, even larger plans

For an IT organization as small as Aeneas, the tornado presented sizable challenges. But for the IT organization of Harris County, Texas, which services more than 15,000 county employees and nearly 3.5 million constituents, the problems presented by Tropical Storm Allison were downright monumental.

Disaster struck June 6, 2001--the second day of a five-day storm--when atmospheric conditions caused a cloud to linger over the Houston area for nearly six hours, dropping more than 39 inches of rain. By the time the clouds parted, Harris County government had lost five buildings and most of the communications and other hardware and software in them to water damage. The price tag: a whopping $24 million.

Fortunately, though, Executive Director of Central Technology Steve Jennings had prepared for such an event. When Jennings joined county government in 1975, he established continuity planning to address natural disasters, such as flooding and hurricanes. The plan, which he dubbed the Four R strategy, hinges on four incremental steps--review, rewire, relocate and rebuild.

With this in mind, Jennings attacked the recovery immediately, following his plan like a bible. The morning after the deluge, he and his top advisers met to review assets and assess damages. Next, because Harris County is public and qualifies for federal aid, Jennings called in the Federal Emergency Management Agency (FEMA) to inspect the damage and lend him some disaster recovery expertise. He also brought in NetVersant Solutions to lay new fiber-optic cables. This process took approximately six weeks. In the meantime, Jennings reconvened his advisers, and put together an emergency relocation plan to disperse county employees into available office space on high, dry ground. Three months later, he tapped into the first of several batches of funding from FEMA to start rebuilding, spending millions on treating buildings for water damage.

Jennings also worked double time to ensure that county communications didn't miss a beat. "We utilized existing remote access facilities that allowed county employees to dial in from home until their new offices were finished," he says. This was done for employees whose jobs were deemed critical to county operations and for those for whom the county couldn't find alternative space. Jennings then mobilized a force of technicians to install high-speed connections at the homes of those employees who needed it most.

Finally, with the help of the county clerk's office, Jennings activated a cache of 300 Cingular cell phones, which had been reserved to help the blind vote on Election Day, and distributed them on an as-needed basis to county departments. "Those phones are deactivated for 11 months of the year, but they were available and we needed them," he says, noting that network administrators deactivated the phones and retrieved them once they managed to bring each department back online. "Part of recovering from a disaster is making use of everything you can find, and we did just that." When all was said and done, it took the county about a year to return to normal, which, according to Jennings, was pretty good given the scope of the damage.

Lessons learned

Jennings says the storm confirmed his belief that continuity plans should be flexible and horizontally applicable. Before the flood, Harris County's disaster recovery plan was conceived to respond to potentially any disaster, but it typically addressed single events such as the loss of a building, a network or a system. It was flexible enough, however, that it worked even when the county was faced with recovering multiple facilities. He adds that Harris County government "uses different portions of the plan for total recovery." Today, the Harris County continuity plan incorporates suggestions from employees who were part of the recovery process and lists scenarios for various "disaster combinations" that could occur during the next big storm--such as what to do if both the jail and family court get hit. When that storm does happen, Jennings says he'll respond even faster than he did in 2001.
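
One way to picture the "disaster combinations" idea is to enumerate every group of facilities that could fail together, so that each grouping gets its own entry in the plan. The Python sketch below is an illustration only: the jail and family court come from the article, while the other facility names and the `disaster_combinations` helper are assumptions made for the example.

```python
from itertools import combinations

# Hypothetical facility list; only the jail and family court are named
# in the article.
facilities = ["jail", "family_court", "data_center", "clerk_office"]


def disaster_combinations(sites, max_simultaneous):
    """Yield every group of 2..max_simultaneous sites that could fail together."""
    for size in range(2, max_simultaneous + 1):
        yield from combinations(sites, size)


scenarios = list(disaster_combinations(facilities, max_simultaneous=2))
for scenario in scenarios:
    print(" + ".join(scenario))
```

The number of groupings grows quickly as the list of facilities or `max_simultaneous` increases, which is one reason a plan might spell out only the most likely pairings.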

The next time a weather event occurs, Jennings says he'll also have the added benefit of wireless. After the flooding, as Jennings tried to rewire the Harris County jail, he spent $200,000 on Lynx high-definition wireless technology as an interim solution. The technology worked so well that he kept it and now has it on hand to pinch-hit during the next crisis. If, for example, a storm knocks out phone lines in the southeast corner of the county, Harris can set up wireless in hours. In addition, if another rainstorm waterlogs some of the underground fiber optics downtown, Harris can use the technology to provide emergency telephone service to anyone who needs it.

"Mother Nature never follows a script, especially not the one you wrote," Jennings quips. "As we have more experience recovering from the disasters she wields, we'll have a better sense of which remedies work best."

At Aeneas, Hart notes that from "now until the end of time," he'll keep an electronic records backup offsite to eliminate the problems he endured in recovering those mission-critical customer files. Planning for offsite backup had begun before the May tornado, and the site is now up and running in Memphis. Hart admits that his error in planning nearly cost Aeneas everything, adding that he'll never make that mistake again. Another misstep Hart says he'd correct is the way he handled the media in the days following the tornado. If he could do it all over again, Hart says, he would have been on the phone immediately with newspapers, TV stations and radio outlets to jump-start the company's PR campaign and assuage customer concerns.

"[Our customers] must have been watching the TV news thinking, 'Man, that's my ISP,' and we're too busy working on restoring systems to think about putting their minds at ease," he says. "Restoring technology after a disaster is important. But rebuilding customer confidence...it doesn't get more important than that."

Matt Villano is a freelance writer based in Moss Beach, Calif.

This story is reprinted from CIO.com, an online resource for information executives.
Story Copyright CXO Media Inc., 2003. All rights reserved.







 

 


 
 
Copyright © 2003 Computerworld Inc. All rights reserved. Reproduction in whole or in part in any form or medium without express written permission of Computerworld Inc. is prohibited. Computerworld and Computerworld.com and the respective logos are trademarks of International Data Group Inc.