At 4:35am on Saturday 4th September Canterbury was struck by New Zealand’s most destructive earthquake since the 1931 Napier Earthquake (GNS Press Release or follow #tenz on Twitter). Many of the city’s older buildings were either totally destroyed or suffered significant structural damage. The Central Business District was particularly hard hit, with many buildings unsafe. Widespread power outages, loss of water and a broken sewage system also affected a large part of the city.
A week earlier I had hosted a session on Disaster Recovery at Code Camp in Auckland. Time to put those words into practice and help get our clients up and running again. Our clients are largely concentrated in Christchurch and surrounding towns.
Scale of the problem:
Extended electricity outages meant that the UPSs at most of our client sites initiated graceful shutdowns, so powering on again should be a relatively straightforward process. The violent shaking could have caused disk failures as well as moving unsecured items. It is also possible that building damage or broken water pipes could affect computer systems. In some cases buildings are inaccessible or are still without power.
Communication failure:
We were very fortunate that both the Telecom and Vodafone mobile networks were available in a large part of the city, allowing communication. Twitter was particularly lively until the message spread that batteries in cell towers were running low and mobile phone use should be minimised. National Radio should be commended for providing an excellent service, if only more people still had a radio with batteries!
Once electricity was restored in some areas of the city, the team of IT Pros I work with got cracking testing VPNs to identify which client sites were up. This was a particularly effective way to identify who needed help and to set some priorities.
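A check like this can be scripted. The following is a minimal Python sketch, not the method used on the day: it assumes each site exposes a VPN or remote access endpoint that accepts TCP connections, and the names `is_reachable`, `triage_sites` and `format_report` are hypothetical. Many VPNs (IPsec, for example) use UDP, where a simple TCP connect like this would tell you nothing, so treat it as an illustration of the triage idea only.

```python
import socket
from typing import Dict, Tuple


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False


def triage_sites(
    sites: Dict[str, Tuple[str, int]], timeout: float = 3.0
) -> Dict[str, bool]:
    """Map each site name to True (endpoint answered) or False (no answer)."""
    return {
        name: is_reachable(host, port, timeout)
        for name, (host, port) in sites.items()
    }


def format_report(results: Dict[str, bool]) -> str:
    """List unreachable sites first, since they are the ones needing a call."""
    lines = []
    for name, up in sorted(results.items(), key=lambda item: (item[1], item[0])):
        lines.append(f"{'UP  ' if up else 'DOWN'}  {name}")
    return "\n".join(lines)
```

Calling `triage_sites` with a dictionary of site names mapped to (host, port) pairs and passing the result to `format_report` gives a quick call list with the unreachable sites at the top. A site that shows as down may simply have no power or no internet link yet, so the list is a starting point for phone calls rather than a diagnosis.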
Direct communication with clients via phone is best in this type of situation, as many other communication methods are not reliable. It is a good thing that most of us have client contact numbers stored in our mobiles (and that our phones were charged).
Hardware failure:
One of the key points I raised during my Disaster Recovery presentation was that Virtualisation is your friend in a disaster. I was unlucky enough to have one client with a critical server that would not boot (no BIOS screen). Fortunately the site had existing virtual infrastructure with enough disk space to migrate the server. Rather than waiting to resolve the issue using the warranty process, I decided that a P2V (Physical to Virtual migration) was the best method. 1.5 hours later the system had been migrated and they were back in business. To do this I needed a second server with the same hardware: I swapped the disks over to the working server, booted it and ran the P2V process. Once the virtual machine had started, a couple of tweaks to the network settings had everything running happily. The server warranty was next business day, so the result here was much better than the client would have received if that path had been followed.
Problems:
While the vast majority of our client sites simply required power to kick back into life, we did strike a few issues. A short list of things we found on the first day back in the office:
- Servers and racks moved
- Blown circuit breakers
- Dismounted Exchange Information Store
- Hyper-V host server that would not start Virtual Machines
- Domain Controller that shut down part way through the boot process
- Various cabling issues (due to movement)
- Printer issues
- PCs that had moved (pulling out cables)
All of these issues were solved today and I am sure a few lessons were learnt. I think we will find that this is really just the beginning, as many clients are still not operating and others have asked staff to stay away for a couple of days.
Team work:
IT Pros all seem to share a common attribute. We respond. IT Pros will drop everything, often with little notice or regard for other personal circumstances, to respond to clients in need. I witnessed this first hand from the guys I work with. We are a self-organising bunch of guys and it really showed over the past couple of days. I am sure this isn’t limited to my team, but applies to IT Pros all over the Canterbury region. Keep up the good work everyone!
Great work Steve & Team…Love the IT Pro common attribute comments as well.
Brent
IT Pro’s ruleeeeeeeee LOL good post Steveo!
i like the way you simply explain the outline of what happened and what problems you found and fixed.
So glad that no one was killed, i reckon it would be a different story if it happened in Wellington