My window to the world

The Geekzone outage post-mortem

By Mauricio Freitas, posted: 16-May-2008 19:46

Some of you may have noticed a (brief) Geekzone outage, which also affected this blog, on Thursday 15th May.

"Brief" is just a way of saying it. The outage lasted from 9:15am until 3pm, at which time the site was up and running again.

What happened on that day? At 9:15am I removed one piece of software from the server and rebooted it. I didn't see the server up in under ninety seconds as usual, so I called ICONZ's help desk, which confirmed the server wouldn't boot, thanks to an error while loading the operating system.

The operating system was not happy with a change in drivers, and wouldn't boot without the original DVD for a recovery session.

And just this week I had contacted the ICONZ team asking if they had a library on site that could hold my recovery discs. But before I could send these up, disaster struck.

While on the phone with the help desk I went to the Air NZ site and booked a seat on the 11am flight to Auckland - a 45-minute flight, plus a 30-minute taxi ride to the data center.

Because this server was originally installed from CD and all the OS versions since the original install were installed from disc files, we didn't have a DVD drive on this machine. I asked the ICONZ help desk to arrange for a DVD drive to be installed as a replacement and also asked them to commission a virtual server on their ICONZ virtual co-location platform, just in case things got worse. They arranged this while I was flying up to Auckland.

By 12:30pm I was on site and, after evaluating my alternatives, I decided to reinstall the OS instead of restoring from the Symantec Backup Exec System Recovery image. The reason is that the software that caused the problem would still be in that image, so it wouldn't be any good to me. Also, this server was running Windows Server 2008 Enterprise RTM as an update to Windows Server 2008 Enterprise RC 0, which was installed on Windows Server 2003 R2, which was installed on top of Windows Server 2003.

So a fresh install would make things easier - and cleaner.

Luckily the problem showed up during a restart, so the SQL database was perfectly safe, since the drives were still visible. Using the recovery console I copied the SQL database files to an external USB drive and loaded them on my test machine to make sure it was all perfect.

I then proceeded to delete and recreate the partitions on this machine - a small problem because those were dynamic disks with software mirroring enabled, and it seems Windows Server 2008 can't manage this well during setup. To get around this I used a Windows Server 2003 bootable DVD to delete the old partitions and create fresh ones.

With this sorted, Windows Server 2008 installed very quickly. Installing SQL Server 2005 was not a problem, and soon I had the Geekzone site up and running.

The next steps were to reload some extra software on this machine, apply the Windows Updates (which required lots of reboots due to having SQL Server 2005), and configure the Symantec Backup Exec System Recovery on this newly built machine.

By 6pm I was in a cab going back to the airport to fly back to Wellington. I spent most of Friday doing the last bits of configuration required, testing the backup routines to make sure everything was ok and so on.

The great thing here is that we had no data loss at all. We also had a daily backup to an external drive, plus a weekly backup to an off-site location via FTP. And yes, I do test the backup contents to make sure they are valid (and so should you).
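The post doesn't say how the backup contents are tested, so the following is only a minimal sketch of one common check, not the method used here: comparing the SHA-256 hash of a source file with the hash of its backup copy. The function names and any file paths are hypothetical, chosen for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large backup files don't fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(source: Path, backup: Path) -> bool:
    """True when the backup copy is byte-for-byte identical to the source."""
    return sha256_of(source) == sha256_of(backup)
```

A matching hash only shows that the copy arrived intact. It doesn't prove the database can be restored, so loading the files on a test machine (as described earlier in this post) is still the stronger test.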

I've installed Symantec Backup Exec System Recovery to replace Acronis True Image Echo Server as part of a review, and I am really impressed. It's easier to manage and has a much better disc space management routine, including backup rotation, storage consolidation, etc. It also allows conversion of image backups to both VMware and Virtual Server formats. I didn't have to use it because of the decision to rebuild this server, but it is now running full time as the main backup routine.

Next steps?

A complete image backup is done and scheduled for daily incremental backups. The daily SQL Server 2005 backup is running, as well as the transaction log switch. The weekly backup to the off-site location is back in place and all software is up-to-date.
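With three schedules running (daily image incrementals, daily SQL backups, weekly off-site copies), it helps to notice when one silently stops. Here is a rough sketch of such a check, assuming you can get the time of the last successful run of each job from somewhere. The job names and the `overdue_jobs` function are made up for illustration and aren't part of any backup product.

```python
from datetime import datetime, timedelta

# Maximum age allowed for each backup job (job names are hypothetical labels).
MAX_AGE = {
    "image_incremental": timedelta(days=1),
    "sql_backup": timedelta(days=1),
    "offsite_ftp": timedelta(days=7),
}


def overdue_jobs(last_run: dict[str, datetime], now: datetime) -> list[str]:
    """Return jobs whose last successful run is missing or older than allowed."""
    overdue = []
    for job, max_age in MAX_AGE.items():
        ran_at = last_run.get(job)
        # A job with no recorded run counts as overdue.
        if ran_at is None or now - ran_at > max_age:
            overdue.append(job)
    return overdue
```

Calling `overdue_jobs(last_run, datetime.now())` from a scheduled task and emailing any non-empty result would be one way to use it.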

I will be recreating the recovery DVD images from this new system and shipping those to ICONZ so that we can have a faster recovery next time - perhaps without my having to fly there.

A big thanks to the folks at ICONZ who arranged the extra hardware and commissioned the virtual server - which we wound up not using at all.

Also thanks to all the Geekzone users who contacted me - a few voice calls, some SMS, numerous Twitter messages and support from the people in our #geekzone IRC channel - including the offer to drive around Auckland to ferry any hardware if needed.
