My window to the world


Skype outage caused by Windows Update? Yeah right...

By Mauricio Freitas, posted: 21-Aug-2007 09:05

Skype had a very long outage this week. Users around the world, for almost 48 hours, couldn't connect to the global telephony network that runs over Internet services.

Skype was updating its status through a blog, with nothing much more than "bear with us" messages.

People started thinking that hackers had infiltrated the network, bringing down essential servers and clients, making the restart harder.

And then comes the "official" Skype explanation for the outage, which makes no sense at all:


On Thursday, 16th August 2007, the Skype peer-to-peer network became unstable and suffered a critical disruption. The disruption was triggered by a massive restart of our users’ computers across the globe within a very short timeframe as they re-booted after receiving a routine set of patches through Windows Update.

The high number of restarts affected Skype’s network resources. This caused a flood of log-in requests, which, combined with the lack of peer-to-peer network resources, prompted a chain reaction that had a critical impact.

Normally Skype’s peer-to-peer network has an inbuilt ability to self-heal, however, this event revealed a previously unseen software bug within the network resource allocation algorithm which prevented the self-healing function from working quickly. Regrettably, as a result of this disruption, Skype was unavailable to the majority of its users for approximately two days.



Blame Microsoft Windows Update! Call the usual suspects!

But I say this is just some story Skype is seeding... Let's see why:

1. Windows Update by default runs at 3am local time. So even if every Windows-based PC in the world restarted, they would not all restart at the same time, but over a 24-hour "follow the sun" period. The entire Skype user base is spread over 24 time zones, not concentrated in a single one.

2. Windows Update is delivered on the second Tuesday of every month, and has been for the last three years. Why did this only happen now?

3. Windows Update starts on Tuesday and, counting the time zones, the last country to reach that time would be here in New Zealand, where it is already Wednesday morning local time. If the problem happened on Thursday, as Skype claims, that was Friday morning in New Zealand, almost two days after the automatic Windows Update.

So, yes, I think the whole explanation doesn't work.
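To make the time zone argument in the first point concrete, here is a rough Python sketch. It assumes every PC restarts at exactly 3am local time on the same calendar date, and it uses a small sample of cities with fixed UTC offsets for August 2007. The city list, the offsets, and the chosen date are my own illustrative assumptions, not anything Skype or Microsoft published.

```python
from datetime import datetime, timedelta, timezone

# Illustrative sample only: city -> assumed fixed UTC offset (hours) in August 2007.
OFFSETS = {
    "Auckland": 12,
    "Tokyo": 9,
    "London": 1,
    "New York": -4,
    "Los Angeles": -7,
    "Honolulu": -10,
}

def restart_time_utc(offset_hours, year=2007, month=8, day=15, hour=3):
    """Return the UTC instant of hour:00 local time in a zone with the given offset."""
    local_zone = timezone(timedelta(hours=offset_hours))
    local = datetime(year, month, day, hour, tzinfo=local_zone)
    return local.astimezone(timezone.utc)

# When does "3am local on Wednesday 15 August" happen in each city, in UTC?
restarts = {city: restart_time_utc(off) for city, off in OFFSETS.items()}

# Gap between the earliest and the latest restart in the sample.
spread = max(restarts.values()) - min(restarts.values())

for city, when in sorted(restarts.items(), key=lambda kv: kv[1]):
    print(f"{city:12s} {when:%a %d %b %H:%M} UTC")
print("Spread between first and last restart:", spread)
```

With this sample the restarts are spread over most of a day rather than landing in one short window, which is the point I'm making above. Real PCs don't all follow the default schedule, so treat it as an illustration and not a measurement.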

While a vast number of people use Skype for their PC-to-PC communications, some businesses actually use the service to create a virtual presence in other markets. I wonder how much business these companies lost during a 48-hour outage. Will they trust Skype again?


UPDATE: Skype has posted a new blog entry with comments worth reading:


We don’t blame anyone but ourselves. The Microsoft Update patches were merely a catalyst — a trigger — for a series of events that led to the disruption of Skype, not the root cause of it. And Microsoft has been very helpful and supportive throughout.

The high number of post-update reboots affected Skype’s network resources. This caused a flood of log-in requests, which, combined with the lack of peer-to-peer network resources at the time, prompted a chain reaction that had a critical impact. The self-healing mechanisms of the P2P network upon which Skype’s software runs have worked well in the past. Simply put, every single time Skype has needed to recover from reboots that naturally accompany a routine Windows Update, there hasn’t been a problem.

Unfortunately, this time, for the first time, Skype was unable to rise to the challenge and the reasons for this were exceptional. In this instance, the day’s Skype traffic patterns, combined with the large number of reboots, revealed a previously unseen fault in the P2P network resource allocation algorithm Skype used. Consequently, the P2P network’s self-healing function didn’t work quickly enough. Skype’s peer-to-peer core was not properly tuned to cope with the load and core size changes that occurred on August 16. The reboots resulting from software patching merely served as a catalyst. This combination of factors created a situation where the self-healing needed outside intervention and assistance by our engineers.








