At last, the folks at Skype have provided us with a half-decent explanation of what happened when the peer-to-peer telephone service went dark for almost two full days last week. Unfortunately for Skype, it’s not a very favourable one. The company does its best to blame the service outage on Microsoft, saying the disruption was triggered by a massive wave of restarts by users whose computers had downloaded routine updates from Microsoft:
“The disruption was triggered by a massive restart of our users’ computers across the globe within a very short timeframe as they re-booted after receiving a routine set of patches through Windows Update. The high number of restarts affected Skype’s network resources.”
But the real culprit seems to be the company’s own software, which handles the provisioning of services across millions of individual PCs. Apparently the simultaneous restarts led to a wave of login requests and that — combined with a flaw in Skype’s network-management software — caused the failure:
“This caused a flood of log-in requests, which, combined with the lack of peer-to-peer network resources, prompted a chain reaction that had a critical impact.
Normally Skype’s peer-to-peer network has an inbuilt ability to self-heal, however, this event revealed a previously unseen software bug within the network resource allocation algorithm which prevented the self-healing function from working quickly.”
The chief technology officer of SightSpeed argues that the outage exposes a weakness in Skype’s P2P network structure. Instead of relying on its own servers, Skype’s network uses some of its users’ individual PCs as “SuperNodes” to route traffic. The loss of any significant number of those SuperNodes, he argues, can cause a substantial disruption.
It should be noted that SightSpeed — which uses a P2P network structure with central servers instead of SuperNodes — is a competitor of Skype’s, and is offering any disgruntled Skype users a special trial of its premium services. And as one commenter on the post notes, SightSpeed’s model is also far from immune to outages, and arguably less robust because it depends on the company’s servers alone to handle traffic.
Nevertheless, the outage has no doubt caused more than one Skype user to wonder about the network that the service is based on. There is a comment <a href="http://gigaom.com/2007/08/16/skype-groans-sipphone-gains/#comment-456134">on one of Om Malik’s posts</a> that appears to be from someone with knowledge of the Skype SuperNode problem.