Until recently, if you wanted to talk to someone, you picked up a phone. It could be a mobile phone or a landline, but it looked like a phone and used fairly standard telephony technologies to place the call.
Now we have more ways than ever to communicate. We have also become accustomed to a world in which we are always reachable by a variety of modalities, constantly connected to something. As a result, I am surprised when I actually hear a busy signal. I also receive fewer voicemail messages because people who need to reach me usually do so on the first try.
If you tried to make a phone call on August 16 and couldn’t, you weren’t alone. Skype, which today (I’m writing this on Wednesday, 29 August) celebrates its fourth birthday, went dark for nearly two days. Granted, Skype is not an enterprise tool. But many of the 220 million Skype users around the world work in large organizations. Moreover, 30% of its users have told the company that they use Skype at least in part for work-related calls.
Some users have even ditched their landlines and use Skype as their office (or personal) phone system. Those users probably never thought that Skype would be unavailable to them. Skype is not intended as a replacement for a landline (something the folks at Skype have reiterated over and over), yet some knowledge workers see it as good enough for what they need.
It turns out that “good enough” may not always be good enough.
In the post-Second World War era, more and more people came to take dial tone for granted, and penetration eventually exceeded 100% as more than a few households added multiple phone lines. People in the telecoms and IT industries began using the term “five-nines” to describe dial tone availability. Many people treat the term as equivalent to “uptime”. But I would go so far as to hazard a guess that 99.999% of them really don’t know what it means and even fewer understand where it’s applicable. The term originated with AT&T’s #1ESS switch. The switch’s development team’s goal was to have less than one day of outage in 40 years. Needless to say, it met this goal admirably.
When we speak of five-nines, however, we are generally talking more about availability than reliability. Availability is a function of Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR). Today we might think of MTTR as Mean Time To Restore, given our predilection for swapping out faulty components instead of making a repair on the spot. Both factors are usually expressed in hours. They can be an excellent measure of reliability but they don’t tell you everything you need to know. As the maker of the Concorde found out, a machine can enjoy a lifetime of reliability and still suffer a disaster. After a single crash, the aircraft went from having the best safety record to having the worst. Its fatal accident rate of zero per million flights suddenly became 12.5 per million flights, more than three times the rate of the airplane in second place (source: www.airsafe.com).
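To put rough numbers on this, here is a minimal sketch in Python. It assumes the common textbook relationship in which steady-state availability equals MTBF divided by the sum of MTBF and MTTR; that formula, the function names, and the sample figures are my own illustration and do not come from any vendor's specification.

```python
# Illustrative only: assumes the textbook steady-state formula
#   availability = MTBF / (MTBF + MTTR)
# Function names and sample values are hypothetical.

MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a system is up, given MTBF and MTTR in hours."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected minutes of outage per year at a given availability."""
    if not 0.0 <= avail <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - avail) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Compare two through five nines of availability.
    for label, avail in [("two-nines", 0.99), ("three-nines", 0.999),
                         ("four-nines", 0.9999), ("five-nines", 0.99999)]:
        minutes = downtime_minutes_per_year(avail)
        print(f"{label}: about {minutes:.1f} minutes of downtime per year")

    # A hypothetical system that fails every 9,999 hours on average
    # and takes one hour to restore reaches four-nines, not five.
    print(f"availability: {availability(9999, 1):.4f}")
```

By this arithmetic, five-nines leaves room for only a little over five minutes of outage in an entire year, which suggests how far a two-day disruption falls from that standard.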
As we rely more and more on technologies that aren’t quite ready for prime time in our quest to remain connected, the question becomes “how do we strike the appropriate balance?” Where does “always on” leave off and “good enough” start? As more and more knowledge workers decide to rely upon online tools, some with limited offline support, this question will come up more often. Such reliance of course entails a certain amount of risk. On July 24, a power outage in San Francisco brought down a variety of Web sites frequented by knowledge workers, including Craigslist and TypePad, for several hours.
On Thursday, August 16, the Skype peer-to-peer network in Skype’s words “became unstable and suffered a critical disruption.” The disruption was caused by a “massive restart of our users’ computers across the globe within a short time frame.” The rebooting, Skype explains, was due to a set of patches through Windows Update.
Skype was unavailable to almost all users for almost two days. The high number of restarts strained Skype’s network resources, and the network became unavailable as a result. Skype later cited a “perfect storm” and took pains to avoid placing blame on Microsoft.
So what is the appropriate level of risk? That is something for each individual or company to determine. We are so used to ubiquitous connectivity and communications that the miracle of how much we do have goes unnoticed. Years ago, when mobile phone coverage was spotty, it was commonplace to become disconnected in a tunnel. Now we can’t even use that as an excuse to end a call early. But whenever one steps away from a landline, still the gold standard of availability, one assumes that risk.
David M. Goldes is the president of Basex.