Facebook Outage: A single-point-of-failure lesson for Ghana
The recent six-hour Facebook global outage was neither the first nor the longest. In 2019, Facebook suffered a 14-hour disruption that affected Facebook and Instagram users globally. Before that, there had been other outages on a smaller scale.
What is unique about this latest outage, however, is that two other global social media platforms, WhatsApp and Instagram, were also affected. This is because, through acquisitions and technical convergence, Facebook has created a single point of failure, and that has become a big source of worry for businesses around the world and for industry regulators in the USA.
A day after the outage, Facebook issued a statement telling the world that the outage was caused by “a faulty configuration change”.
This is exactly what the Facebook statement said:
“Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication. This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt.”
In short, Facebook has linked its data centers for Facebook, Facebook Messenger, WhatsApp and Instagram, so when data transfer between those data centers was disrupted, all of its platforms were affected.
The scale of the outage was such that a whopping 3.5 billion users and businesses on its platforms were affected. The outage tracking firm Downdetector reported over 10.6 million problem reports globally. Indeed, Facebook itself admitted that the outage affected the emails and access accounts of its own staff, making it difficult for them to even get into the system and fix the problem in good time.
Facebook Founder and CEO Mark Zuckerberg reportedly lost a whopping US$6 billion as Facebook's share price plummeted as a result of the outage. Small businesses and influencers who use Facebook's platforms for their trade also reportedly lost at least US$5,000 each on average.
As stated above, this is neither the first nor the biggest outage on Facebook. Its significance lies in the fact that Facebook has now created a humongous single point of failure, which means a hitch at one point in its system will affect not just the Facebook platform, but also WhatsApp and Instagram. The current outage is a clear example.
Convergence, at all levels, has become a popular business strategy lately. The reasons are cost-cutting and effective management from a single point rather than from multiple points. But it also creates a challenge: one small problem, which was hitherto localized and could only cause limited impact, now becomes a global challenge.
Globally, there are almost 3 billion Facebook accounts, plus over 2.5 billion WhatsApp accounts and 1.1 billion Instagram accounts. If each of these platforms had its own separately managed data centers, any challenge would still be big, but at least it would be localized to the specific platform. The single point of failure at Facebook means all of these over 6 billion accounts are at risk.
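The arithmetic behind that comparison can be laid out in a few lines. This is only a rough sketch: the figures are the approximate account counts quoted above, expressed in millions, and the function name accounts_exposed is made up for illustration. It also counts accounts, not people, since one person may hold accounts on all three platforms.

```python
# Approximate account counts in millions, taken from the figures quoted above.
ACCOUNTS_M = {"Facebook": 3000, "WhatsApp": 2500, "Instagram": 1100}


def accounts_exposed(failing_platform, shared_infrastructure):
    """Return how many accounts (in millions) a fault on one platform puts at risk.

    Hypothetical model: with separate data centers only the failing platform
    is hit; with shared infrastructure every platform is hit.
    """
    if shared_infrastructure:
        return sum(ACCOUNTS_M.values())
    return ACCOUNTS_M[failing_platform]


for platform in ACCOUNTS_M:
    separate = accounts_exposed(platform, shared_infrastructure=False)
    shared = accounts_exposed(platform, shared_infrastructure=True)
    print(f"{platform}: {separate / 1000:.1f}bn at risk if separate, "
          f"{shared / 1000:.1f}bn at risk if shared")
```

Whichever platform the fault starts on, the shared case exposes the full total of roughly 6.6 billion accounts, while the separate case exposes only that platform's own accounts.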
So, whereas Facebook has strategically merged its platforms for effective management and cost-cutting, industry regulators, particularly in the USA, are now getting worried about the risk that a gargantuan single point of failure like the one at Facebook poses, for businesses in particular. It is estimated that some 200 million big businesses in the USA run a greater part of their business on Facebook’s platforms, and such outages are a huge risk to their financial and other important data.
ECG
In Ghana, we have a number of single points of failure, and they have been affecting the economy since independence, yet we do not seem to care much about changing the status quo. What we are doing instead is creating more single points of failure.
The Electricity Company of Ghana (ECG) is a classic single point of failure in Ghana. It is the only power distributor in the country. Every power generator has to go through ECG to get to the consumer. That monopoly has been one of the challenges to our development, yet in the wisdom of successive governments, that is the way to go, for several reasons, including the need to prevent privatization of power distribution and its cost implications for the final consumer.
So we have had to live with the impact of ECG's failings for all these decades. Once there is a small challenge at ECG, it does not matter how much power has been generated and is available for distribution; we all live in darkness until the problem at ECG is resolved. This has cost lots of businesses huge amounts of money, and many homes have also suffered damage to electrical appliances without any compensation.
GhIPSS
Another single point of failure in Ghana is the Ghana Interbank Payment and Settlement Systems (GhIPSS), which is the clearinghouse for all interbank transactions and, currently, digital finance interoperability transactions. GhIPSS is the only institution that sits between banks, electronic money issuers (mobile money operators) and fintechs to ensure that all cross-platform transactions are seamless.
The risk, however, is that once there is a challenge at GhIPSS, all the cross-platform transactions at the backend, including check clearance, mobile wallet to wallet, wallet to bank, bank to wallet and others, will come to a halt for as long as the problem remains. Facebook’s challenge lasted for six hours, and an outage at GhIPSS could last even longer.
ICH
The other single point of failure in Ghana is the Telecoms Interconnect Clearinghouse (ICH), which was established to replace what became popularly known as the “spaghetti” interconnect arrangements between telcos. Previously, each of the then five telcos in Ghana had separate interconnect infrastructure to every other telco. So each telco had four separate connections, one to each of the other telcos, as well as connections to each international gateway and to others within the ecosystem. It was “a mess”, as regulators put it, even though telcos insisted that the “mess” was working effectively.
What the ICH has done is to host a data center that connects all telcos and international gateways, so that their interconnect traffic flows through a single point. That way, when they go for reconciliation, the reference data is readily available at a single point, the ICH. This is a good thing, but it could also create problems. When there is a challenge at the ICH, calls from one network to another may not even get through. Again, there can be far-reaching reconciliation problems if the ICH develops a fault. Hitherto, such challenges would have been localized between individual telcos.
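The trade-off between the old “spaghetti” arrangement and the clearinghouse can be sketched with a small model. This is an illustration under stated assumptions, not a description of how the ICH or any telco actually routes traffic: the five telco names are placeholders, the two function names are made up, and the model assumes calls between two telcos need either their direct link (old arrangement) or a working hub (new arrangement), with no transit through a third telco.

```python
from itertools import combinations

# Placeholder names for the five telcos mentioned in the text.
TELCOS = ["A", "B", "C", "D", "E"]

# Every pair of telcos that needs to exchange interconnect traffic.
PAIRS = list(combinations(TELCOS, 2))


def mesh_pairs_working(failed_links):
    """Old arrangement: a pair works unless its own direct link has failed."""
    return [pair for pair in PAIRS if pair not in failed_links]


def hub_pairs_working(hub_up, failed_spokes=frozenset()):
    """Clearinghouse arrangement: a pair works only if the hub is up and
    both telcos still have their connection to the hub."""
    if not hub_up:
        return []
    return [(a, b) for a, b in PAIRS
            if a not in failed_spokes and b not in failed_spokes]


print("Direct links needed in the mesh:", len(PAIRS))
print("Links per telco in the mesh:", len(TELCOS) - 1)
print("Links needed with a hub:", len(TELCOS))

print("Pairs working after one mesh link fails:",
      len(mesh_pairs_working({("A", "B")})), "of", len(PAIRS))
print("Pairs working after the hub fails:",
      len(hub_pairs_working(hub_up=False)), "of", len(PAIRS))
```

Under these assumptions the hub halves the number of links to maintain, from ten to five, but a fault at the hub stops traffic between all ten pairs, whereas a fault on one direct link in the old arrangement stops traffic between only one pair.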
There may be other single points of failure in the country. But let’s stay with these three.
Two points are also worth noting in the Facebook example.
- In its own statement, Facebook did not rule out possible internal sabotage. This means that even people working in an organization that runs a single point of failure could intentionally tamper with the systems for whatever purpose, and end up creating problems for an entire country or the whole world.
- Regulators observed that even though Facebook stands out as a tech giant with all the infrastructure, tools and skilled personnel to prevent and/or manage such challenges, the over six-hour global outage exposed a certain weakness at Facebook.
In the light of the foregoing, the single points of failure in Ghana should be on high security alert. First, they should invest heavily in physical security and cybersecurity to ward off saboteurs. Secondly, they need to keep updating their systems and infrastructure regularly to prevent a situation where emerging challenges and innovations outpace the systems at the single point of failure. That must not be allowed to happen, particularly given what we have all seen happen with Facebook.
It is also important for industry players and regulators to continue to work as partners to prevent unforeseen challenges. Expertise in managing this space does not lie only with regulators or only with players. There are resources across the ecosystem which can be tapped for the greater good of the industry and the country.
For the ordinary Ghanaian on Facebook, WhatsApp and Instagram, this should be a lesson that life is bigger than social media. It is good to link up with friends and family on social media, and also good to reach out to a global market via social media. Indeed, Covid has taught us that we do not need a physical space to connect. But social media can also do to us what Facebook, WhatsApp and Instagram did to us recently. The simple message is: get a life outside of social media.