We experienced an outage yesterday, September 26, that affected all Zoho services. The outage started around 2:32 pm PST. It was caused by a network disturbance from a misbehaving access switch in our primary datacenter LAN. This switch has a redundant backup, but because it was dropping packets rather than failing outright, the backup switch didn't take over.
Our team took the troublesome switch off the network and had the backup switch take over. Most Zoho services, including Zoho CRM, were hosted on a different network and were back up within the first hour. Actual downtime of Zoho CRM and other services was 52 minutes. Zoho Mail, which was hosted on the same network as the faulty switch, was the most affected, and it took around three hours to restore full service.
We are still analyzing the root cause of this issue, and we will post our observations and corrective actions as we gain more insight into the events that led to the outage.
Any downtime is painful, and we are investing in both infrastructure and R&D to prevent it. We apologize for letting you down yesterday.