Update on our downtime

General | January 20, 2012 | 2 min read

This morning, there was a power failure in our California data center – Equinix – taking all Zoho services down. Because the services went down abruptly, the failure caused database inconsistencies.

Each service has multiple database clusters, typically 8-12 per service, and each cluster has a Master-Slave combination. The power failure caused inconsistencies in some of these clusters. To restore the services fully, we need to make sure all of these database clusters are consistent. This database consistency check and sync is what is taking time.
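For illustration only, a consistency check of this kind can be pictured as comparing a checksum of each table on the master with the same table on the slave, and flagging any table where the two differ. The sketch below is a hypothetical example, not the tooling used for this recovery: the function names and the in-memory dictionaries standing in for the two database copies are assumptions.

```python
import hashlib


def table_checksum(rows):
    """Return an order-independent digest of a table's rows."""
    digest = hashlib.sha256()
    # Sort the textual form of each row so row order does not affect the result.
    for row in sorted(repr(r) for r in rows):
        digest.update(row.encode("utf-8") + b"\n")
    return digest.hexdigest()


def find_inconsistent_tables(master, replica):
    """List the names of tables whose contents differ between the two copies.

    A table missing from one side is treated as empty (an assumption made
    to keep the sketch short).
    """
    names = sorted(set(master) | set(replica))
    return [
        name
        for name in names
        if table_checksum(master.get(name, [])) != table_checksum(replica.get(name, []))
    ]


# Hypothetical data: the replica missed the last write to "contacts".
master = {"contacts": [(1, "Ann"), (2, "Bob")], "notes": [(1, "call back")]}
replica = {"contacts": [(1, "Ann")], "notes": [(1, "call back")]}
print(find_inconsistent_tables(master, replica))  # prints ['contacts']
```

Only the flagged tables would then need to be re-synced, which is why clusters with fewer inconsistencies can come back sooner.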

Realistically, it is going to take a few hours to restore our services after making sure the databases are in sync. Some services will be restored sooner, depending on the level of inconsistency.

We sincerely apologize for this issue. We will keep you updated.

Update (1:35PM): Zoho Mail is partially restored. Users in affected clusters will be able to access it after we restore that cluster. For now, most Zoho Mail users should be able to access Zoho Mail.

Please note that Zoho Support is also restored.

Update (1:40PM): Zoho Reports is now restored.

Update (1:48PM): Zoho Writer is now restored.

Update (2:25PM): Zoho Calendar & Contacts are restored. We still are working on fully restoring Writer. Will keep you posted.

Update (2:55PM): Zoho Writer is back up. Zoho Discussions & Zoho Books are restored.

Update (3:15PM): About 60% of database clusters have been restored currently for Zoho CRM. We will be making these available first. Users on these clusters will be able to access their data. As and when other clusters are restored, we will bring them online.

Update (3:50PM): Zoho CRM is partially restored. Around 80% of users should have access to their data. The other 20% will be restored soon. We will keep you updated.

Update (4:00PM): Zoho Invoice is restored.

So far, the following services are restored: Zoho CRM, Books, Invoice, Mail, Support, Reports, Discussions, Calendar & Writer. Others will follow soon.

Update (4:20PM): Zoho Projects and Wiki are currently restored.

Update (4:28PM): Zoho Recruit & People are now restored.

Update (4:35PM): Zoho CRM is fully restored. Zoho Show & Bug Tracker are also restored. Zoho Docs is partially restored.

Update (4:45PM): Zoho Mobile service is now restored.

Update (4:52PM): Zoho Sheet & Docs are partially restored.

Update (5:35PM): Zoho Chat & Share are currently restored.

Update (5:56PM): Zoho Creator is now restored.

Update (6:12PM): Zoho Meeting is now restored.

All Zoho Applications are currently restored. There are a few backend services currently being restored to full service.

We will publish a detailed blog post with the postmortem report. Again, we apologize for the outage.

 

  1. Moo Kahn

    Note to self: Do not host servers at Equinix.
    What I don’t understand – your data center had backup generators – no ? And the servers had working UPS on them, no? And you have an SLA with them, no? If yes, then how could there be a complete power outage? Sounds to me like Equinix did not honor their service level agreement – I’d be suing the pants off the data center and then finding a professional data center to work with. This little fiasco has cost Zoho dearly on so many levels. You need to take action on behalf of your customers so it does not happen again.

  2. chris

    Is anyone else having trouble with the search function? When I search for a client’s name it will not bring back the client’s file. I would not say we are fully restored.

  3. Lhearry

    Thanks Raju! Just logged in and everything is working fine 🙂 thanks for your quick updates and getting it back up & running as soon as possible. Leanne

  4. MTurk

    I agree 100%… I will take my business elsewhere… We need reliable solutions with disaster backup plan..

  5. Raju Vegesna

    Creator is now restored.

  6. Raju Vegesna

    Creator is now fully restored.

  7. KB

    Thanks for update.

  8. BigD

    Goto http://www.Caspio.com for a better online database. They have never gone down, only their competitors get shut out of business.

  9. Lhearry

    Great thanks for the updates, its much appreciated 🙂

  10. Raju Vegesna

    We are working on restoring Creator. There will not be any data loss.

  11. Lhearry

    Thanks Raju for the update. Do you know if there are any concerns re: data loss for Creator users..?

  12. Raju Vegesna

    Currently, Creator is our top most priority. We are working on it. Creator has a large database which is what is taking time.

  13. Raju Vegesna

    The data center (Equinix) has a primary UPS backup and then a generator as the secondary backup. It looks like the system that automatically fails over to these backups went down, causing a system-wide outage.

  14. Lhearry

    Just wondering if you have an ETA on CREATOR being up & running again & is there any concern of data loss…?!

  15. KB

    Estimated time for Creator back up?

  16. Mr. Saugata Chakrabarti

    I am sorry to hear this.
    As I am following the blog, not yet able to get the email up for me & my org.
    Is it possible to have a notification call to us once the required services are really up?
    And, we would like to have the technical & operational details of this failure and would be happy to discuss your plan of action to prevent such things in future. As I understand, it is your sync dB issue, and it might not help even to have multilocation hostings. Nevertheless, it might be worthwhile to explore the possibility to host/co-host in India please. Thanks.

  17. ct

    What is missing?

  18. Raju Vegesna

    CRM is partially restored currently.

  19. Raju Vegesna

    @GK The conflict resolution in databases is what is taking time. Now CRM is partially up.

  20. ct

    Thanks. I’m sure it’s a lot of data. Would think with a million users you would choose a data center with a generator.

  21. Annette

    Any news on zoho crm?

  22. Juergen

    Nope … no access to CRM.

  23. ct

    Does anyone have access to crm?

  24. GK

    LMAO… They have been busy today…

  25. GK

    I agree Rich, a power outage and database synchronization should definitely not take 8 hours to recover from. There has to be other issues going on here.

  26. GK

    Hmmm. Maybe I am missing the point. I am just speaking directly to contact information in CRM. You could have an offline source if you needed one by using SQL express and importing your backup data into your own database. Now the front end application is a different story but at the very least you would have access to tables of data for your customer information. Of course your work flow would slow down significantly but I was speaking to the post where a business came to a stand still because they can’t access customer information. I am not speaking about email, that is a whole different animal and a plan for that would need to be formulated, but then again I do not use Zoho for email, only Customer Information like names, phone numbers and addresses with notes. I am not defending Zoho, or Equinix as I think this issue is taking way longer to correct than it should, I’m just saying you should have your own DRP for when something like this does happen. Regards,

  27. Drew

    Lol!

  28. JJ

    You are missing the point GK. Sure, we have all the data backed up and can go off of spreadsheets but do you think the workflow stays the same, especially when you know that you have to re-enter this data in the main repository. DRP when you are vendor dependent is different from when you house your own technology. Example: it is easy to DRP for an Exchange Failure, but not for a proprietary database service when there are NO offline options. It is clear that you have never been in that situation, and if you think a company can run off of a single data extract once a month, then you are badly mistaken. Shame on you my friend.

  29. Leo

    Last time I looked into it, it wasn’t an option, is it now? I’ll have to look into that if it ever gets back up and running…Thanks

  30. KVA

    Maybe it was Anonymous

  31. GK

    A plan for that would be using Google calendars as a backup.. Its working for me =)
    and its free…

  32. John

    I wanted to bring one thing to your attention. Update (1:48PM): Zoho Writer is now restored. Update (2:25PM): Zoho Calendar & Contacts are restored. We still are working on fully restoring Writer. Will keep you posted. The 2.25pm Update contradicts the 1.48pm Update. Obviously Zoho Writer was not restored or you would not be working on restoring it. Please correct this if possible. It does reflect poorly on your PR end. That being said, I hope this issue is resolved quickly. For your benefit, as well as your customers.

  33. Leo

    You’re correct partially for me. But the calendar is in Zoho and is a “live” document, it wouldn’t help to extract it monthly or even weekly. So even if we had the contacts it wouldn’t help to book or confirm appointments if we don’t know when they are…

  34. Rich

    This is completely ridiculous…I own an IT company and my clients are better protected than you are. I find it ludicrous in this day and age that your data center would be affected by a power outage… I had proposals to get done today and had to move meetings into next week… which simply pushes projects back and I don’t get paid as soon… thank you ZOHO…. I am going to have to look at different solutions… unacceptable….
    Rich

  35. KVA

    Thank you ZOHO for the misleading updates all day.

  36. RG

    What is the ETA for CRM to be up? We’ve lost a whole day of work without having customer data…

  37. Slade

    Unacceptable is the right word. I have 12 people using an enterprise solution through Zoho CRM. Sales and operations is down because of this lack of competence. I calculate a $36,000 loss of productivity today due to Zoho. PLEASE Contact me directly so you can credit my account for the next 10 years!

  38. NoLongerGoing2UseZoho

    And they’re also using WordPress for this Forum!? Jesus…..

  39. Raju Vegesna

    Our colocation provider says the computer that manages the switch over has failed. They do have multiple backups. This was unfortunate. While the power was restored quickly, the instant shutdown caused sync issues across our database clusters. We are doing consistency checks before we bring services back up, which is what is taking time and hence the delays.

  40. NoLongerGoing2UseZoho

    I’m with @CM, Geo Mirroring isn’t in place or this wouldn’t be an issue. We have all of our client data in Zoho CRM and rely on this to be up 24/7 like our local File/Mail server.

  41. GK

    It’s funny reading all these posts about DRP and how many businesses rely on Zoho’s CRM only. Do you not have a DRP of your own? I have extracts of all of our customers’ data in both excel and csv format. Something to consider if businesses run into this issue again. I understand the frustrations of businesses and a service you pay for not being available but to rely on only one source is a mistake that you are making as a business by not having your own DR plan for issues like this. Extracting the data once a month would probably alleviate most of these concerns in most of these posts and would not take that long to do. Shame on you…. for allowing one single service to bring your business to a screeching halt.

  42. GB

    Do you have employees? I have 12. And when I pay their paychecks, it’s with the expectation that they work. We pay for zoho, have dealt with a slow site multiple times, but this is the breaking point. It has cost us more today than salesforce will cost for 6 months. What’s disappointing is how much time and energy we’ve spent getting the zoho CRM tweaked for our business. If you told a client 12 times you were going to do something asap, and you didn’t do it, would you expect them to still be a client?

  43. shopping cart

    What I don’t understand is how does a data center have a power outage and not have backup power. I think something else happened and they are restoring their backups. Our server clusters boots right up on reboot and works in 5 minutes. Longest would take is 1 hour. Unless its something more serious.

  44. Martha Peterson

    That is simply not true. Forty minutes after its “restoration” I have no access to email. And Zoho’s own uptime statistics show that NOTHING has been restored.

  45. Leo

    Have been paying for years.
    Leo

  46. Leo

    This is totally unacceptable, I am sending one of my sales people to LA next week and she can’t get to her client list to confirm the appointments. The other is going to NY next week and her assistant can’t book her appointments because the calendar is tied up in Zoho CRM! This down time is causing serious monetary damage here, way more than the cost of my subscription. We need some guarantees that this doesn’t happen again EVER! Depending on how things work out next week, today’s disruption would cost me more than a year of salesforce for my small sales team of 4, something for me to consider…

  47. Ryan C

    Thank you for the update ZoHo. I wonder how many of the complainers actually pay for service. As a free user, I will patiently wait for you to do your thing as I understand things like this frequently happen and are out of your control. Happy Friday. Ryan C

  48. Chris Lasso

    This is unacceptable!!! I have been down for 6 hours now with the promise of Zoho CRM going to be back up and running every half hour that I call… We believed in Zoho and have become totally reliant on Zoho CRM for every aspect of our business. This problem has brought my company to a complete stand still for hours now. We have missed many appointments and lost many opportunities with potential customers. There is no excuse as to why a company like Zoho does not have dual home or backup power programs in place. I want to know what you are going to do to reimburse my company for the thousands of dollars I have lost due to this problem. I would also like a reason as to why our company should take another chance with Zoho and what Zoho is going to do differently in the future so that such a problem is not possible again.

  49. Juergen

    What about CRM? When is that being restored again?

  50. Gordon

    Can you imagine the “Support” call center now? Ha. Sanjiv just went on break….we can’t find him anywhere…..

  51. CM

    To err is human, but to be irresponsible in protecting millions of customers is shameful. Your website says the following: “Geo Mirroring. Customer data is mirrored in a separate geographic location for Disaster Recovery and Business Continuity purposes. Please note geo mirroring is available on select products and plans.” We are a premium Mail Suite/CRM/Projects/Books customer and none of these applications have worked all day. Please clarify for all of us which products have geo mirroring or any type of data back-up or protection. You owe it to your customers to honestly explain how our data is protected (and to what extent) if we decide to continue with ZOHO. We really like using the ZOHO suite of products and sincerely hope you can address our concerns. Thank you.

  52. Raju Vegesna

    @wdc
    We started restoring services. Zoho Mail, Support & Writer are restored currently. Others will follow. CRM is a priority.

  53. Robert Smith

    Zoho, your customers should never have known about a power failure. The fact that they do know, and worse, that it was allowed to deny services for any length of time is inexcusable! This is pathetic and should cost several person’s jobs. This has not simply inconvenienced your customers. It has caused monetary damages to those who rely on your services for their businesses!
    Zoho, your decision not to plan for, implement and routinely test backup and fail-over procedures represents blatant contempt towards your customers. I will definitely be looking for alternatives.

  54. wdc

    I’m floored. An entire day lost. Do we have a NEW ETA? This does not give me a good feeling about migrating my entire company to this platform.

  55. The Cloud Coders

    I would hate to think what would happen if there was a major earthquake in So Cal. I am amazed that you do not have multiple availability zones in separate geo-regions with failover at the DNS level. This is unheard of in today’s Cloud Computing environments. Even basic (start-up) websites and services are prepared for this. It sounds like cutting corners to keep costs down is part of your business plan. With business services as critical as CRM, Email, Project Management, Invoicing, you have virtually STOPPED business for so many people where every minute costs money. I cannot believe that you could have let this happen. The poor planning makes me wonder what else was not well thought out in your platform and services. ZoHo Trust = zero. You get what you pay for, or maybe a little less than you paid for in this case. ZoNo….

  56. Joe

    This sort of thing should never happen but then again, who really expects a Tier 1 colocation provider like Equinix to lose power (including backup) at once and without warning. It reminds me of several months ago when Amazon’s S3 had a major outage. Equinix is a state of the art facility which comes at a premium, if anything it reflects poorly on them for not living up to the SLAs. I am a Zoho customer and it’s really unfortunate that we have to end the week without our activities, but I understand the issue so I can’t be too upset. Sure, a mirrored site on the East Coast would be great for times like these, but there’s a reason why Zoho is 70% cheaper than Salesforce… just saying.

  57. Mark Stone

    If the generator doesn’t start, it doesn’t matter how many gallons of fuel you have. Happened to us, so we moved to a different data center provider.
    An inexpensive “solution” for this specific issue is to deploy managed in-rack UPS to support the database cluster servers. If power to the rack is lost and not restored in a timely basis, the in-rack UPS will perform a graceful shutdown of the entire database cluster.
    Deploying near-real-time WAN replication of busy database clusters is neither trivial nor inexpensive; would you pay double what you pay Zoho now for the benefit of geographic redundancy?
    Perhaps giving customers a choice, for an additional price, to be hosted on a geographically diverse Zoho cluster versus the single-site cluster would empower customers to choose for their own how much “insurance” they want to buy. For us, we would not, but we recognize that CRM for us is much less mission-critical than it is for others.
    Hope that helps,
    Mark

  58. Dave

    I am on Mark Stone’s side. Take the extra time to make sure the databases are re-indexed correctly. We have web-based software housed in servers at a data center. This software manages and records video from remotely located IP cameras. Video as a Service. Should a hard server shut down occur, it is vital to make sure that the tables are re-indexed correctly to prevent any data loss. I am sure that you folks are working as fast as possible to restore service.

  59. Jim

    Yeah I wonder if they will give me some fries with that 🙂

  60. GD

    Quite inconvenient, but the Twitter & Blog updates are highly appreciated. I look forward to reading your full report later on what exactly happened.

  61. Juergen

    What’s the latest “guess” about when ZOHO will be up again?

  62. Daniel

    This is unacceptable, disrupting business this way for so long is unacceptable for a paid service.

  63. Josh Parker

    Time to re-balance your manpower. Less PR, more basic backbone support. I’ve recommended Zoho to many who were looking at salesforce.com. Needless to say, I’ve been personally receiving emails blasting me for telling them about Zoho. While problems occur with all of our businesses, we should be well prepared for those we view as possible or likely. For a cloud application provider, it would seem insuring up-time and managing a data center going down in any variation would be at the top of the list. I hope you don’t mistake the discontent and confidence hit this has brought. And if people want to leave, you best make it easy to do so. If people start complaining they can’t get their data if they want to go, it will simply cause a deeper confidence crisis. And without question – you should look to generous credits to help appease the disgust and discontent. We’re all watching and hoping this is a huge lesson that propels you to be a far better service-oriented company. And not one that underestimates the importance of reliability, stability and candid and timely service.

  64. Wagner

    any plans to return the systems?

  65. AG

    Today’s outage clearly shows that mirroring is not FAULT TOLERANCE or a Disaster Recovery Strategy. It is in these exact situations that we must realize that all backup solutions are truly RESTORE solutions.
    It is not just about protecting data, but having the ability to provide FAULT TOLERANCE, DATA REPLICATION and SITE FAILOVER. It is about understanding and protecting all points of failure and having a strategy in place to recover.
    We are housed in a Data Center that offers 7 points of communication and a 50,000 Gallon container of Diesel fuel to avoid any power outages.

  66. curt

    salesforce

  67. GK

    I just switched over from salesForce.com’s CRM system. If you have the available funding I highly recommend them for CRM. This is my first bad experience with Zoho. I hope I did not make a mistake in migrating to Zoho. Regards,

  68. ZeroEfusion

    Yea this really sux because I was expecting an email back today for a job position. But I understand this isn’t entirely Zoho’s fault. I am trying to figure out why Equanix did not have layers of UPS boxes for their data servers. Maybe they did and those failed too or maybe I am just clueless about how this all works. Anyway I just feel bad for all those companies that rely on Zoho’s service. A lot of productivity (money) can be lost in one business day due to something like this. Thank God its the weekend. But I think Zoho is doing good updating us on the situation. I guess I shall patiently hope that I can check my email tommorow.

  69. Kendra

    Kent – I’ve been with Zoho for 3 years now and this is the FIRST time I have ever experienced down time with them. Hang in there!

  70. skb

    I’m thinking a healthy refund or credit is in order to prevent switching to a more reliable service.

  71. JJ

    In this environment for cloud/hosting based services, it is almost unbelievable that you do not have redundancy. Obviously, it was not your power failure that caused this issue, however as a major player your NOC should have demanded that Equinix do mandatory failovers (of all failure points) at least once a month. I have had major issues with Zoho Support throughout my company’s relationship with you and it seems that your company is really not ready for prime time just yet. When everything is working, you have a wonderful service, however it’s a hard sell to keep putting our workflow on your platform.

  72. kurt

    It is quite disappointing that you have not engineered in redundancy. With millions of paying customers depending on you, you would think that you would have a more fault-tolerant system. It’s not hard to design and needs to be in place. Otherwise, Zoho is not really taking proper care of its responsibility as an application service hosting provider.

  73. Kevin

    It is what it is, so I guess we have to suck it up. I find it interesting that in this day and age Zoho does not have a back-up plan for such occurrences. Hmmm…

  74. Kent Owings

    I just got our company up and running on Zoho. Our company cannot be subject to these types of failures. Does anyone know of a comparable CRM system I should consider switching to?

  75. Raju Vegesna

    @CC
    We apologize for this. In the process of restoring the services, we noticed database inconsistencies. That pushed us back to the table to recheck all databases. This is certainly not one of our best days. We are going to reevaluate things at our end and make some progress.

  76. Mark Stone

    No apologies necessary; this wasn’t your doing. We went through the same thing a few years ago and moved to a different data center provider. More info in our blog: http://www.reliablenetworks.co… Better to make sure the databases are truly consistent than open up the services prematurely. We’ll wait, no problem. Hang in there!
    Mark

  77. CC

    This reflects very poorly on you guys. You have to engineer in advance for redundancy… it’s incredible to me that power issues in a single data center could bring down *all* of your services for a whole business day. And let me also add, very disappointing that your updates throughout the course of the day have been so misleading. We’ve been given soft deadlines NUMEROUS times on when services would be coming back up… and you’ve blown them all. Do you realize some of us have to plan our day around your promises? Can we get some accurate projections??

  78. kazahcargo

    Bad news.