HealthCare.gov took another hit over the weekend when outages shut down the already struggling site.
Verizon Terremark, in charge of hosting the database hub for the site, experienced a "failure in a networking component" Sunday, and subsequent attempts to fix the problem through regular maintenance took down the system, according to a press release by the Connecticut exchange, which was hit by the outages. The outage affected other sites as well, according to the same release.
Verizon Terremark, the cloud infrastructure division of Verizon, did not respond to multiple CRN requests for an update on or explanation of the situation.
The HealthCare.gov site is back up and running as of 7 a.m. EST Monday, but the question remains: What caused the outages? Although they did not have access to the back-end systems, solution providers told CRN what they thought went wrong, based on Verizon's explanation, and who should be blamed for falling short.
"There's certainly no details out yet, but it looks inexcusable," said Robin Purohit, CEO of Clustrix.
Network problems are "inevitable," Purohit said, but they are something every e-commerce site already deals with, which means the HealthCare.gov site should have been prepared.
"It's bizarre to me that a network component can fail and take the entire system offline," Purohit said.
Verizon's comment that a network component issue caused the outage suggests that it was a network architecture problem, said Andrew Pryfogle, senior vice president and general manager of cloud services and complex bids at Intelisys. There should have been redundancies built into the system, he said, to prevent a single malfunction from taking down the entire system.
"It could be either an ingress or egress to their network that had an issue. But, ... why would a single network failed component cause this outage? Why was there not diversity built in if that's the case?" Pryfogle said. "It could have been a database failure, a storage failure, or a router or a switch component that failed anywhere on the network, but if any single failure caused that outage, that brings you right back to the big question of why wasn't that designed around on the front end."
Redundancy is best practice, Purohit said, and the apparent lack of it in a system hosted by Verizon suggests someone along the way told the company not to put the redundancies in place.
"It's hard to tell whether Verizon is at fault here or whether they were asked to do the wrong thing," Purohit said.
Purohit suggested the culprit was the Centers for Medicare & Medicaid Services (CMS), the government agency in charge of creating the site. He said he wouldn't chalk the issue up to budgetary problems, as the department has spent hundreds of millions on the site, but rather to the passivity and lack of accountability common in such large contracted projects.