Possible Systems Outage
Incident Report for Triarom Data Services Status Page
Postmortem

This outage was caused by an overheating event on two servers.

Around 2pm on Sunday, the 14th of August, two of our servers (TE-Server and HV1) powered off automatically to protect themselves from overheating.

We are still working on implementing additional cooling for our systems. The redundancy systems were able to keep all customer-facing services online at reduced capacity.

Since starting the servers up again at 21:00, all system load has been redistributed across the two hosts and systems have returned to normal operating status.

Posted Aug 15, 2022 - 01:12 BST

Resolved
This incident is now closed; normal service has resumed.
Posted Aug 14, 2022 - 23:17 BST
Monitoring
All servers that shut down have returned to normal operation, and we are now running DRS load balancing to settle the system load. We will continue to monitor this issue closely and continue our heat-reduction work.
Posted Aug 14, 2022 - 23:10 BST
Update
We have identified faults with a number of our servers, likely related to the high temperatures.
Posted Aug 14, 2022 - 22:43 BST
Identified
We believe this fault to be a thermal shutdown event of TE-Server. This should not have affected any customer-facing services. We are remotely restarting the host.
Posted Aug 14, 2022 - 22:31 BST
Investigating
We are currently investigating a partial outage of our services. This does not include Triarom Desktop Services or Triarom Exchange Email Services.

We believe this may be due to storm activity in the area.
Posted Aug 14, 2022 - 22:06 BST