Triarom Data Services Status Page Status

Possible Systems Outage

Incident Report for Triarom Data Services Status Page

Postmortem

This outage was caused by an overheating event on two servers.

At around 14:00 on Sunday, the 14th of August, two of our servers (TE-Server and HV1) powered off automatically to protect themselves from overheating.

We are still working on implementing additional methods of cooling for our systems. During this event, the redundancy systems were able to keep all customer-facing services online at reduced capacity.

Since starting the servers up again at 21:00, all system load has been redistributed across the two hosts and systems have returned to normal operating status.

Posted Aug 15, 2022 - 01:12 BST

Resolved

This incident is now closed and normal service has resumed.

Posted Aug 14, 2022 - 23:17 BST

Monitoring

All servers that shut down have returned to normal operation. We are now running DRS load balancing to settle the system load. We will continue to monitor this issue closely and continue with our heat-reduction work.

Posted Aug 14, 2022 - 23:10 BST

Update

We have identified faults with a number of our servers, likely related to the high temperatures.

Posted Aug 14, 2022 - 22:43 BST

Identified

We believe this fault to be a thermal shutdown of TE-Server, which should not have affected any customer-facing services. We are remotely restarting the host.

Posted Aug 14, 2022 - 22:31 BST

Investigating

We are currently investigating a partial outage of our services. This does not include Triarom Desktop Services or Triarom Exchange Email Services.

We believe this may be due to the storm activity in the area.

Posted Aug 14, 2022 - 22:06 BST