Case Studies

Data Center Over-Temperature Event


The Situation

A major data center experienced an HVAC failure, resulting in temperatures exceeding 120°F. The over-temperature event could have impacted more than $40 million in servers, storage arrays, network equipment, and other IT equipment. As a result of the HVAC failure, all systems either automatically powered off when internal temperatures reached relevant limits or were eventually powered off manually by staff.

Some original equipment manufacturers (OEMs) condemned the IT systems, recommending replacements and voiding warranties due to the event. As a result, the facility's Insurer retained experts from J.S. Held's Equipment Consulting Practice to assess the damage and provide recommendations based on J.S. Held's inspections, analysis, and discussions with the Insured and OEMs.

Our Advice

J.S. Held data center equipment experts conducted thorough assessments of the impacted equipment, including analysis of error log data. With a few exceptions, the equipment showed no visible damage, and the error logs showed that the systems either shut down automatically and entered safe mode or never exceeded their specified internal temperature limits.

However, the analysis did identify several systems that were subjected to excessive temperatures and suffered internal failures. The error log data showed that temperatures rose quickly, especially on the GPUs, indicating thermal stress or inadequate cooling. A simplified summary of error log data for one damaged system is shown below:

  • CPU Activity: The processor speed jumped significantly (from ~1625 MHz to ~3471 MHz), indicating a shift from an idle state to an active state.
  • CPU core temperatures rose from 117°F to as high as 145°F, showing increased thermal output as workloads intensified.
  • CPU load percentage spiked from 13.7% to 84.8%, then stabilized around 50%, suggesting a burst of activity followed by sustained moderate usage.
  • GPU Temperature: The GPU temperature climbed steadily from 117°F to 183°F, reflecting increased graphics processing demand or poor cooling efficiency.
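The kind of comparison summarized above can be sketched in a few lines of Python. This is a minimal illustration, not J.S. Held's actual method: the start and peak values are taken from the bullets above, while the `Reading` type, the `summarize` helper, and the limits in `ASSUMED_LIMITS_F` are hypothetical. Real limits would come from each OEM's published specification.

```python
from dataclasses import dataclass

# HYPOTHETICAL limits in degrees Fahrenheit, chosen only for illustration.
# Actual out-of-specification thresholds depend on the OEM and the component.
ASSUMED_LIMITS_F = {"cpu_temp_f": 176.0, "gpu_temp_f": 180.0}


@dataclass
class Reading:
    metric: str   # name of the logged metric
    start: float  # first value in the simplified summary
    peak: float   # highest value in the simplified summary


def f_to_c(temp_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def summarize(readings, limits):
    """Report the rise of each metric and whether its peak exceeds an assumed limit."""
    rows = []
    for r in readings:
        limit = limits.get(r.metric)  # None when no limit is assumed for this metric
        rows.append({
            "metric": r.metric,
            "rise": round(r.peak - r.start, 1),
            "exceeds_limit": limit is not None and r.peak > limit,
        })
    return rows


# Start and peak values from the simplified error log summary.
READINGS = [
    Reading("cpu_temp_f", 117.0, 145.0),
    Reading("gpu_temp_f", 117.0, 183.0),
    Reading("cpu_load_pct", 13.7, 84.8),
]

if __name__ == "__main__":
    for row in summarize(READINGS, ASSUMED_LIMITS_F):
        print(row)
```

Under these assumed limits only the GPU reading is flagged. A real assessment would weigh many more log fields and the manufacturer's own thresholds before reaching a conclusion.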

Based on our experts' review and analysis, multiple systems were determined to be viable for continued usage, with other systems needing component or full replacement based on error log data. This analysis not only saved the facility the cost of replacing unaffected equipment but also enabled the Insured to return to operation expeditiously.

KEY CONTACTS

Scott Armstrong
Executive Vice President
Equipment Consulting Practice Lead
+1 949 390 7483

 

Brooks Armstrong
Senior Vice President
Equipment Consulting Regional Lead
+1 972 980 5075

Related Practice Areas

> Information Technology
Our information technology experts evaluate systems ranging from desktop environments to expansive, multimillion-dollar data centers. We provide objective, independent analysis grounded in years of industry experience, technical expertise, and market knowledge.

> Equipment Consulting
J.S. Held experts provide support on matters ranging from routine desktop reviews to complex, multimillion-dollar technology claims. Our team draws on years of experience handling a wide variety of specialized equipment and systems.

Our Experts