M

Hardware Health

Microsoft Corporation

Multiple Locations, United States, United States Full-time June 06, 2026
Apply Now

Opportunity Description

**Overview**

Microsoft AI operates one of the world’s most advanced AI training infrastructures, featuring multi-gigawatt clusters spanning tens of thousands of high-performance GPUs, ultra-low-latency NVLink/NVSwitch networks, and innovative liquid-cooling systems. Our team is seeking a Member of Technical Staff, Hardware Health, to ensure these systems deliver sustained reliability, performance, and availability across exascale-class deployments.

We work closely with research, hardware, datacenter, and platform engineering teams to develop predictive health models, failure detection frameworks, and autonomous remediation systems that keep our AI clusters operating at frontier scale.

Our newly formed organization, Microsoft AI, is dedicated to advancing Copilot and other consumer AI products and research. The team is responsible for Copilot, Bing, Edge, and generative AI research. Join us and help shape the future of personal computing.

Microsoft’s ...
Full-time other-general

Ready to Apply?

Submit your application for Hardware Health at Microsoft Corporation

Apply for this Position