Senior SRE & Infra Engineer (GPU Cluster Platform Reliability & Infrastructure Engineer)

Macpower Digital Assets Edge Private Limited

San Francisco, California, United States | Full-time | Posted May 15, 2026

Opportunity Description

This hybrid role spans platform reliability and infrastructure engineering. You'll be instrumental in ensuring high availability, fault tolerance, and performance across GPU cluster environments for both internal research and external customers. Responsibilities include automating GPU cluster onboarding; enhancing monitoring, logging, and security systems; and developing new backend features.

Required Skills and Certifications:

Proven experience with monitoring tools (e.g., Prometheus, Grafana) and incident management practices.

Strong skills in infrastructure automation with Ansible, Terraform, or similar tools.

Deep understanding of logging frameworks, alerting systems, and proactive monitoring solutions.

Proficiency in Python for developing automation scripts, REST APIs, and backend support tools.

Location San Francisco, California

Country United States

Type Full-time

Category architecture-and-engineering

Posted May 15, 2026

Deadline June 24, 2026
