Opportunity Description
About the role
The position is responsible for contributing to the reliability, scalability, and performance of the company’s cloud-native infrastructure and production services.
Responsibilities
System Monitoring & Observability
Configure and maintain monitoring tools (e.g., Prometheus, Datadog) to track key system metrics (latency, traffic, errors, saturation).
Create and refine dashboards and alerts to ensure rapid detection of anomalies and potential outages.
Assist in the implementation of distributed tracing and structured logging to improve debugging and performance analysis.
Incident Response & Management
Participate in a 24/7 on‑call rotation as a secondary responder, escalating issues as needed to senior team members.
Follow incident response playbooks to diagnose and mitigate production incidents, aiming to restore service within defined SLOs.
Contribute to blameless post‑incident reviews by documenting timelines, root causes, and action items to prevent recurrence.
...
Ready to Apply?
Submit your application for Junior site reliability engineer at Signa Opportunity