Opportunity Description
Key Responsibilities
Site Reliability & Operations
- Manage and improve the reliability, availability, and operational excellence of the SHIP-HATS platform
- Define, monitor, and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
- Lead incident management, troubleshooting, root cause analysis, and post-mortem reviews
- Drive continuous improvements to reduce operational toil and prevent recurring incidents
- Perform capacity planning, performance tuning, and system optimisation
Observability & Monitoring
- Design and implement observability solutions across logging, metrics, and distributed tracing
- Build dashboards, alerts, and monitoring strategies to provide deep visibility into platform health
- Manage and maintain monitoring stacks such as Prometheus, Grafana, ELK, or equivalent tools
Infrastructure & Automation
- Dev...
Ready to Apply?
Submit your application for G64 - Full Stack Engineer at FPT Asia Pacific