SRE Manager

10 - 20 Years
1 Opening
45.0 - 100.0 Lac/Yr
Online interview
Hyderabad

Key Skills

Site Reliability Engineer, Technical Team Leader

Job Description

Job Title: Site Reliability Engineering (SRE) Manager

Location: Hyderabad

Employment Type: Full-Time

Work Model: Hybrid (3 days from office)

Summary:

The SRE Manager at the company will lead the reliability engineering function, ensuring infrastructure resiliency and optimal operational performance. This hybrid role blends technical leadership with team mentorship and cross-functional coordination.

Experience Required:

10+ years total experience, with 3+ years in a leadership role in SRE or Cloud Operations.

Technical Knowledge and Skills:

Mandatory:

• Deep understanding of Kubernetes, GKE, Prometheus, Terraform

• Cloud: Advanced GCP administration

• CI/CD: Jenkins, Argo CD, GitHub Actions

• Incident Management: Full lifecycle, tools like OpsGenie

Nice to Have:

• Knowledge of service mesh and observability stacks

• Strong scripting skills (Python, Bash)

• BigQuery/Dataflow exposure for telemetry

Scope:

• Build and lead a team of SREs

• Standardize practices for reliability, alerting, and response

• Engage with Engineering and Product leaders

Roles and Responsibilities:

• Establish and lead the implementation of organizational reliability strategies, aligning SLAs, SLOs, and Error Budgets with business goals and customer expectations.

• Develop and institutionalize incident response frameworks, including escalation policies, on-call scheduling, service ownership mapping, and RCA process governance.

• Lead technical reviews for infrastructure reliability design, high-availability architectures, and resiliency patterns across distributed cloud services.

• Champion observability and monitoring culture by standardizing tooling, alert definitions, dashboard templates, and telemetry data schemas across all product teams.

• Drive continuous improvement through operational maturity assessments, toil elimination initiatives, and SRE OKRs aligned with product objectives.

• Collaborate with cloud engineering and platform teams to introduce self-healing systems, capacity-aware autoscaling, and latency-optimized service mesh patterns.

• Act as the principal escalation point for reliability-related concerns and ensure incident retrospectives lead to measurable improvements in uptime and MTTR.

• Own runbook standardization, capacity planning, failure mode analysis, and production readiness reviews for new feature launches.

• Mentor and develop a high-performing SRE team, fostering a proactive ownership culture, encouraging cross-functional knowledge sharing, and establishing technical career pathways.

• Collaborate with leadership, delivery, and customer stakeholders to define reliability goals, track performance, and demonstrate ROI on SRE investments.

Experience: 10 - 20 Years
No. of Openings: 1
Education: Graduate
Role: Site Reliability Engineer
Industry Type: IT-Hardware & Networking
Gender: Male / Female
Job Country: India
Type of Job: Full Time
Work Location Type: Work from Office
