SRE Manager

  • Experience: 10 - 20 Years
  • Openings: 1
  • Salary: 45.0 - 100.0 Lac/Yr
  • Posted today
  • Online interview
  • Location: Hyderabad
Key Skills

Site Reliability Engineer, Technical Team Leader

Job Description

Job title: Site Reliability Engineering (SRE) Manager

Location: Hyderabad

Employment type: Full-time

Work model: 3 days from office (hybrid)

Summary:

The SRE Manager at the company will lead the reliability engineering function, ensuring infrastructure resiliency and optimal operational performance. This hybrid role blends technical leadership with team mentorship and cross-functional coordination.

Experience required:

10+ years of total experience, including 3+ years in a leadership role in SRE or cloud operations.

Technical knowledge and skills:

Mandatory:

• Deep understanding of Kubernetes, GKE, Prometheus, and Terraform

• Cloud: advanced GCP administration

• CI/CD: Jenkins, Argo CD, GitHub Actions

• Incident management: full lifecycle, with tools such as Opsgenie

Nice to have:

• Knowledge of service mesh and observability stacks

• Strong scripting skills (Python, Bash)

• BigQuery/Dataflow exposure for telemetry

Scope:

• Build and lead a team of SREs

• Standardize practices for reliability, alerting, and response

• Engage with engineering and product leaders

Roles and responsibilities:

• Establish and lead the implementation of organizational reliability strategies, aligning SLAs, SLOs, and error budgets with business goals and customer expectations.

• Develop and institutionalize incident response frameworks, including escalation policies, on-call scheduling, service ownership mapping, and RCA process governance.

• Lead technical reviews for infrastructure reliability design, high-availability architectures, and resiliency patterns across distributed cloud services.

• Champion an observability and monitoring culture by standardizing tooling, alert definitions, dashboard templates, and telemetry data schemas across all product teams.

• Drive continuous improvement through operational maturity assessments, toil elimination initiatives, and SRE OKRs aligned with product objectives.

• Collaborate with cloud engineering and platform teams to introduce self-healing systems, capacity-aware autoscaling, and latency-optimized service mesh patterns.

• Act as the principal escalation point for reliability-related concerns and ensure incident retrospectives lead to measurable improvements in uptime and MTTR.

• Own runbook standardization, capacity planning, failure mode analysis, and production readiness reviews for new feature launches.

• Mentor and develop a high-performing SRE team, fostering a proactive ownership culture, encouraging cross-functional knowledge sharing, and establishing technical career pathways.

• Collaborate with leadership, delivery, and customer stakeholders to define reliability goals, track performance, and demonstrate ROI on SRE investments.

  • Experience: 10 - 20 Years
  • No. of Openings: 1
  • Education: Any Bachelor Degree
  • Role: Site Reliability Engineer
  • Industry Type: IT-Hardware & Networking / IT-Software / Software Services
  • Gender: Male / Female
  • Job Country: India
  • Type of Job: Full Time
  • Work Location Type: Work from Office
