The customer currently uses the ELK stack. The goal is to standardize and modernize logs, metrics, and traces using OpenTelemetry while improving visibility, reliability, and operational intelligence.

Observability Architecture & Modernization:
- Assess the existing ELK-based observability setup and define a modern observability architecture
- Design and implement standardized logging, metrics, and distributed tracing using OpenTelemetry
- Define observability best practices for cloud-native and Azure-based applications
- Ensure consistent telemetry collection across microservices, APIs, and infrastructure

Logging, Metrics & Tracing:
- Instrument applications using OpenTelemetry SDKs (Spring Boot, .NET, Python, JavaScript as applicable)
- Support Kubernetes and container-based workloads (if applicable)
- Configure and optimize log pipelines, trace exporters, and metric collectors
- Integrate OpenTelemetry with ELK / OpenSearch / Azure Monitor / other backends
- Define SLIs, SLOs, and alerting strategies
- Integrate GitHub and Jira data into the observability platform as DORA metrics

Operational Excellence:
- Improve observability performance, cost efficiency, and data retention strategies
- Create dashboards, runbooks, and documentation

AI-based Anomaly Detection & Triage (Good to Have):
- Design or integrate AI/ML-based anomaly detection for logs, metrics, and traces
- Experience with AIOps capabilities for automated incident triage and insights

Required Technical Skills:

Core Observability:
- Strong hands-on experience with the ELK Stack (Elasticsearch, Logstash, Kibana)
- Deep understanding of logs, metrics, traces, and distributed systems
- Practical experience with OpenTelemetry (Collectors, SDKs, exporters, receivers)

Cloud & Platforms:
- Strong experience integrating Microsoft Azure with the observability platform
- Knowledge of Azure monitoring tools (Azure Monitor, Log Analytics, Application Insights)
- Experience integrating Kubernetes / AKS with the observability platform is a strong plus

Soft Skills:
- Strong architecture and problem-solving skills
- Clear communication and documentation skills
- Hands-on mindset with an architect-level view

Good to Have / Preferred Skills:
- Experience with AIOps / anomaly detection platforms
- Exposure to any of the following tools: Prometheus, Grafana, Jaeger, OpenSearch, Datadog, Dynatrace, New Relic
- Experience with incident management, SRE practices, and reliability engineering