Cloud & Infrastructure · Full-time · On-site

Site Reliability Engineer (SRE)

Ensure production system reliability by implementing SLO-driven practices, building observability platforms, and automating incident response for critical enterprise applications.

Hyderabad, India

Full-time

Cloud & Infrastructure

Responsibilities

Define and maintain SLIs, SLOs, and error budgets for production services with clear escalation policies
Build and operate observability platforms using Prometheus, Grafana, Loki, and distributed tracing tools
Participate in on-call rotations and lead incident response, conducting thorough blameless postmortems
Reduce toil through automation, including self-healing systems, runbook automation, and chaos engineering practices
Collaborate with development teams to improve service reliability through design reviews and load testing
Develop and maintain internal SRE tooling for deployment safety, capacity planning, and performance analysis

Requirements

3-5 years of experience in SRE, DevOps, or production engineering roles
Strong understanding of SRE principles including SLOs, error budgets, and toil elimination
Proficiency with monitoring and observability tools (Prometheus, Grafana, ELK/Loki, Jaeger)
Experience with Linux systems administration, networking, and performance troubleshooting
Strong programming skills in Python, Go, or similar languages for building SRE tooling
Experience with incident management processes and blameless postmortem culture

Nice to Have

Experience with chaos engineering tools like Litmus Chaos or Gremlin
Knowledge of capacity planning and traffic forecasting methodologies
Familiarity with AIOps and ML-driven anomaly detection

Skills

SRE, Prometheus, Grafana, Kubernetes, Python, Linux, Incident Management, Observability

Related Positions

Senior DevOps Engineer

Senior Site Reliability Engineer (SRE)

Senior Database Administrator (PostgreSQL)

Software Architect
