Cloud & Infrastructure | Full-time | Hybrid

Senior Site Reliability Engineer (SRE)

Lead site reliability engineering efforts for large-scale distributed systems, driving 99.99% availability targets through advanced observability, automation, and resilience engineering.

Seattle, USA

Full-time

Cloud & Infrastructure

Responsibilities

Lead SRE strategy and practices across multiple product teams, ensuring consistent reliability standards
Architect and maintain enterprise-grade observability platforms using OpenTelemetry, Prometheus, and Grafana
Drive chaos engineering programs to proactively identify failure modes and improve system resilience
Mentor SRE team members and embed reliability practices into the software development lifecycle
Lead high-severity incident response and establish postmortem processes that drive continuous improvement
Design capacity planning models and automated scaling strategies for cost-efficient high availability

Requirements

6-9 years of SRE or production engineering experience with large-scale distributed systems
Deep expertise in observability, including metrics, logs, traces, and profiling at scale
Advanced Kubernetes operations experience, including multi-cluster management and custom controllers
Strong software engineering skills in Go, Python, or Rust for building production-grade SRE tooling
Proven experience managing SLO frameworks and driving reliability improvements across organizations
Experience operating cloud infrastructure (AWS or GCP) at scale, including multi-region deployments

Nice to Have

Experience leading SRE teams or guilds in a large engineering organization
Background in performance engineering and profiling distributed systems

Skills

SRE, Kubernetes, Go, OpenTelemetry, Prometheus, AWS, Chaos Engineering, Distributed Systems

Related Positions

Edge Computing Engineer

DevOps Engineer (AWS & Kubernetes)

Staff Software Engineer

Snowflake Data Engineer
