Senior Site Reliability Engineer (Resilience) - Platform Resilience

  • Company: Jobgether
  • Employment: Full-time
  • Location: 🇺🇸 United States (nationwide)
  • Posted: 1 week ago (updated 13 hours ago)

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Site Reliability Engineer (Resilience) – Platform Resilience in the United States.

This is a high-impact engineering role focused on building and maintaining highly reliable, scalable, and resilient cloud infrastructure that powers mission-critical SaaS and platform services. You will work within a global Platform Engineering organization, contributing to the design, automation, and evolution of multi-cloud systems that support large-scale distributed environments. In this role, you will take an engineering-first approach to reliability, driving automation, observability, and incident prevention strategies. You will collaborate closely with software engineers and infrastructure teams to ensure seamless deployment and operation of services across cloud environments. Operating in a follow-the-sun support model, you will help respond to and prevent major incidents while continuously improving system resilience. This position combines hands-on engineering, cloud infrastructure expertise, and cross-functional collaboration in a fast-paced, globally distributed environment.


Accountabilities:
  • Design, build, and maintain reliable and scalable multi-cloud platform infrastructure supporting large-scale SaaS services
  • Lead technical initiatives focused on automation, reliability engineering, and system resilience improvements
  • Develop tools, software, and automation frameworks to enhance infrastructure efficiency and operational stability
  • Respond to incidents and prevent their recurrence through effective root cause analysis and problem management
  • Participate in a global on-call rotation using a follow-the-sun model to ensure system reliability
  • Collaborate with engineering teams to identify and implement solutions for complex infrastructure challenges
  • Drive observability and monitoring improvements to enhance detection, diagnosis, and resolution of issues
  • Contribute to infrastructure-as-code practices and cloud automation strategies
  • Promote operational excellence through documentation, process improvement, and best practices adoption
  • Mentor and support peers while fostering a collaborative and inclusive engineering culture
  • Continuously evaluate system performance and scalability to meet growing global demand

Requirements:

  • Experience as a Site Reliability Engineer, Platform Engineer, or Software Engineer in large-scale distributed systems
  • Strong background in software engineering with the ability to design and implement automation and infrastructure solutions
  • Hands-on experience with public cloud platforms and managed Kubernetes environments
  • Proficiency in at least one programming language (e.g., Go, Python, or similar) for infrastructure or backend development
  • Experience with Infrastructure-as-Code tools such as Terraform or Crossplane is highly desirable
  • Strong understanding of containerized environments (e.g., Docker) and cloud-native architectures
  • Experience operating or supporting SaaS platforms in production environments
  • Strong knowledge of Linux systems administration in distributed environments
  • Familiarity with observability and monitoring tools (e.g., Prometheus, Grafana, Elastic Stack, or similar)
  • Experience with incident response, alerting systems, and reliability engineering best practices
  • Strong communication skills and ability to work effectively in globally distributed teams
  • Passion for mentoring, collaboration, and continuous improvement
  • Bonus: experience building or scaling Kubernetes infrastructure across multiple cloud providers

Benefits:

  • Competitive base salary ranging from $154,800 to $195,600 USD
  • Equity participation through stock programs
  • Company-matched 401(k) plan (up to 6%)
  • Comprehensive health coverage for employees and families (varies by location)
  • Generous paid time off and flexible work arrangements
  • Paid parental leave (minimum of 16 weeks)
  • Remote-friendly global work environment
  • Volunteer time off and charitable donation matching programs
  • Strong focus on employee well-being and work-life balance
  • Inclusive and diverse workplace culture supporting all backgrounds and identities


How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!


Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

 

 

