Senior Site Reliability Engineer (Remote - US)

  • Company Jobgether
  • Employment Full-time
  • Location 🇺🇸 United States nationwide
  • Submitted Posted 4 days ago - Updated 2 hours ago

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Site Reliability Engineer in the United States.

This role offers the opportunity to lead the development of highly reliable, scalable SaaS systems while integrating cutting-edge AI into operational workflows. You will design and implement autonomous reliability operations, predictive monitoring, and self-healing infrastructure to ensure high availability across distributed cloud environments. Collaborating closely with engineering teams, you will embed AI-driven reliability practices into CI/CD pipelines, optimize incident response, and mentor peers on next-generation SRE practices. The ideal candidate has deep expertise in large-scale SaaS systems, cloud infrastructure, observability, and AI-assisted automation. This is a highly visible role where your work directly impacts system resilience, operational efficiency, and customer satisfaction, in a forward-thinking, AI-first environment.

Accountabilities:

  • Design, implement, and scale reliable SaaS systems with a focus on autonomous and AI-driven operations.
  • Build AI-enhanced observability, anomaly detection, and predictive monitoring using tools such as Prometheus, Grafana, Loki, Tempo, and OpenTelemetry.
  • Develop automated workflows using Agentic AI frameworks, AI Flow tools, or custom AI agents to remediate production incidents.
  • Collaborate with development teams to embed AI-assisted reliability feedback loops into CI/CD pipelines.
  • Define, track, and optimize SLOs, SLIs, and SLAs to improve system reliability and operational efficiency.
  • Mentor engineers on AI-driven SRE practices and contribute to reliability playbooks and process improvements.
  • Troubleshoot and optimize production environments leveraging LLM-based diagnostics and pattern recognition across logs, traces, and metrics.

Requirements

  • 5+ years of experience in large-scale SaaS or distributed systems environments.
  • Bachelor’s degree in Computer Science, Engineering, or equivalent technical experience.
  • Hands-on experience with AI-driven operations, including Agentic AI, AI Flow tools, and AIOps practices.
  • Proficiency in software engineering with Go and Python, automation frameworks, and API integrations.
  • Expertise in observability and monitoring tools, cloud platforms (AWS, GCP, Azure), and Infrastructure as Code (Terraform or Pulumi).
  • Strong knowledge of containerization and orchestration (Kubernetes, Docker), Linux system administration, and cloud networking principles.
  • Excellent communication skills for documenting, reporting, and collaborating with cross-functional teams.
  • Ability to participate in a rotating on-call schedule for production systems.
  • Preferred: Experience integrating AI copilots into production systems, predictive SLO dashboards, or autonomous agent orchestration frameworks.

Benefits

  • Competitive compensation package, including performance-based bonuses.
  • Comprehensive health, life, and disability insurance plans.
  • Paid time off and parental leave.
  • Remote-friendly work environment with occasional on-site collaboration.
  • Opportunities for career growth, mentorship, and contribution to AI-first engineering practices.
  • Inclusive and diverse workplace culture with employee resource groups.


Jobgether is a Talent Matching Platform that partners with companies worldwide to efficiently connect top talent with the right opportunities through AI-driven job matching.

When you apply, your profile undergoes an AI-powered screening process designed to identify top talent efficiently and fairly:
🔍 Our AI evaluates your CV and LinkedIn profile thoroughly, analyzing your skills, experience, and achievements.
📊 It compares your profile to the job’s core requirements and past success factors to determine your match score.
🎯 We automatically shortlist the 3 candidates with the highest match to the role.
🧠 When necessary, our human team may perform an additional manual review to ensure no strong profile is missed.

The process is transparent, skills-based, and free of bias — focusing solely on your fit for the role. Once the shortlist is complete, it is shared with the hiring company, which handles final decisions and next steps.

Thank you for your interest!

 



© 2025 Created by USA Remote Jobs. All rights reserved.