We are the creators and maintainers of SpiceDB and the authorization infrastructure that companies around the world depend on to keep their engineering teams focused on what matters most - their own product.
We are a Series A company, fixing broken access control with products that eliminate complex permission management while delivering enterprise-scale performance and consistent access control.
AuthZed is a fully remote company with employees across the US, Canada, and Europe. We’re a hardworking and close-knit group with a software-driven culture (yep, even our GTM team understands and loves this technology!). We bring integrity to all our interactions, fostering confidence in decision making - trusting and respecting each voice on our team, every day.
Agency: Everyone should have the capability, freedom, and confidence to bring about changes to our business and product. Organizational processes exist to clearly define our goals, not to restrict how progress is made.
Collaboration: Success is defined in various dimensions and no single person can be an expert in all of them. Without valuing the opinions of others, finding compromises, and sharing mutual trust and respect, you cannot arrive at the best possible solution.
Open-mindedness: Without asking questions, testing assumptions, and questioning our pre-existing biases, we risk operating within an echo chamber. We celebrate the representation of diverse perspectives and backgrounds as a catalyst for creating an inclusive work environment that everyone can appreciate.
At AuthZed, we’re revolutionizing how modern applications handle access control, and reliability is at the heart of what we do. As an SRE Manager, you’ll lead the team responsible for ensuring the reliability, scalability, and performance of AuthZed’s infrastructure as we grow our global customer base.
This is a hands-on leadership role: you’ll manage and grow a team of SREs while remaining deeply engaged with production systems, incident response, and platform architecture. You’ll use a blend of Site Reliability Engineering and Platform Engineering to reduce operational toil, improve safety, and enable product teams to ship reliably at scale.
You’ll work with cutting-edge technologies, design resilient systems, and build automation and paved paths so customers can rely on AuthZed for their most critical workloads.
Lead a global team of Site Reliability Engineers delivering infrastructure automation, observability, and operational scalability across multi-cloud and multi-region Kubernetes-based architectures
Recruit, hire, onboard, and develop engineers while elevating the overall strength of the team
Act as a player-coach by contributing to critical projects while mentoring engineers and supporting their professional growth
Participate in on-call rotations at a sustainable level to stay grounded in real operational issues
Guide project planning by defining milestones, identifying dependencies, and working toward timely and meaningful delivery
Identify toil and lead initiatives to eliminate it through engineering solutions
Drive automation and platform engineering: safer deploys, progressive delivery, guardrails, and paved paths that reduce toil
Collaborate with product and engineering to ship features such as self-service workflows, and to set infra-as-code expectations, with reliability baked in
Serve as a senior escalation point for complex incident triage and root cause analysis
10+ years of experience in infrastructure, SRE, or platform engineering roles
5+ years in senior technical leadership roles (SRE, Platform, or Infrastructure), including at least 2 years of people management experience
Experience managing distributed teams across US, Canada, EU, and global time zones
Experience leading or mentoring SRE/Infrastructure/Platform teams in a production SaaS environment. Strong leadership skills with the ability to mentor and coach senior-level engineers
Strong grasp of SRE fundamentals: SLOs/SLIs, error budgets, incident management, capacity planning, and operational excellence
Extensive experience with AWS, GCP, and Azure managed services
Strong programming skills and experience writing production-quality automation or tooling (e.g., Go, Python, Bash)
Hands-on experience with Kubernetes, Kubernetes Operators/Controllers, containerized workloads, and Infrastructure as Code (Terraform, Pulumi)
Experience with monitoring and observability systems (e.g., Prometheus, Grafana, logging/tracing pipelines)
Excellent communication: can translate reliability tradeoffs to product/exec stakeholders and write crisp incident/postmortem artifacts
Proven ability to translate operational pain points into engineering deliverables
Experience working with or integrating AI-powered systems or tooling
Experience operating multi-tenant or high-isolation customer environments
Familiarity with distributed databases and performance tuning at scale
Experience building internal developer platforms or paved paths
Opportunity to work with cutting-edge technology in a rapidly growing sector
A supportive environment where your ideas lead to real impact
Competitive salary based on experience
Stock options at an early-stage startup
Comprehensive benefits including healthcare (US-based) and other insurance
A fully remote role with a flexible schedule that accommodates different time zones
Twice-yearly travel for team offsites focused on team bonding, collaboration, and having fun!