Maven AGI is an enterprise AI platform founded in July 2023 by executives from
HubSpot, Google, and Stripe. We build conversational AI agents for autonomous
customer support at scale. Our platform unifies fragmented systems, integrates
knowledge sources, and enables intelligent actions without costly infrastructure
changes.
Our team includes talent from Google, Meta, Amazon, Microsoft, and Stripe, with
advisors from OpenAI, Google, HubSpot, and Stripe.
We're looking for a Senior DevOps Engineer to own and evolve the infrastructure
powering Maven AGI's AI platform. You'll design, build, and operate production
systems across cloud providers, Kubernetes clusters, and CI/CD pipelines --
ensuring our platform scales reliably as we onboard enterprise customers. This
is a high-leverage role where your work directly impacts platform availability,
developer velocity, and customer trust.
Design, implement, and maintain cloud infrastructure (Azure, AWS) using infrastructure-as-code (Pulumi, Bicep, Terraform)
Own Kubernetes cluster operations: deployments, scaling, monitoring, and incident response
Build and optimize CI/CD pipelines for a large-scale monorepo
Implement observability across services (metrics, logging, tracing, alerting)
Drive reliability practices: SLOs, capacity planning, disaster recovery, and runbook development
Collaborate with engineering teams to improve developer experience and deployment velocity
Manage secrets, access controls, and infrastructure security posture
Evaluate and adopt new tooling to reduce operational toil
3-7 years of professional DevOps/SRE/Infrastructure experience
Deep expertise with Kubernetes in production (AKS, EKS, or GKE)
Strong infrastructure-as-code skills (Pulumi, Terraform, or Bicep)
Experience operating CI/CD systems (GitHub Actions, ArgoCD, or Jenkins)
Proficiency in at least one scripting/programming language (Python, Go, TypeScript, or Bash)
Solid understanding of IaaS providers, networking, DNS, load balancing, and TLS
Experience with monitoring and observability stacks (Datadog, Prometheus, Grafana, or similar)
Strong communication and cross-team collaboration skills
Organized and detail-oriented; comfortable operating in a ticketing environment
Ability to thrive in fast-paced startup environments
Experience with GPU infrastructure and ML/LLM serving workloads (vLLM, TEI)
Familiarity with Temporal or other workflow orchestration systems
Security and compliance background (SOC 2, HIPAA, GDPR)
Experience with multi-cloud or hybrid (cloud + on-prem) deployments
Cost optimization experience at scale
Meaningful equity stakes
Performance bonus
Comprehensive benefits package
High impact in a cutting-edge AI field
Diverse and welcoming work environment
Do right for customers -- customer trust is earned through reliability
Data-driven -- we measure before we change, and we alert before customers notice
Entrepreneurial -- own the problem end-to-end
Strive to be better, together -- continuous improvement through collaboration