Job Overview
Join Opaque as a Software Engineer - Infrastructure, where you'll harness your expertise in cloud infrastructure, automation, and modern DevOps practices to build, optimize, and secure our Confidential AI platform. You'll design reliable, high-availability systems using tools like Kubernetes, Terraform, and GitHub Actions while enabling seamless CI/CD workflows. From improving system performance and incident management to ensuring compliance (SOC, HIPAA) and deploying proactive cybersecurity measures, you'll play a critical role in balancing innovation, reliability, and customer trust.
Key Responsibilities
- Design and implement automated cloud environments (build, test, and others) using infrastructure as code
- Partner with development teams to improve services through rigorous testing and release procedures
- Gather and analyze operating-system and application metrics to support performance tuning and fault finding
- Participate in system design consulting, platform management, and capacity planning
- Balance feature development speed against reliability using well-defined service-level objectives (SLOs)
- Manage and troubleshoot incidents and analyze root causes
- Identify and deploy cybersecurity measures for continuous vulnerability assessment and risk management
- Support SOC, HIPAA, and other compliance efforts, and assist with customer infosec questionnaires
Qualifications
- 5+ years of experience in a software engineering or site reliability engineering role
- Excellent communication skills and strong problem-solving skills across programming languages
- Experience operating 24x7 high-availability distributed software applications, tuning application performance, and optimizing fleet utilization
- Experience building infrastructure and tooling from the ground up
- Experience scripting operating-system tasks in Bash, Python, or similar languages
- Knowledge of hosting multi-tenant solutions with hybrid deployment models
- Strong knowledge of modern web standards and protocols (HTTP, TLS, OAuth2, CORS) and network fundamentals (DNS, DHCP, TCP/IP, routing, load balancing, load shedding), plus experience with monitoring frameworks such as CloudWatch, Datadog, Grafana, or Elastic
- Hands-on production experience working with:
  - Kubernetes, Terraform, GitHub Actions
  - At least one major cloud provider (Azure, AWS, GCP)
  - Managing critical production workloads