Join us. Let’s make a direct impact in healthcare.
Being an Iodine employee means becoming part of something bigger - using clinical AI technology to drive smarter healthcare processes and positively impact patient care.
Who we are:
Recognized as one of Austin’s best places to work, we are a collaborative and dedicated team with innovation built into our DNA. Iodine is an enterprise AI company that is championing a radical rethink of how to create value for healthcare professionals, leaders, and their organizations - by automating complex clinical tasks, generating insights and empowering intelligent care. Powered by one of the largest sets of clinical data and use cases available, our groundbreaking clinical machine-learning engine, Cognitive ML, constantly ingests the patient record to generate real-time, highly focused, predictive insights that clinicians and hospital administrators can leverage to dramatically augment the management of care delivery.
We are seeking a highly skilled Site Reliability Engineer (SRE) with AWS Cloud expertise to design, build, and optimize our cloud platform and infrastructure. This role demands deep hands-on experience with AWS cloud services across compute, storage, databases, networking, and security, combined with strong cost optimization strategies. You will implement the cloud roadmap and strategy, design scalable solutions, and ensure the reliability, security, and cost-efficiency of the platform and infrastructure. You will be responsible for the scalability of the platform and infrastructure, ensuring it can support business growth while maintaining high availability and performance, and for driving reliability, security, cost optimization, and operational excellence across our platform. Additionally, you will participate in key architectural discussions with product engineering and security teams to ensure new and existing services follow best practices and meet operational excellence standards.
If you are an experienced SRE/Cloud Engineer with AWS expertise and a strong SRE mindset, and you are passionate about high availability, security, automation, and operational and cost efficiency, we would love to hear from you.
Implement the cloud roadmap and strategy to drive scalability, reliability, security, and cost efficiency.
Drive cloud adoption initiatives, ensuring alignment with business objectives.
Implement and support initiatives on cloud governance, architectural best practices, and modernization strategies.
Develop Infrastructure as Code (IaC) using Terraform, CloudFormation, or AWS CDK for fully automated provisioning and deployment.
Own and improve infrastructure CI/CD pipelines using GitLab, Ansible (AWX), Argo CD, and Helm.
Implement self-healing, fault-tolerant architectures that can automatically recover from failures.
Optimize infrastructure monitoring and observability using Prometheus, Grafana, Loki, Tempo, Mimir, AWS CloudWatch, AWS CloudTrail, and New Relic.
Participate in architecture discussions with product engineering teams for onboarding new services, ensuring they are scalable, cost-optimized, and aligned with best engineering practices.
Collaborate with software developers to optimize application performance and cloud-native designs.
Operational Duties & Business Support
Perform regular system and infrastructure maintenance including OS-level patching, AMI refreshes, and kernel upgrades.
Lead and coordinate planned upgrade cycles for core services like RDS, EKS, and Kubernetes clusters to ensure security and feature compatibility.
Troubleshoot and resolve infrastructure and application-level issues, collaborating directly with internal teams and business stakeholders.
Participate in customer support escalations and provide technical guidance for resolution.
Lead and refine incident management processes for the SRE team, ensuring minimal downtime and fast recovery.
Implement SLOs, SLIs, and error budgets to drive system reliability.
Conduct post-mortems and drive root cause analysis to prevent recurring issues.
Ensure cloud security best practices are embedded into all solutions, including IAM policies, VPC security, encryption, and compliance with industry standards (such as SOC 2, HIPAA).
Implement least privilege access, network segmentation, and automated security controls across AWS services.
Collaborate with InfoSec teams to enforce threat detection, logging, and security monitoring using AWS GuardDuty, Security Hub, and CloudTrail.
Design and build highly available, scalable, and fault-tolerant AWS architectures using AWS services such as EC2, S3, RDS, DocumentDB, Lambda, EKS, Secrets Manager, SSM, API Gateway, and CloudFront, along with related technologies such as HashiCorp Terraform, Vault, and Consul, and Ansible (AWX).
Implement and support resilient storage, compute, and database solutions optimized for performance and cost.
Drive the execution of multi-region disaster recovery (DR) and backup strategies.
Continuously monitor and optimize AWS infrastructure costs using AWS Cost Explorer, Trusted Advisor, and Savings Plans/Reserved Instances.
Drive FinOps culture, ensuring teams design and deploy cost-efficient cloud solutions.
Implement auto-scaling, rightsizing strategies, and storage lifecycle policies to reduce costs.
5+ years of experience in SRE/DevOps roles in AWS.
Hands-on expertise with AWS services, including EC2, S3, Lambda, EKS, VPC, IAM, Secrets Manager, and SSM, and with technologies such as HashiCorp Vault and Consul.
Strong knowledge of cost optimization techniques in AWS, including autoscaling, right-sizing, storage lifecycle policies, and Reserved Instances/Savings Plans.
Strong hands-on experience with Infrastructure as Code (IaC) using Terraform, CloudFormation, or AWS CDK.
Proficiency in Linux administration, Python, or Bash scripting for automation.
Experience with Kubernetes (EKS), Docker, and container orchestration.
Strong security and compliance knowledge, including IAM, security groups, encryption, AWS WAF, and logging with CloudTrail.
Hands-on experience with monitoring and observability tools like Prometheus, Grafana, AWS CloudWatch, Loki, and New Relic.
Experience in approving merge and pull requests, ensuring high-quality infrastructure code.
Strong team collaboration, documentation and communication skills.
Travel to and from company headquarters is required for mandatory onboarding and company meetings.
Preferred Qualifications
AWS Certifications (e.g., AWS Certified Solutions Architect - Professional, AWS Certified DevOps Engineer).
Experience with multi-account AWS organizations and AWS Control Tower.
Familiarity with service meshes (Istio, Linkerd) and API gateways.
Experience with Fortinet (FortiGate) firewalls and AWS networking (VPC, Transit Gateway, Direct Connect, etc.).
Background in database administration (PostgreSQL, MySQL, DocumentDB, or NoSQL databases).
Experience implementing resilience testing and chaos engineering.
Work on cutting-edge cloud technologies in a high-impact role.
Lead AWS cloud strategy and architecture, shaping the company’s infrastructure vision.
Be a mentor and leader, driving best practices in SRE and cloud engineering.
Optimize cloud costs, ensuring efficiency and scalability.
Collaborate with top engineering teams, influencing product and infrastructure decisions.
What we offer:
Comprehensive Healthcare: Fully covered medical, vision, and dental benefits for employees, plus generous dependent coverage.
Telehealth Services: Convenient access to telehealth services tailored for remote work.
Savings Accounts: Tax-advantaged savings accounts for healthcare and dependent care expenses.
Ancillary Benefits: Life, AD&D, and disability insurance paid by Iodine for peace of mind.
Retirement Plan: Competitive 401(k) retirement plan with a considerable company match.
Extra Life Insurance: Optional additional life insurance coverage for you and your dependents.
Accident Insurance: Financial protection against unexpected accidents and critical health issues.
Critical Illness Insurance: Provides financial support for medical costs and living expenses during serious illness.
Hospital Indemnity Insurance: Additional support for hospital-related expenses through indemnity insurance.
Pet Insurance: Affordable options for discounted pet insurance.
Legal and Identity Protection: Legal and ID theft protection to safeguard personal information.
Employee Assistance: Confidential employee assistance program for personal and professional challenges.
Education Allowance: Annual funding for educational pursuits and continuing education to support professional development and skill enhancement.
Reimbursements: Annual reimbursement for eligible wellness expenses, monthly reimbursement for cell phone and WiFi costs, and a one-time equipment allowance for creating a comfortable home office.
Why should you join Iodine?
This is a unique opportunity to join a close-knit, rapidly growing team and help us improve a key piece of the organization. You will have the opportunity to drive smarter healthcare processes through technology, so hospitals can stay focused on patient care. You will join a passionate and ambitious team, with a proven record of success building multiple companies. Learn more about our company culture on Built In Austin and on our website at www.iodinesoftware.com.