Machine Learning Operations (MLOps) Engineer

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Machine Learning Operations (MLOps) Engineer in the United States.

This role sits at the core of building and scaling a modern machine learning platform that powers production-grade AI systems. You will be responsible for designing and operating the infrastructure that enables seamless model training, deployment, and monitoring across high-impact products. Working at the intersection of software engineering, DevOps, and machine learning, you will help define how ML systems are built and operated at scale. This is a highly hands-on engineering role focused on reliability, performance, and automation of end-to-end ML workflows. You will collaborate closely with machine learning engineers and data teams to improve developer experience and accelerate delivery of AI-driven solutions. The environment is fast-paced, highly technical, and centered on building scalable systems that support real-world production AI use cases.

Accountabilities:

In this role, you will design, build, and maintain the infrastructure and tooling that supports the full machine learning lifecycle, from training and experimentation to deployment and monitoring in production environments.

Design and implement scalable ML infrastructure to support training, evaluation, deployment, and inference workflows
Develop and maintain containerized systems using Docker and Kubernetes for distributed and scalable workloads
Build and orchestrate distributed training pipelines and workflow automation systems
Implement and maintain ML lifecycle tools such as MLflow for experiment tracking, versioning, and reproducibility
Own and optimize production inference systems, including low-latency and high-availability model serving architectures
Develop and maintain CI/CD pipelines for machine learning models, including automated deployment, version control, and rollback strategies
Build and manage data pipelines integrated with platforms such as Snowflake and related data systems
Implement observability solutions including monitoring, logging, and alerting for model performance, drift detection, and system health
Collaborate closely with ML engineers to improve platform usability, reliability, and overall developer experience

Requirements

This role requires strong software engineering expertise combined with hands-on experience building and operating machine learning infrastructure at scale. The ideal candidate is highly technical, automation-driven, and comfortable working across distributed systems.

Bachelor’s or Master’s degree in Computer Science or Engineering, or equivalent practical experience
5+ years of experience in software engineering, DevOps, or MLOps roles
Strong proficiency in Python and experience building production-grade distributed systems
Hands-on experience with Docker, Kubernetes, and cloud-based infrastructure
Proven experience designing and maintaining CI/CD pipelines for production systems
Familiarity with ML lifecycle tools such as MLflow or equivalent platforms
Experience working with data platforms such as Snowflake or similar cloud data warehouses
Strong understanding of system design, microservices, APIs, and scalable architectures
Excellent debugging and troubleshooting skills across complex distributed environments
Strong collaboration skills and ability to work effectively with ML engineers and data teams

Benefits

Fully remote work opportunity
Unlimited vacation policy, sick time, and paid holidays
Comprehensive healthcare coverage including medical, dental, and vision plans
401(k) retirement savings plan
Paid parental leave and supportive time-off policies
Startup environment with strong focus on innovation and engineering impact
Opportunity to work on cutting-edge machine learning infrastructure at scale
Collaborative, engineering-driven culture focused on automation and continuous improvement

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!


Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

