Member of Technical Staff - ML Research Engineer; Multi-Modal - Vision

  • Company: Liquid AI
  • Employment: Full-time
  • Location: 🇺🇸 United States, nationwide
  • Posted: 3 months ago (updated 1 week ago)

About Liquid AI

Spun out of MIT CSAIL, we build general-purpose AI systems that run efficiently across deployment targets, from data center accelerators to on-device hardware, ensuring low latency, minimal memory usage, privacy, and reliability. We partner with enterprises across consumer electronics, automotive, life sciences, and financial services. We are scaling rapidly and need exceptional people to help us do it.

The Opportunity

Our VLM team builds vision-language models that run on-device, at the edge, and under real-time constraints without sacrificing quality. This role offers full technical ownership to someone who wants to drive outcomes, make decisions, and shape the direction of vision AI at a company where your work is the product.

What We're Looking For

We need someone who:

  • Has expertise in VLMs: You'll hit the ground running and tackle real problems from day one.

  • Takes ownership: We give people problems, not tasks. We need someone who will own an end-to-end workstream and deliver outcomes.

  • Writes production code: Our models ship to customers. We need code that can be maintained, not one-off research prototypes.

  • Stays resilient: Training runs fail. Experiments don't work. We need someone who iterates through setbacks.

The Work

  • Design and run large-scale VLM training experiments on distributed GPU clusters

  • Own pre-training or SFT pipelines for multimodal models

  • Build data pipelines for image-text datasets at scale

  • Collaborate on vision encoder architecture and image compression tradeoffs

  • Help grow the team through interviewing and network referrals

Desired Experience

Must-have:

  • Direct VLM experience (training, architecture, or significant research)

  • Distributed training at scale (PyTorch Distributed, DeepSpeed, FSDP, or Megatron-LM)

  • Production-quality coding ability

  • Ability to work independently

Nice-to-have:

  • Video understanding experience

  • Data quality or dataset design expertise

  • Vision encoder or image compression research

What Success Looks Like (Year One)

  • Our VLMs are SOTA across all major benchmarks

  • This hire owns a major workstream (video understanding, data quality, or encoder architecture) end-to-end

  • At least one model has shipped to production with this hire's direct contribution

What We Offer

  • Full ownership: You own your work from architecture to deployment.

  • Compensation: Competitive base salary with equity in a unicorn-stage company

  • Health: We pay 100% of medical, dental, and vision premiums for employees and dependents

  • Financial: 401(k) matching up to 4% of base pay

  • Time Off: Unlimited PTO plus company-wide Refill Days throughout the year
