Member of Technical Staff - ML Research Engineer; Multi-Modal - Vision

About Liquid AI

Spun out of MIT CSAIL, we build general-purpose AI systems that run efficiently across deployment targets, from data center accelerators to on-device hardware, ensuring low latency, minimal memory usage, privacy, and reliability. We partner with enterprises across consumer electronics, automotive, life sciences, and financial services. We are scaling rapidly and need exceptional people to help us get there.

The Opportunity

Our VLM team builds vision-language models that run on-device, at the edge, and under real-time constraints without sacrificing quality. This role offers full technical ownership for someone who wants to own outcomes, make decisions, and shape the direction of vision AI at a company where your work is the product.

What We're Looking For

We need someone who:

Has expertise in VLMs: We expect you to hit the ground running and tackle real problems from day one.
Takes ownership: We give people problems, not tasks. We need someone who will own an end-to-end workstream and deliver outcomes.
Writes production code: Our models ship to customers. We need maintainable code, not one-off research prototypes.
Stays resilient: Training runs fail. Experiments don't work. We need someone who iterates through setbacks.

The Work

Design and run large-scale VLM training experiments on distributed GPU clusters
Own pre-training or SFT pipelines for multimodal models
Build data pipelines for image-text datasets at scale
Collaborate on vision encoder architecture and image compression tradeoffs
Help grow the team through interviewing and network referrals

Desired Experience

Must-have:

Direct VLM experience (training, architecture, or significant research)
Distributed training at scale (PyTorch Distributed, DeepSpeed, FSDP, or Megatron-LM)
Production-quality coding ability
Ability to work independently

Nice-to-have:

Video understanding experience
Data quality or dataset design expertise
Vision encoder or image compression research

What Success Looks Like (Year One)

Our VLMs are SOTA across all major benchmarks
This hire owns a major workstream (video understanding, data quality, or encoder architecture) end-to-end
At least one model has shipped to production with this hire's direct contribution

What We Offer

Full ownership: You own your work from architecture to deployment.
Compensation: Competitive base salary with equity in a unicorn-stage company
Health: We pay 100% of medical, dental, and vision premiums for employees and dependents
Financial: 401(k) matching up to 4% of base pay
Time Off: Unlimited PTO plus company-wide Refill Days throughout the year