This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Software Platform Support Engineer - GPU Cloud in the United States.
This role sits at the intersection of cloud infrastructure, high-performance computing, and customer support, focusing on enabling seamless experiences across advanced GPU-based platforms. You will work closely with internal engineering, SRE, and product teams to support large-scale distributed systems powering AI and accelerated computing workloads. Acting as a technical bridge between users and platform engineering, you will troubleshoot complex issues, improve system reliability, and enhance operational processes in a fast-moving environment. The role requires deep curiosity about how systems work end-to-end, from compute and storage to networking layers. You will also contribute to building internal tools, documentation, and workflows that improve support efficiency and platform observability. This is a highly collaborative, high-impact position within a cutting-edge AI and GPU cloud ecosystem.
Accountabilities:
- Provide Tier 1 support for complex GPU cloud platforms in collaboration with internal engineering, SRE, and infrastructure teams.
- Diagnose, investigate, and resolve customer and system issues while performing root cause analysis and escalating when needed.
- Partner with Site Reliability Engineering teams to file bugs, track incidents, and ensure timely resolution of platform issues.
- Develop and improve operational workflows, including runbooks, escalation paths, and support documentation.
- Build internal tooling and automation to enhance support efficiency, visibility, and issue resolution speed.
- Analyze user workloads and system behavior to better understand usage patterns and optimize platform performance.
- Collaborate with engineering teams to provide feedback, identify improvements, and contribute to platform enhancements.
- Participate in on-call rotations to support production systems and ensure platform reliability.
Requirements:
- Bachelor’s or Master’s degree in Computer Science or a related field, or equivalent practical experience.
- 2+ years of experience supporting distributed systems, cloud platforms, or end-user software environments.
- Strong experience with Linux-based systems and troubleshooting complex infrastructure issues.
- Hands-on knowledge of cloud platforms such as AWS, Azure, GCP, or OCI.
- Understanding of infrastructure components including networking, storage, and DevOps tooling/scripting.
- Familiarity with data storage systems such as databases, file, block, and object storage.
- Strong troubleshooting skills with the ability to analyze issues across multiple system layers.
- Excellent communication skills and a customer-focused mindset for supporting internal users.
- Ability to work across teams and adapt to different layers of the technology stack.
- Bonus: Experience with MLOps, GPU workloads, distributed training systems, or HPC environments (e.g., SLURM).
- Strong organizational skills and a continuous improvement mindset.
Benefits:
- Competitive base salary ranging from $76,000 to $172,500 depending on level, experience, and location.
- Equity opportunities as part of the total compensation package.
- Comprehensive health, dental, and vision insurance coverage.
- Retirement savings plans and additional financial wellness benefits.
- Flexible work environment with opportunities for collaboration across global teams.
- Exposure to cutting-edge AI, GPU cloud, and high-performance computing technologies.
- Professional growth opportunities in a world-leading AI and accelerated computing organization.
- Inclusive and innovative work culture focused on impact, learning, and technical excellence.
How Jobgether works:
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.