Engineering Expert (PhD) - AI Systems Evaluation

This role is for one of our clients.

Compensation: $73.29 per hour

We are seeking PhD-level engineers to support high-impact collaborations with advanced AI research teams. The role focuses on improving the accuracy, rigor, and reliability of general-purpose conversational AI systems, particularly in engineering-related contexts.

AI systems used in professional engineering scenarios must demonstrate strong applied reasoning, quantitative accuracy, and consistency with how real-world systems behave. This project centers on evaluating and improving how models interpret, reason about, and explain engineering concepts across multiple disciplines.

Requirements

Key Responsibilities

Develop and refine prompts to guide AI behavior in engineering-specific scenarios
Evaluate model-generated responses for technical correctness, applied reasoning, completeness, and practical relevance
Fact-check technical claims using authoritative public sources and domain expertise
Annotate outputs by identifying conceptual gaps, flawed assumptions, and factual inaccuracies
Assess clarity, structure, and appropriateness of explanations for various audiences
Ensure responses align with expected conversational standards and system-level guidelines
Apply structured evaluation frameworks, taxonomies, and benchmarking standards consistently

Required Qualifications

PhD in Engineering or a closely related field
Deep expertise in one or more of the following domains:

Mechanical & Physical Systems Engineering
Electrical, Electronic & Computer Engineering
Chemical, Materials & Process Engineering
Civil, Environmental & Infrastructure Engineering

Strong familiarity with large language models (LLMs) and their practical applications
Excellent written communication skills, including the ability to explain complex technical concepts clearly
Close attention to detail and the ability to detect subtle technical inaccuracies
Experience reviewing, editing, or critiquing technical or academic writing

Preferred Experience

Applied research, industry engineering workflows, or systems design
Experience with reinforcement learning from human feedback (RLHF), model evaluation, or structured data annotation
Teaching, mentoring, or explaining engineering concepts to non-expert audiences
Familiarity with structured evaluation rubrics, benchmarks, or quality assurance frameworks

What Success Looks Like

You consistently identify technical inaccuracies, incomplete reasoning, or flawed assumptions in engineering-related AI outputs
Your structured feedback measurably improves the rigor, clarity, and correctness of model responses
You produce consistent, reproducible evaluation artifacts that strengthen model performance over time
Engineering-focused AI systems demonstrate greater reliability and trustworthiness as a result of your evaluations

Contract & Payment Terms