About the Role
As a Data Engineer at Rohirrim, you’ll design, build, and optimize the data pipelines and infrastructure that fuel our AI products. You’ll work closely with our AI/ML teams, product teams, customer success managers, and security/compliance partners to transform complex enterprise datasets into clean, reliable, structured foundations for Rohan deployments, especially in controlled, secure, or GovTech environments.
You’ll help us scale:
- ingestion pipelines
- vector stores
- embedding workflows
- metadata & document-processing frameworks
- Azure-native data services
…in a way that is fast, compliant, and deeply reliable.
What You’ll Do
- Blend capabilities in software engineering, data engineering, and DevOps to build and maintain scalable data ingestion pipelines for structured/unstructured data (documents, PDFs, knowledge bases, enterprise systems, APIs, etc.).
- Develop and operate ETL/ELT workflows that ensure data integrity, security, and lineage.
- Implement and optimize vector database systems and embeddings pipelines supporting RAG and AI features.
- Collaborate with ML engineers to support model training, evaluation, and feature engineering pipelines.
- Architect and manage Azure-based data infrastructure (e.g., Azure Functions, Azure Storage, Azure SQL, Azure Kubernetes Service, Azure OpenAI integrations).
- Build internal tools for metadata extraction, OCR/document parsing, text normalization, and validation.
- Ensure pipelines meet compliance, auditability, and security requirements (SOC 2, FedRAMP, etc.).
- Support customer-specific data onboarding workflows for government and enterprise deployments.
- Monitor and improve pipeline performance, reliability, and scalability.
What Makes You a Great Fit
- 10+ years in Data Engineering, Software Engineering, or ML/Data Infrastructure roles.
- Strong experience with Python, SQL, and modern data engineering tools (Airflow, Dagster, dbt, Prefect, etc.).
- Experience building large-scale document extraction ETL pipelines (OCR, PDF parsing, metadata extraction, NLP preprocessing).
- Proficiency with Kubernetes, Docker, and containerized data pipelines deployed on Azure, AWS, and/or Google Cloud.
- Hands-on experience with relational databases (Postgres, SQL Server, MySQL) and non-relational systems such as Elasticsearch, Redis, and graph databases.
- Strong data quality, governance, lineage, and validation mindset.
- Excellent communicator who can align with ML, engineering, and product teams.
Bonus Skills
- Experience building or supporting GenAI / LLM / RAG pipelines.
- Experience with Azure OpenAI Service.
- Experience with MinIO.
- Background with knowledge graphs, semantic search, or indexing at scale.
- Familiarity with CI/CD pipelines in Azure DevOps, GitHub Actions, or similar.