As a Data Engineer, you’ll design, build, and maintain the data pipelines that power our deep learning and LLM systems. You’ll work across ingestion, transformation, and orchestration layers — from real-time feeds to analytics-ready datasets.
Your mission is to make data reliable, discoverable, and scalable for use by model training, analytics, and AI-driven products across multiple sports. You’ll collaborate closely with our MLOps, LLMOps, and Sports Data teams to ensure seamless integration between data and AI.
Responsibilities:
- Build and operate robust data pipelines for ingestion, cleaning, and transformation using Databricks, Airflow, or Dagster.
- Develop efficient ETL/ELT workflows in Python and SQL to support both batch and streaming workloads.
- Collaborate with ML and AI teams to deliver high-quality datasets for training, evaluation, and production features.
- Model and maintain structured data assets (Delta, Parquet, Iceberg) for reliability, versioning, and lineage tracking.
- Implement orchestration and monitoring — schedule jobs, track dependencies, and automate recovery from failures.
- Ensure data quality and compliance through validation frameworks, schema enforcement, and audit logging.
- Contribute to data platform evolution — evaluate tools, standardize best practices, and improve developer experience.
- Support performance and cost optimization across compute, storage, and orchestration systems.
Qualifications:
- 3–6 years of experience as a Data Engineer or ETL Developer in a production environment.
- Proficiency in Python and SQL; strong familiarity with Databricks, Spark, or equivalent big-data frameworks.
- Experience with workflow orchestration tools such as Airflow, Dagster, Luigi, or Prefect.
- Deep understanding of data modeling, data warehousing, and distributed data processing.
- Knowledge of modern data lakehouse architectures (Delta, Parquet, Iceberg).
- Familiarity with CI/CD, GitHub Actions, and data pipeline testing frameworks.
- Comfort working in a cross-functional environment with ML, product, and analytics teams.
Nice to Have:
- Experience with sports, telemetry, or sensor data pipelines.
- Familiarity with streaming frameworks (Kafka, Spark Structured Streaming, Flink).
- General knowledge of American football, the NFL, and college football.
- Background in data governance, lineage, and observability tools (Monte Carlo, Great Expectations, Unity Catalog, OpenLineage).
- Experience with cloud infrastructure (AWS, GCP, or Azure) and containerization (Docker, Kubernetes).
- Exposure to best practices in machine-learning model management and MLOps.
Benefits:
- Competitive salary and bonus plan
- Comprehensive health insurance plan
- Retirement savings plan (401k) with company match
- Remote working environment
- A flexible, unlimited time off policy
- Generous paid holiday schedule: 13 in total, including the Monday after the Super Bowl