David Asencio
Senior Data Engineer
Profile
- Senior Data Engineer with 10 years of experience designing, building, and optimizing scalable data platforms across startup and enterprise healthcare environments. Proven expertise in developing high-performance ETL pipelines, architecting cloud-native data lakes, and processing large-scale datasets using Python, SQL, and Apache Spark.
- Extensive experience working in multi-cloud environments (AWS, Azure, and GCP), leading cloud migration initiatives, and implementing modern data warehousing solutions using Snowflake and BigQuery. Strong background in healthcare data engineering, including regulatory and quality reporting transformations, with a focus on data reliability, performance tuning, and cost optimization.
- Recognized for driving architectural decisions, mentoring engineering teams, and delivering enterprise-grade data solutions that improve operational efficiency and enable advanced analytics.
Professional Experience

Senior Data Engineer

Acentra Health
07/2023 – 01/2026 | McLean, Virginia
  • Architected enterprise-scale multi-cloud data platform supporting Medicaid and population health analytics across AWS, Azure, and GCP.
  • Designed and optimized large-scale Spark (PySpark/Scala) pipelines processing multi-terabyte healthcare datasets (claims, clinical, provider data).
  • Led migration of 200+ batch ETL pipelines into a unified cloud-native data lake architecture (S3, ADLS, GCS).
  • Implemented CI/CD framework for data engineering workflows using GitHub Actions and Azure DevOps, reducing deployment time by 40%.
  • Built reusable ETL framework adopted across multiple analytics teams, improving development velocity by 35%.
  • Reduced Spark job runtime by 60% through partition tuning, cluster autoscaling strategies, and performance profiling.
  • Designed near real-time ingestion pipelines using Kafka and Kinesis for clinical event analytics.
  • Implemented enterprise data quality validation using Great Expectations to ensure regulatory compliance.
  • Partnered with leadership and SMEs to define long-term data architecture roadmap.
  • Mentored 3-5 data engineers and led architecture review sessions across teams.
  • Technologies: Python, SQL, Scala, Spark (Databricks, EMR), Snowflake, BigQuery, AWS (S3, EMR, Glue, Redshift, Lambda), Azure (ADF, Synapse, ADLS), GCP (BigQuery, Dataflow, Cloud Storage), Kafka, Terraform, Airflow
Data Engineer

Parkland Health
01/2020 – 07/2023 | Dallas, Texas
  • Designed and maintained end-to-end healthcare data pipelines integrating EHR, claims, lab, and operational datasets.
  • Built scalable PySpark jobs to transform patient-level data for regulatory reporting and quality metrics (including HEDIS-related transformations).
  • Led migration of on-prem SQL Server workloads to Azure Data Factory and Azure Synapse.
  • Developed Snowflake-based dimensional models (star schema) to support enterprise BI and analytics reporting.
  • Implemented automated data quality checks to ensure HIPAA-compliant data processing.
  • Optimized batch workloads, reducing average processing time by 45%.
  • Collaborated with clinical, compliance, and analytics teams to translate regulatory requirements into scalable data transformations.
  • Supported hybrid-cloud architecture leveraging both AWS and Azure environments.
  • Participated in architecture review boards and contributed to cloud modernization strategy.
  • Mentored junior engineers and led knowledge-sharing sessions on Spark performance tuning.
  • Technologies: Python, SQL, PySpark, Spark, Snowflake, Airflow, Azure (ADF, Synapse, ADLS), AWS (S3, Glue, Redshift), Kafka, Terraform
Junior Data Engineer

LawnStarter
01/2017 – 12/2019 | Austin, Texas
  • Developed and maintained ETL pipelines for customer booking, pricing, and vendor datasets.
  • Built Airflow DAGs to orchestrate batch workflows and improve reliability.
  • Designed dimensional models in Amazon Redshift to support BI reporting.
  • Automated data validation scripts in Python to reduce reporting discrepancies by 30%.
  • Collaborated with product and marketing teams to transform raw data into analytics-ready datasets.
  • Improved AWS resource utilization and reduced monthly cloud costs through workload optimization.
  • Technologies: Python, SQL, AWS (S3, Redshift, EC2, Lambda, RDS), Airflow, Tableau/Looker, Git
Data Analyst

LawnStarter
02/2016 – 12/2016 | Austin, Texas
  • Developed complex SQL queries and dashboards to track revenue, customer acquisition, churn, and operational KPIs.
  • Built Tableau/Looker reports used by marketing and operations leadership for decision-making.
  • Conducted A/B testing analysis to evaluate product and pricing strategies.
  • Cleaned and transformed raw transactional data for reporting accuracy.
  • Partnered with stakeholders to define reporting requirements and translate business needs into data insights.
  • Automated recurring reporting workflows, reducing manual effort by 25%.
  • Technologies: SQL, Excel, Tableau/Looker, Python (basic scripting), AWS (Redshift)
Education

Bachelor's degree in Computer Science

The University of Texas at Austin
2011 – 2015 | Austin, Texas
Skills

    Programming & Query Languages
    • Python

    • SQL (Advanced query optimization, window functions, CTEs)

    • Scala

    • PySpark

    • Bash (scripting)

    Data Warehousing & Modeling
    • Snowflake

    • Amazon Redshift

    • BigQuery

    • Dimensional Modeling (Star & Snowflake Schema)

    • Data Lake Architecture

    • Data Mart Design

    DevOps & Infrastructure
    • Terraform (Infrastructure as Code)

    • CI/CD for Data Pipelines

    • Git/GitHub

    • Azure DevOps

    Cloud Platforms (Multi-Cloud)

    Amazon Web Services (AWS):

    • S3

    • Redshift

    • Glue

    • EMR

    • Lambda

    • EC2

    • RDS

    Microsoft Azure:

    • Azure Data Factory

    • Azure Synapse

    • Azure Data Lake Storage (ADLS)

    • Azure Functions

    Google Cloud Platform (GCP):

    • BigQuery

    • Cloud Storage

    • Dataflow

    Big Data & Distributed Processing
    • Apache Spark (batch & performance tuning)

    • Databricks

    • AWS EMR

    • Azure Synapse Analytics

    • GCP Dataflow

    Data Engineering & Orchestration
    • ETL/ELT Pipeline Design

    • Apache Airflow

    • Azure Data Factory (ADF)

    • Workflow automation

    • Data pipeline monitoring & alerting

    Healthcare Data and Compliance
    • Healthcare Claims Processing

    • X12 Transactions

    • HL7 and FHIR Standards

    • Clinical Data Modeling

    • CMS Regulatory Reporting

    • HIPAA and PHI Governance

    • ICD Coding Concepts

    • Healthcare Data Interoperability

    • Clinical and Financial Data Pipelines

    Data Quality & Governance
    • Great Expectations

    • Data validation frameworks

    • HIPAA-compliant data handling

    • Healthcare data transformations

    Streaming & Real-Time Processing
    • Apache Kafka

    • AWS Kinesis

    BI & Analytics
    • Tableau

    • Looker

    • KPI Development

    • A/B Testing Analytics

    • Business Metrics Reporting