David Asencio
Senior Data Engineer
Profile
- Senior Data Engineer with 10 years of experience designing, building, and optimizing scalable data platforms across startup and enterprise healthcare environments. Proven expertise in developing high-performance ETL pipelines, architecting cloud-native data lakes, and processing large-scale datasets using Python, SQL, and Apache Spark.
- Extensive experience working in multi-cloud environments (AWS, Azure, and GCP), leading cloud migration initiatives, and implementing modern data warehousing solutions using Snowflake and BigQuery. Strong background in healthcare data engineering, including regulatory and quality reporting transformations, with a focus on data reliability, performance tuning, and cost optimization.
- Recognized for driving architectural decisions, mentoring engineering teams, and delivering enterprise-grade data solutions that improve operational efficiency and enable advanced analytics.
Professional Experience

Senior Data Engineer

Acentra Health
07/2023 – 01/2026 | McLean, Virginia
  • Architected enterprise-scale multi-cloud data platform supporting Medicaid and population health analytics across AWS, Azure, and GCP.
  • Designed and optimized large-scale Spark (PySpark/Scala) pipelines processing multi-terabyte healthcare datasets (claims, clinical, provider data).
  • Led migration of 200+ batch ETL pipelines into a unified cloud-native data lake architecture (S3, ADLS, GCS).
  • Implemented CI/CD framework for data engineering workflows using GitHub Actions and Azure DevOps, reducing deployment time by 40%.
  • Built reusable ETL framework adopted across multiple analytics teams, improving development velocity by 35%.
  • Reduced Spark job runtime by 60% through partition tuning, cluster autoscaling strategies, and performance profiling.
  • Designed near real-time ingestion pipelines using Kafka and Kinesis for clinical event analytics.
  • Implemented enterprise data quality validation using Great Expectations to ensure regulatory compliance.
  • Partnered with leadership and SMEs to define long-term data architecture roadmap.
  • Mentored 3-5 data engineers and led architecture review sessions across teams.
  • Technologies: Python, SQL, Scala, Spark (Databricks, EMR), Snowflake, BigQuery, AWS (S3, EMR, Glue, Redshift, Lambda), Azure (ADF, Synapse, ADLS), GCP (BigQuery, Dataflow, Cloud Storage), Kafka, Terraform, Airflow
Data Engineer

Parkland Health
01/2020 – 07/2023 | Dallas, Texas
  • Designed and maintained end-to-end healthcare data pipelines integrating EHR, claims, lab, and operational datasets.
  • Built scalable PySpark jobs to transform patient-level data for regulatory reporting and quality metrics (including HEDIS-related transformations).
  • Led migration of on-prem SQL Server workloads to Azure Data Factory and Azure Synapse.
  • Developed Snowflake-based dimensional models (star schema) to support enterprise BI and analytics reporting.
  • Implemented automated data quality checks to ensure HIPAA-compliant data processing.
  • Optimized batch workloads, reducing average processing time by 45%.
  • Collaborated with clinical, compliance, and analytics teams to translate regulatory requirements into scalable data transformations.
  • Supported hybrid-cloud architecture leveraging both AWS and Azure environments.
  • Participated in architecture review boards and contributed to cloud modernization strategy.
  • Mentored junior engineers and led knowledge-sharing sessions on Spark performance tuning.
  • Technologies: Python, SQL, PySpark, Spark, Snowflake, Airflow, Azure (ADF, Synapse, ADLS), AWS (S3, Glue, Redshift), Kafka, Terraform
Junior Data Engineer

LawnStarter
01/2017 – 12/2019 | Austin, Texas
  • Developed and maintained ETL pipelines for customer booking, pricing, and vendor datasets.
  • Built Airflow DAGs to orchestrate batch workflows and improve reliability.
  • Designed dimensional models in Amazon Redshift to support BI reporting.
  • Automated data validation scripts in Python to reduce reporting discrepancies by 30%.
  • Collaborated with product and marketing teams to transform raw data into analytics-ready datasets.
  • Improved AWS resource utilization and reduced monthly cloud costs through workload optimization.
  • Technologies: Python, SQL, AWS (S3, Redshift, EC2, Lambda, RDS), Airflow, Tableau/Looker, Git
Data Analyst

LawnStarter
02/2016 – 12/2016 | Austin, Texas
  • Developed complex SQL queries and dashboards to track revenue, customer acquisition, churn, and operational KPIs.
  • Built Tableau/Looker reports used by marketing and operations leadership for decision-making.
  • Conducted A/B testing analysis to evaluate product and pricing strategies.
  • Cleaned and transformed raw transactional data for reporting accuracy.
  • Partnered with stakeholders to define reporting requirements and translate business needs into data insights.
  • Automated recurring reporting workflows, reducing manual effort by 25%.
  • Technologies: SQL, Excel, Tableau/Looker, Python (basic scripting), AWS (Redshift)
Education

Bachelor's degree in Computer Science

The University of Texas at Austin
2011 – 2015 | Austin, Texas
Skills

    Programming & Query Languages
    • Python

    • SQL (Advanced query optimization, window functions, CTEs)

    • Scala

    • PySpark

    • Bash (scripting)

    Data Warehousing & Modeling
    • Snowflake

    • Amazon Redshift

    • BigQuery

    • Dimensional Modeling (Star & Snowflake Schema)

    • Data Lake Architecture

    • Data Mart Design

    DevOps & Infrastructure
    • Terraform (Infrastructure as Code)

    • CI/CD for Data Pipelines

    • Git/GitHub

    • Azure DevOps

    Cloud Platforms (Multi-Cloud)

    Amazon Web Services (AWS):

    • S3

    • Redshift

    • Glue

    • EMR

    • Lambda

    • EC2

    • RDS

    Microsoft Azure:

    • Azure Data Factory

    • Azure Synapse

    • Azure Data Lake Storage (ADLS)

    • Azure Functions

    Google Cloud Platform (GCP):

    • BigQuery

    • Cloud Storage

    • Dataflow

    Big Data & Distributed Processing
    • Apache Spark (batch & performance tuning)

    • Databricks

    • AWS EMR

    • Azure Synapse Analytics

    • GCP Dataflow

    Data Engineering & Orchestration
    • ETL/ELT Pipeline Design

    • Apache Airflow

    • Azure Data Factory (ADF)

    • Workflow automation

    • Data pipeline monitoring & alerting

    Healthcare Data and Compliance
    • Healthcare Claims Processing

    • X12 Transactions

    • HL7 and FHIR Standards

    • Clinical Data Modeling

    • CMS Regulatory Reporting

    • HIPAA and PHI Governance

    • ICD Coding Concepts

    • Healthcare Data Interoperability

    • Clinical and Financial Data Pipelines

    Data Quality & Governance
    • Great Expectations

    • Data validation frameworks

    • HIPAA-compliant data handling

    • Healthcare data transformations

    Streaming & Real-Time Processing
    • Apache Kafka

    • AWS Kinesis

    BI & Analytics
    • Tableau

    • Looker

    • KPI Development

    • A/B Testing Analytics

    • Business Metrics Reporting