•Senior Data Engineer with 10 years of experience designing, building, and optimizing scalable data platforms across startup and enterprise healthcare environments. Proven expertise in developing high-performance ETL pipelines, architecting cloud-native data lakes, and processing large-scale datasets using Python, SQL, and Apache Spark.
•Extensive experience working in multi-cloud environments (AWS, Azure, and GCP), leading cloud migration initiatives, and implementing modern data warehousing solutions using Snowflake and BigQuery. Strong background in healthcare data engineering, including regulatory and quality reporting transformations, with a focus on data reliability, performance tuning, and cost optimization.
•Recognized for driving architectural decisions, mentoring engineering teams, and delivering enterprise-grade data solutions that improve operational efficiency and enable advanced analytics.
Senior Data Engineer
Acentra Health
•Architected enterprise-scale multi-cloud data platform supporting Medicaid and population health analytics across AWS, Azure, and GCP.
•Designed and optimized large-scale Spark (PySpark/Scala) pipelines processing multi-terabyte healthcare datasets (claims, clinical, provider data).
•Led migration of 200+ batch ETL pipelines into a unified cloud-native data lake architecture (S3, ADLS, GCS).
•Implemented CI/CD framework for data engineering workflows using GitHub Actions and Azure DevOps, reducing deployment time by 40%.
•Built reusable ETL framework adopted across multiple analytics teams, improving development velocity by 35%.
•Reduced Spark job runtime by 60% through partition tuning, cluster autoscaling strategies, and performance profiling.
•Designed near real-time ingestion pipelines using Kafka and Kinesis for clinical event analytics.
•Implemented enterprise data quality validation using Great Expectations to ensure regulatory compliance.
•Partnered with leadership and SMEs to define long-term data architecture roadmap.
•Mentored 3-5 data engineers and led architecture review sessions across teams.
•Technologies: Python, SQL, Scala, Spark (Databricks, EMR), Snowflake, BigQuery, AWS (S3, EMR, Glue, Redshift, Lambda), Azure (ADF, Synapse, ADLS), GCP (BigQuery, Dataflow, Cloud Storage), Kafka, Terraform, Airflow
Data Engineer – Healthcare & Regulatory Data Systems
Parkland Health
•Designed and maintained end-to-end healthcare data pipelines integrating EHR, claims, lab, and operational datasets.
•Built scalable PySpark jobs to transform patient-level data for regulatory reporting and quality metrics (including HEDIS-related transformations).
•Led migration of on-prem SQL Server workloads to Azure Data Factory and Azure Synapse.
•Developed Snowflake-based dimensional models (star schema) to support enterprise BI and analytics reporting.
•Implemented automated data quality checks to ensure HIPAA-compliant data processing.
•Optimized batch workloads, reducing average processing time by 45%.
•Collaborated with clinical, compliance, and analytics teams to translate regulatory requirements into scalable data transformations.
•Supported hybrid-cloud architecture leveraging both AWS and Azure environments.
•Participated in architecture review boards and contributed to cloud modernization strategy.
•Mentored junior engineers and led knowledge-sharing sessions on Spark performance tuning.
•Technologies: Python, SQL, PySpark, Spark, Snowflake, Airflow, Azure (ADF, Synapse, ADLS), AWS (S3, Glue, Redshift), Kafka, Terraform
Junior Data Engineer
LawnStarter
•Developed and maintained ETL pipelines for customer booking, pricing, and vendor datasets.
•Built Airflow DAGs to orchestrate batch workflows and improve reliability.
•Designed dimensional models in Amazon Redshift to support BI reporting.
•Automated data validation scripts in Python to reduce reporting discrepancies by 30%.
•Collaborated with product and marketing teams to transform raw data into analytics-ready datasets.
•Improved AWS resource utilization and reduced monthly cloud costs through workload optimization.
•Technologies: Python, SQL, AWS (S3, Redshift, EC2, Lambda, RDS), Airflow, Tableau/Looker, Git
Data Analyst
LawnStarter
•Developed complex SQL queries and dashboards to track revenue, customer acquisition, churn, and operational KPIs.
•Built Tableau/Looker reports used by marketing and operations leadership for decision-making.
•Conducted A/B testing analysis to evaluate product and pricing strategies.
•Cleaned and transformed raw transactional data for reporting accuracy.
•Partnered with stakeholders to define reporting requirements and translate business needs into data insights.
•Automated recurring reporting workflows, reducing manual effort by 25%.
•Technologies: SQL, Excel, Tableau/Looker, Python (basic scripting), AWS (Redshift)
Bachelor's degree in Computer Science
The University of Texas at Austin
•Python
•SQL (Advanced query optimization, window functions, CTEs)
•Scala
•PySpark
•Bash (scripting)
•Snowflake
•Amazon Redshift
•BigQuery
•Dimensional Modeling (Star & Snowflake Schema)
•Data Lake Architecture
•Data Mart Design
•Terraform (Infrastructure as Code)
•CI/CD for Data Pipelines
•Git/GitHub
•Azure DevOps
Amazon Web Services (AWS):
•S3
•Redshift
•Glue
•EMR
•Lambda
•EC2
•RDS
Microsoft Azure:
•Azure Data Factory
•Azure Synapse
•Azure Data Lake Storage (ADLS)
•Azure Functions
Google Cloud Platform (GCP):
•BigQuery
•Cloud Storage
•Dataflow
•Apache Kafka
•AWS Kinesis
•ETL/ELT Pipeline Design
•Apache Airflow
•Azure Data Factory (ADF)
•Workflow automation
•Data pipeline monitoring & alerting
•Healthcare Claims Processing
•X12 Transactions
•HL7 and FHIR Standards
•Clinical Data Modeling
•CMS Regulatory Reporting
•HIPAA and PHI Governance
•ICD Coding Concepts
•Healthcare Data Interoperability
•Clinical and Financial Data Pipelines
•Great Expectations
•Data validation frameworks
•HIPAA-compliant data handling
•Healthcare data transformations
•Tableau
•Looker
•KPI Development
•A/B Testing Analytics
•Business Metrics Reporting
•Apache Spark (batch & performance tuning)
•Databricks
•AWS EMR
•Azure Synapse Analytics
•GCP Dataflow