3+ years building production ETL/ELT pipelines | 2 Microsoft Azure certifications | Processed 10,000+ hours of multimedia data | Mentoring 20+ aspiring data engineers

Profile

Data Engineer with 3+ years of experience building production ETL/ELT pipelines and AI-powered data systems across Azure and AWS. Proven track record designing data warehouses with dimensional modeling, automating data ingestion from hybrid sources, and optimizing large-scale multimedia processing pipelines. Specialized in healthcare data systems with hands-on expertise in vector databases, semantic search, and generative AI integration.

Certificates
  • Microsoft Certified: Fabric Data Engineer Associate (DP-700) – Microsoft, October 2025
  • Microsoft Certified: Azure Data Engineer Associate (DP-203) – Microsoft, September 2024

Skills

Programming & Databases: Python, SQL, PySpark, SQL Server, PostgreSQL, MySQL

Cloud & Big Data: Azure Data Factory, Microsoft Fabric, Databricks, AWS (S3, EC2), Azure Storage

Data Architecture: Data Warehousing, Dimensional Modeling (Star/Snowflake), SCD Type 2, ETL/ELT Pipelines

Analytics & Tools: Power BI, Git, Data Quality Validation, Pipeline Monitoring

Professional Experience
  • Optimized AI-powered speaker diarization pipeline processing 10,000+ hours of healthcare video on AWS (EC2, S3), reducing processing time from 17 minutes to under 7 minutes per 2-hour video (60% improvement) and cutting client delivery turnaround from 2 months to 2 weeks (75% reduction)
  • Engineered 5 production ETL/ELT pipelines using Azure Data Factory and Python, processing 100K+ daily records from hybrid SQL Server and AWS S3 sources with automated scheduling and monitoring
  • Architected enterprise data warehouse with Star schema design including fact tables and dimension tables with SCD Type 2 implementation, supporting millions of patient records for healthcare analytics and AI model training
  • Built end-to-end AI data pipeline automating video transcription (Whisper), speaker identification (Pyannote), text embedding generation (Sentence-Transformers), and FAISS vector database implementation for semantic search, serving Canadian healthcare clients
  • Implemented data quality frameworks using SQL queries and Azure Data Factory dataflows to validate null values, row counts, duplicates, and data completeness, ensuring pipeline reliability through automated validation and monitoring
  • Built time-series forecasting models achieving 92% accuracy across 500+ SKUs, enabling proactive resource planning and reducing material waste by 18%
  • Deployed predictive models to production serving 15+ stakeholders across operations, planning, and procurement teams for real-time decision support
  • Automated data preprocessing pipelines, reducing manual data preparation effort from 8 hours to 4 hours weekly, accelerating model iteration cycles by 50%
  • MENTORSHIP & COMMUNITY
    07/2025 – Present
  • Mentor cohort of 20 aspiring data engineers through hands-on curriculum covering PySpark optimization, distributed processing, dimensional modeling (Star schema, SCD Type 2), and semantic layer design
  • Guide students in building end-to-end portfolio projects involving data ingestion, transformation orchestration, and data warehouse implementation, with 85% completing certification-ready projects

Education