Syed Muhammad Mehmam
Data Engineer
Profile

Data professional with 3 years of experience, including 2+ years building and optimizing data pipelines, ETL/ELT processes, and cloud data platforms. Skilled in SQL, Python, PySpark, data warehousing, Power BI, and Microsoft Fabric for delivering scalable data solutions.

Professional Experience
  • Designed, built, and maintained scalable ETL pipelines using SQL Server, Databricks, PySpark, and Python, handling structured datasets across the healthcare, retail, and multimedia domains.
  • Supported data ingestion and integration from multiple source systems into Azure and AWS cloud platforms.
  • Developed and optimized data warehouses using star schema, fact/dimension tables, and SCD Type 2, improving reporting accuracy and historical analysis.
  • Applied data quality checks, validation frameworks, and monitoring to ensure accuracy, consistency, and reliability.
  • Collaborated with cross-functional teams on Generative AI initiatives, integrating speech-to-text, summarization, and chatbot pipelines with large-scale video/audio processing.
  • Automated speaker diarization and transcription pipelines for 10,000+ hours of video, deployed on AWS EC2 with S3 integration, achieving 3x faster processing.
  • Built and deployed time-series forecasting models with 92% accuracy for resource planning and cost prediction.
  • Integrated predictive models into internal systems for real-time decision support.
  • Automated data cleaning and preprocessing pipelines, cutting manual workload by 50%.
Education

Skills
  • Languages: Python, SQL
  • Databases: Microsoft SQL Server, MySQL, PostgreSQL
  • ETL/Big Data Tools: PySpark, Microsoft Fabric, AWS S3, Azure Storage, Azure Data Factory, Power BI
  • Concepts: Data Warehousing, Data Modeling, ETL/ELT, Data Quality, Monitoring & Logging, Version Control (Git)