Profile

Data engineer focused on end‑to‑end ETL/ELT pipelines and cloud data platforms on AWS, Snowflake, and PostgreSQL, using Python, SQL, and Airflow to deliver reliable, governed, analytics‑ready datasets. Partner with cross‑functional stakeholders to design audited, AI‑enhanced data products with embedded data quality checks, monitoring, and documentation that improve data accuracy and reduce manual review.

Education
University at Buffalo, Master’s in Data Science (STEM)
08/2024 – 12/2025 | Buffalo, NY

Professional Experience

Data Engineer, Troy & Bank 08/2025 – Present | Buffalo, USA

  • Built an end-to-end utility bill ETL and audit platform using Python, Apache Airflow, n8n, PostgreSQL (hosted on Heroku), AWS S3, and LLM agents to ingest, normalize, and validate high-volume billing data.
  • Integrated cloud storage (AWS S3) for scalable data ingestion and archival, improving data accessibility and versioning control for audit workflows.
  • Designed pipelines that cut manual review time by 60%, scaled analysis to 1,000+ bills/month, and improved data quality for downstream BI and reporting.
  • Developed a Streamlit-based audit console and modular DAGs, enabling configuration-based onboarding for new utilities, tariffs, and audit rules without code rewrites.
  • Collaborated with auditors and business leaders to refine data models, thresholds, and exception categories, aligning audit logic with real-world billing workflows.

Software Development Engineer, Reliance Jio Platforms Ltd. 09/2022 – 07/2024 | Navi Mumbai, India

  • Led end‑to‑end development of a data platform (.NET, Python, MySQL, Camunda) that automated the FTTX deployment lifecycle and eliminated spreadsheet‑based workflows.
  • Designed and operated ETL jobs ingesting and cleaning 100K+ daily GIS and log records into a centralized MySQL store for reporting and downstream tools.
  • Optimized schemas and queries for analytics workloads, improving query performance by 20%, and exposed structured deployment and financial data via RESTful APIs.
  • Collaborated with planning, finance, and field teams to define critical fields, validations, and reports that removed deployment and cost bottlenecks.

Software Developer Intern, Incerro.ai (CodeBin) 04/2022 – 09/2022 | Pune, India

  • Built AI-driven web apps and integrated REST APIs with backend data/AI services to create ETL dashboards, tables, and filters, collaborating on data validation and reliable analytics flows.
  • Refactored reusable components to translate AI-first UX into production e-commerce/SaaS interfaces, testing data features end-to-end to support accurate pipeline visualization and reporting.

Skills
  • Programming & Scripting: Python, SQL (advanced functions), C#, R, Bash.
  • Data Engineering & Analytics: ETL/ELT, dimensional data modeling, dbt, Apache Airflow, Apache Spark, Kafka, PySpark, data quality, data warehousing, analytics reporting.
  • Data Platforms: Snowflake, PostgreSQL, MySQL, SQL Server, cloud data warehouses.
  • Cloud & DevOps: AWS (S3, EC2, RDS), Azure, Docker, Terraform, CI/CD pipelines, Git/GitHub.
  • Analytics & BI: Power BI, Tableau, Excel (advanced formulas, charts), SQL‑based reporting and dashboards.
  • Development Tools & Apps: FastAPI, .NET, REST APIs, Streamlit, Jira, SharePoint.
  • Professional Skills: Cross‑functional collaboration, stakeholder communication, requirement gathering, problem‑solving, documentation, and ownership.

Projects
EV Charging Data Warehouse (Snowflake Cloud Data Warehouse & ELT)
  • Architected a cloud-native data warehouse on Snowflake, using dbt for dimensional modeling (star schema) to centralize station and usage data into analytics-ready tables.
  • Implemented a custom data quality engine to automatically flag nulls and schema deviations, ensuring ongoing data accuracy, reliability, and health reporting.
Real‑Time User Analytics Pipeline (Apache Kafka & PySpark Streaming)
  • Engineered a real-time streaming pipeline using Apache Kafka and PySpark to process e-commerce event data for live funnel monitoring and churn-risk identification.
  • Implemented windowing and watermarking in Spark Streaming to handle late-arriving data and deliver sub-second visibility into user conversion bottlenecks at scale.
Utility Billing AI Auditor (LLM‑powered multi‑agent audit system)
  • Built an agentic, LLM-powered auditor with Python, Airflow, and AWS for PDF ingestion, LLM-based tariff rule extraction, bill recalculation, and anomaly detection, cutting manual audits from 24 hours to minutes with automated reports and discrepancy alerts.
  • Dockerized the solution and shipped a Streamlit UI so non-technical users can run audits, improving accuracy and efficiency.