Profile

Data Engineer with 3+ years of experience building scalable ETL/ELT pipelines and cloud data platforms on AWS, Snowflake, and PostgreSQL using Python, SQL, Airflow, and Spark. Skilled in automating unstructured data extraction and audits with AI/LLM tools to deliver analytics-ready, governed data products for business insights.

Education
University at Buffalo, Master's in Data Science (STEM), 2025
Pune University, Bachelor's in Electronics and Telecommunication (ENTC), 2022
Professional Experience

AI Data Engineer, Troy & Bank, Aug 2025 – Present

  • Architected end-to-end ETL platform using Python, Airflow, and PostgreSQL, ingesting 1,000+ bills/month from unstructured PDFs into analytics-ready tables with SCD Type 2 versioning, reducing manual audit effort by ~60%.
  • Integrated LLM agents (OpenAI, Anthropic) for automated tariff code extraction from 600+ pages of regulatory documents, delivering a Streamlit console enabling non-technical auditors to operate independently.
  • Engineered a multi-agent lead intelligence platform using LangChain and LangGraph, automating prospect discovery, fit scoring, and outreach sequencing to fully replace manual sales prospecting workflows.
  • Implemented Prometheus and Grafana observability dashboards tracking latency, API costs, and lead yield per source, providing sales and operations stakeholders real-time visibility into campaign performance.
  • Containerized all services with Docker, provisioned AWS infrastructure (EC2, S3, IAM) using Terraform, and redesigned a monolithic service into a modular FastAPI architecture, achieving ~50% latency improvement for analytics and reporting teams.
Software Development Engineer, Reliance Jio Platforms Ltd, Sep 2022 – Aug 2024

  • Spearheaded data platform development using .NET, Python, MySQL, and Camunda, automating FTTX deployment lifecycle and eliminating spreadsheet-based workflows across 3+ cross-functional teams.
  • Designed ETL jobs ingesting 100K+ daily GIS and log records into centralized MySQL store, optimizing schemas and analytical queries to achieve 20% performance improvement for planning and finance reporting teams.
  • Engineered billing, material management, and role-based user management modules integrated with SAP APIs, exposing structured deployment and financial data via RESTful APIs to support cross-functional decision-making across planning, finance, and field teams.
  • Managed CI/CD pipelines across dev/test/pre-prod/prod environments via Azure DevOps, collaborating with stakeholders to gather requirements, define validations, and deliver analytical reports that resolved deployment and cost bottlenecks.
Software Developer Intern, Incerro.ai (CodeBin), Apr 2022 – Sep 2022

  • Integrated frontend with backend AI and data services via REST APIs on an AI-powered e-commerce platform, implementing data validation and error handling to ensure end-to-end data integrity from pipeline to UI for SaaS and e-commerce clients.
Skills

  • Languages: Python, SQL (Window Functions, CTEs, Stored Procedures), C#, R, Bash
  • Data Engineering: ETL/ELT, Dimensional Modeling, SCD Type 2, dbt, Airflow, Spark, Kafka, PySpark, Data Quality
  • Analytics & BI: Power BI, Tableau, Streamlit, Excel, SQL Reporting, KPI Development, Funnel/Cohort Analysis
  • Platforms & Cloud: Snowflake, PostgreSQL, MySQL, AWS (S3, EC2, RDS, IAM), Azure, Docker, Terraform, CI/CD
  • AI & Automation: LLM APIs (OpenAI, Anthropic, Gemini), LangChain, LangGraph, Multi-Agent Systems, PDF Parsing
  • Development & Tools: FastAPI, REST API Design, SQLAlchemy, Pydantic, .NET, Streamlit, Git, Jira, Agile/Scrum

Projects
EV Charging Data Warehouse (Snowflake, dbt, Python, Airflow, Docker)

  • Architected star-schema warehouse on Snowflake centralizing EV station, weather, and charging data from NREL and OpenWeather APIs via a modular ETL pipeline with dbt transformations, an automated data quality engine, and stakeholder documentation (ER diagrams, flow mappings, table dictionary).

Real-Time User Analytics Pipeline (Kafka, PySpark, PostgreSQL, FastAPI, Streamlit, Docker)

  • Engineered streaming pipeline (FastAPI, Kafka, Spark Structured Streaming, PostgreSQL) with windowed aggregations, watermarking, and Random Forest churn scoring, reducing detection latency from daily batch to sub-second.

Utility Billing AI Auditor (Python, Airflow, AWS, LLMs, Docker)

  • Architected multi-agent system automating PDF ingestion, LLM-based tariff extraction, and bill recalculation via Airflow and Docker, transforming 2–4 hour manual audits into minutes.