AI Engineer specializing in multimodal agentic systems and scalable intelligent architectures. Experienced in fine-tuning LLMs, orchestrating autonomous workflows, and deploying production-grade AI applications. Passionate about shifting software from data processing to genuine cognitive agency.
Orchestrating the development of production-grade multimodal applications and autonomous agents. Engineering fine-tuning pipelines and complex AI workflows to shift systems from static processing to dynamic cognitive agency. Implementing rigorous evaluation frameworks to benchmark and align model performance with product objectives.
Automated core operational workflows by engineering research-driven prototypes using multimodal and large language models. Optimized model outputs through advanced prompt engineering and the design of vision-language processing pipelines, significantly enhancing organizational efficiency.
Generated high-complexity code evaluation datasets to align large language models via RLHF. Audited model outputs for logical correctness and security vulnerabilities.
Built a semantic search engine enabling natural language queries over video archives (e.g., 'find the moment he talks about scale'). Engineered a modular pipeline using FFmpeg for frame extraction, Gemini for analysis, and open-source models for multimodal embeddings. Migrated vector storage to ChromaDB for local-first retrieval, reducing content lookup time by ~90%.
Tech: Python, Gemini API, ChromaDB, FFmpeg, React.
Architected a multimodal autonomous agent capable of perceiving and controlling desktop interfaces to execute user-defined workflows. Integrated Gemini Vision for real-time screen understanding and PyAutoGUI for action execution. Exposed the agent via a FastAPI endpoint to serve as an automated end-to-end UI tester.
Tech: Google AI SDK, PyAutoGUI, EasyOCR, FastAPI.
Developed and documented a 'Neural Network from Scratch' project to deepen my understanding of machine learning algorithms. The repository contains Jupyter notebooks (.ipynb) and Python scripts with detailed study notes. Reviewed and synthesized insights from key research papers.
Tech: Python, NumPy, Jupyter, PyTorch.
Designed a robust pipeline to generate high-quality multiple-choice questions (MCQs) using Gemini 2.5 Flash, and used the same model to evaluate each generation against custom rubrics. Reduced processing time and raised the reliability of structured output to over 99%.
Tech: Google AI SDK, PostgreSQL, Pandas.
Built a multi-agent system for automated incident response using Google's Agent Development Kit (ADK). A team of simulated agents collaborates to diagnose the root cause of a mock technical incident by analyzing logs, metrics, and recent code changes.
Tech: Python, Google ADK, Multi-Agent Systems.
Developed a local-first video analysis platform for automated interview coaching. Built a FastAPI backend that processes user-uploaded videos, using Gemini models to generate rubric-based scores and executive summaries. Built a reactive React/Vite interface to visualize performance metrics and manage custom evaluation criteria.
Tech: FastAPI, React, Gemini, FFmpeg.
PyTorch, Hugging Face
Python, C
Pandas, NumPy, Matplotlib, Flask
Git, GitHub
Relevant coursework: Digital Signal Processing, Computer Networks, Analog and Digital Circuits, Operating Systems, Embedded Systems, Artificial Intelligence, and Machine Learning.
Fluent in English, Hindi, and Kannada.
I love stumbling upon topics that send me down deep rabbit holes. I also enjoy weight training, books, and football.