
Experienced Big Data Engineer with exposure to the design and architecture of data pipelines. High-quality software engineering practitioner. Several years of experience working on highly scalable systems using a technology stack that includes Apache Spark, Kafka, and SQL and NoSQL databases such as Cassandra.
Design and develop data pipelines to ingest and transform business data to build KPIs.
Design, build, and deploy data pipelines to curate and transform data received from clients, creating data marts and consumption-ready data for building KPIs.
Anomaly Detection with LSTM using Keras & BigDL
Implemented a scalable analytics platform for collecting, reporting, and monitoring energy consumption data from meters across buildings.
Contributed to the development and implementation of the Secure Terrain platform, which helps reveal potential risks in IT and business.
Terrain Intelligence aggregates and correlates threat information from multiple sources to help analysts confidently learn about and understand threats.
Worked as Development Lead for an in-house ERP project.
An anomaly detection and alerting system using a deep learning LSTM architecture.
Technologies used:
Apache Spark, BigDL, Analytics-Zoo, Cassandra, AWS Cloud
Scala, Python
Main project features:
Build and train a time series model using a Keras LSTM network, generate predictions on newly arrived data, and raise alerts for anomalies.
Solution is built on open-source technologies & hosted on Amazon Web Services.
Activities performed:
Requirement analysis & build POC on Google Colab.
Analyze sample raw data from devices and build data models.
A big data analytics platform that collects energy consumption data from various devices such as energy meters and BMS systems, and stores the time series data in a distributed NoSQL database. Provides system-predefined as well as user-defined analytics backed by the Apache Spark processing engine.
Main project features:
Gather, parse, and analyze data from BMS devices, lighting managers, and other IoT devices.
Solution is built on big-data technologies & hosted on AWS cloud.
Activities performed:
Full responsibility for the definition, documentation, and successful completion of the project. Requirement analysis & build POC. Analyze raw data from devices and build data models. Build data processing pipelines using Spark, Cassandra, RDS, Redshift, Glue.
Functional Specification Analysis and preparation of Design Document
Database Development & Design using SQL, PL/SQL
Client Support, Troubleshooting, and Bug Fixing
Database Development & Design
Front-End Application Development (using RAD tool, JavaScript)
PwC’s Secure Terrain™ is a cloud-based model for real-time threat analysis, detection and remediation.
PwC’s Secure Terrain solution, powered by the Google Cloud Platform, provides businesses with the capabilities to gain a holistic view of their entire cybersecurity landscape, enabling them to strategically manage cybersecurity risks and protect critical assets.
Main project features:
Gather, parse, analyze log files from security systems and firewalls
Solution is built on open-source technologies & hosted on Google Cloud Platform. Used Ansible, Git, Jenkins, JIRA, Confluence for CI/CD.
Activities performed:
Requirement analysis & build POC. Analyze log files and build data models. Build data processing pipelines using Spark-Kafka-ES. Scale up the application to process terabytes of data & optimize cost.
Main project features:
Graph database to store and retrieve threat-intelligence
Solution is built on open-source technologies & hosted on Google Cloud Platform. Used DevOps tools such as Ansible, Git, Jenkins, JIRA, Confluence for CI/CD.
Activities performed:
Used the Titan graph database to store entity relations and perform traversals using Gremlin. Built REST APIs using Python-DRF to fetch an entity and its relations. Integrated the TI with analytics solutions for auto-enrichment and alerting based on user data.