Vishal Singh
Senior Software Engineer - Big Data
Profile

Big Data Engineer with 5+ years of experience across all phases of the software development life cycle. Passionate about Big Data and Machine Learning technologies and about delivering effective solutions through creative problem-solving. Track record of building large-scale systems using Big Data and Machine Learning technologies.

Technical Skills
Programming Languages

Python | SQL | Spark

AWS Services

S3 | EC2 | EMR | RDS | Redshift | Glue | CloudWatch | ECS

Databases

MySQL | MongoDB | Cassandra | HBase

Azure Services

Data Factory | Databricks | Functions | Blob | Synapse | Delta Lake

Distributed Framework

Spark | Hadoop | Hive | Kafka | Sqoop

ML Frameworks

Pandas | NumPy | Scikit-learn | PySpark | PyTorch | Matplotlib | Seaborn | TFX

MLOps

Docker | Docker Compose | GitHub Actions | MLflow | Git | DVC | Airflow

GCP Services

Cloud Storage | Compute Engine | Dataproc | BigQuery | Dataflow | GKE | AlloyDB

Work Experience
  • Enhanced system efficiency by reducing pipeline runtime from 20 days to 3 days through caching and analysis of data columns.
  • Implemented health monitoring and custom metrics collection on all production servers using Prometheus, Node Exporter, and Grafana.
  • Automated journal data creation and deployed the Tableau reports seamlessly.
  • Currently leading a team of 4 developers to redesign the existing architecture and improve the tech stack by integrating PySpark, MySQL, and the ELK stack (Elasticsearch, Logstash, Kibana).
  01/2022 – 12/2023
  • Implemented ETL and data processing pipelines using PySpark on batch and streaming data.
  • Designed, discussed, and implemented machine learning pipelines with MLOps practices.
  • Implemented CI/CD pipelines using GitHub Actions for Azure and AWS.
  • Delivered expert lectures on Machine Learning, Big Data, and MLOps to 1000+ students across multiple batches.
  • Created a user-friendly web app for small businesses unable to hire a Data Analyst/Scientist, enabling them to upload data, perform Exploratory Data Analysis (EDA), Data Preprocessing, Feature Engineering, and train Machine Learning models with ease. The app simplifies complex tasks, allowing users to download all necessary binary files as a zip for predictions and future use.
  • Conducted live doubt-clearing sessions and managed the support team for a data science batch of around 500 students.
  • Created a fully functional, responsive job portal website where an HR manager can post jobs for their company, monitor candidates, send them tasks, and hire them.
  • Candidates can browse jobs from a range of companies, apply, and receive responses by email. They can also view tasks assigned by companies and submit their completed work.
Projects
    Financial Product Service
    05/2023 – 12/2023

    Categorization of financial product and service complaints registered by consumers.

    Tech Stack: Python, PySpark, Grafana, Prometheus, AWS, Azure

  • Ingested weekly data from a web API and used an S3 bucket as the feature store.
  • Used PySpark for data transformation and model training.
  • Followed a multi-cloud strategy: model training runs on Azure and prediction on AWS.
  • Used Prometheus and Grafana for monitoring and visualization.
  • Scheduled the pipeline with Airflow for continuous training.
    Data Warehousing Solution
    10/2022 – 03/2023

    Tech Stack: Apache Airflow, PySpark, Apache Kafka, Amazon S3, AWS Glue, Amazon Redshift

  • Developed a PySpark publisher to stream data from MySQL into Kafka topics for real-time ingestion.
  • Built a PySpark consumer to land Kafka messages into Amazon S3 in raw format for scalable storage.
  • Leveraged AWS Glue to catalog the S3 data, handle schema evolution, and perform transformations (PySpark/Glue jobs).
  • Orchestrated Glue jobs with Airflow to process curated data and prepare it for Redshift loading.
  • Implemented automated Glue → Redshift COPY operations to load partitioned files from S3 into Redshift tables for analytics.
Education
    MBA, IGNOU
    06/2023 – present
    MCA, IGNOU
    12/2018 – 12/2020
    BCA, IGNOU
    12/2015 – 12/2018