Akhilesh Pratap Shahi
Data Engineer
Email
[email protected]
Phone
+91-844-7020-911
akhileshshahi
GitHub
shahiakhilesh1304
Location
Bangalore, India
Visa Status
Open to Relocation (US, Europe, Asia)
Profile

Results-driven Data Platform Engineer with 6+ years of experience building, optimizing, and operating large-scale distributed data systems. Strong expertise in the Hadoop ecosystem (HDFS, YARN), cluster management, performance tuning, and system-level troubleshooting. Proven track record of managing high-throughput data platforms (1.6B+ records/day), improving cluster efficiency, and driving platform reliability across the telecom and healthcare domains.

Skills
Programming & Scripting

Python, Java, Scala, Shell Scripting, J2EE, HTML5, JavaScript, Bootstrap

Big Data & Distributed Systems

Apache Spark, PySpark, Apache Kafka, Apache Hadoop, HDFS, YARN, Hadoop Cluster Management, Resource Allocation, Capacity Planning, Cluster Optimization, Platform Engineering

Data Engineering & ETL

ETL/ELT Pipelines, Data Modeling (Dimensional, Relational), Snowflake Schema, ER Diagrams, Data Processing (Wrangling, Transformation, Aggregation), Data Annotation, Data Retention, Data Backup

Cloud & Tools

Azure Databricks, Azure Delta Lake, Azure Data Factory, Azure Blob Storage

Data Analysis & Visualization

Pandas, NumPy, SciPy, Matplotlib, Seaborn, Tableau

Databases & Warehousing

MongoDB, MySQL, Google BigQuery, Apache Druid, SQL, NoSQL, Data Warehousing, OLAP/OLTP, Data Governance, Data Quality

DevOps & Containers

Docker, Kubernetes (Basics), Git, GitHub

Orchestration & Workflow Tools

Apache Airflow, CI/CD Basics

System & Infrastructure

Linux, Distributed Systems, System-Level Troubleshooting

Performance & Observability

Performance & Query Optimization, Caching Strategies, Monitoring, Alerting & Observability (Logs, Metrics, APIs, Cluster Monitoring)

Professional Experience
03/2023 – present | Bangalore, India

Reliance Jio Infocomm Pvt Ltd

Big Data Engineer
  • Managed and optimized large-scale Hadoop/Spark clusters processing 1.6B+ records/day, ensuring efficient YARN resource allocation and HDFS utilization
  • Performed cluster-level performance tuning, reducing execution latency and improving workload distribution across nodes
  • Led system-level troubleshooting for production failures, including memory bottlenecks, executor failures, and data skew issues
  • Built internal frameworks to monitor cluster health, resource contention, and job-level performance metrics
  • Designed APIs that interact with distributed systems, reducing data retrieval time by 90% while maintaining cluster stability
  • Improved platform reliability and fault tolerance by optimizing job retries, partitioning strategies, and storage access patterns
  • Actively worked on Linux-based debugging, including log tracing, disk usage issues, and process-level analysis across nodes
  • Collaborated with infra teams to enhance cluster scalability and workload isolation strategies

10/2022 – 02/2023 | Gurgaon, India

MpHrx

Software Engineer
  • Built and optimized Apache Spark (PySpark) pipelines on Azure Databricks for high-volume healthcare data processing
  • Improved Spark job performance through partition tuning, caching strategies, and execution plan optimization
  • Contributed to data platform stability by identifying bottlenecks in distributed processing and improving job reliability
  • Enhanced data ingestion workflows ensuring efficient resource usage and reduced cluster strain
  • Worked on query optimization and system performance improvements, reducing latency by 50%

03/2022 – 07/2022 | Gurgaon, India

SAR GROUP (Lectrix E-Vehicle)

Software Developer
  • Designed and maintained backend services supporting high-volume telemetry data ingestion, ensuring stable data flow into downstream systems
  • Worked on system performance optimization, reducing API response latency by 40% under production load
  • Implemented efficient data models in MongoDB to handle large-scale, semi-structured data with high write throughput
  • Performed system-level debugging and performance tuning to resolve bottlenecks in API and database layers
  • Contributed to improving service reliability and fault tolerance through better exception handling and logging mechanisms
  • Worked closely with infrastructure teams to ensure scalability and efficient resource utilization

01/2020 – 03/2022 | Lucknow, India

Fintree Global Research

Software Developer (Co-Founder)
  • Built and managed scalable backend systems handling financial data ingestion, processing, and analytics workflows
  • Designed system architecture focusing on performance, reliability, and extensibility for growing user demand
  • Optimized database queries and backend services, improving overall system performance by 28%
  • Led development of data-driven features requiring efficient handling of large datasets and real-time processing needs
  • Implemented structured logging and monitoring practices to support system observability and debugging
  • Took ownership of end-to-end platform development, including deployment, performance tuning, and system stability

Education
08/2024 – present | Valletta, Malta

Woolf University

M.S. in Computer Science

Specialized in ML/AI

08/2015 – 04/2019 | Ambala, India

Maharishi Markandeshwar (Deemed to be University)

B.Tech

Major in Computer Science

Certificates
Recognition and Achievements
  • Guest Lecturer – Python, BSA Engineering College
  • Former Member – Computer Society of India
  • Winner – Hackathons, Mono Acts, and Inter-college Theater Events
  • Vice President – Trojan Society | President – Pratibimb Theatre Club