Vamsi Krishna Jakka
Data Engineer
Profile

Data Engineer with 3+ years of experience designing and optimizing ETL pipelines and data workflows. Proficient in Python, SQL, AWS services, Snowflake, Matillion, Airflow, and Tableau, with a proven ability to transform and integrate large datasets to drive actionable insights and support business growth.

Professional Experience
Senior Data Engineer, Coforge
March 2023 – present | Hyderabad
  • Architected and optimized automated ETL pipelines for data ingestion, processing, and distribution, significantly enhancing data quality and accessibility.
  • Orchestrated data ingestion from multiple sources, including ServiceNow via AppFlow, Kinesis, and direct pulls into S3 landing (bronze layer) in JSON format.
  • Leveraged AWS Glue to transform data, implement schema modifications, and cleanse datasets, storing results in a silver layer S3 bucket in Parquet format.
  • Executed additional Glue jobs to derive client-specific metrics, storing the final processed data in the golden layer S3 bucket in Parquet format.
  • Employed Amazon Athena for efficient querying of data stored in the golden layer.
  • Tech Stack: AWS (Glue, S3, Lambda, AppFlow, Step Functions, Athena), Kinesis, CloudFormation, Python
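As a minimal illustration of the bronze-to-silver cleanse step described above (not the production Glue job; the function, required field, and ServiceNow-style column names are hypothetical stand-ins):

```python
import json

def cleanse_bronze_records(raw_json_lines):
    """Hypothetical sketch of the bronze -> silver cleanse step:
    parse raw JSON events, drop malformed or incomplete records,
    and normalize the schema before the Parquet write."""
    cleaned = []
    for line in raw_json_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed bronze events
        if not record.get("ticket_id"):  # hypothetical required field
            continue
        cleaned.append({
            "ticket_id": str(record["ticket_id"]),
            "status": record.get("status", "unknown").lower(),
            "updated_at": record.get("sys_updated_on"),  # ServiceNow-style field
        })
    return cleaned

# Example: two valid events and one malformed line
bronze = [
    '{"ticket_id": "INC001", "status": "OPEN", "sys_updated_on": "2024-01-05"}',
    'not-json',
    '{"ticket_id": "INC002"}',
]
silver = cleanse_bronze_records(bronze)
```

In the actual pipeline this logic would run inside a Glue job over DynamicFrames, with the output written to the silver S3 bucket in Parquet.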

Data Engineer, Coforge
July 2021 – March 2023 | Hyderabad
  • Leveraged sensors to gather vehicle performance data, ensuring accurate monitoring and analysis for enhancements.
  • Oversaw data migration from various testing machines to S3 using AWS Data Migration Service.
  • Employed AWS Glue for ETL processes, refining and transforming raw datasets to meet business requirements and storing them in a curated S3 bucket.
  • Developed and sustained a data catalog using AWS Glue Crawlers to enhance data discovery and ensure consistency.
  • Ingested curated datasets from S3 into Amazon Redshift for advanced querying and analysis.
  • Monitored and optimized AWS Glue job configurations, troubleshooting issues and analyzing logs for operational efficiency.
  • Tech Stack: Python 3.x, PySpark, SQL, AWS (Lambda, Glue, Redshift, S3), GitHub, JIRA

    Technical Skills
    Programming Languages

    Python | SQL | Spark

    AWS Services

    S3 | RDS | Redshift | Glue | CloudFormation | Lambda | AppFlow | Step Functions

    Databases

    MySQL | MongoDB | Cassandra | HBase

    Version Control & Scheduling

    Git | Airflow

    Distributed Framework

    Spark | Hadoop | Hive | Kafka | Sqoop

    MLOps and ML Frameworks

    Docker | PySpark

    Data Warehousing & ETL Tools

    Snowflake | Matillion

    Projects
    Real-Time Data Processing and Visualization Workflow using Matillion
    October 2023 – January 2024

    Designed and implemented a data pipeline using Matillion to load and transform data from AWS S3 into Snowflake, ensuring data integrity and enabling advanced analytics and visualization in Power BI.

  • Engineered a data pipeline using Matillion to load and transform datasets from AWS S3 into Snowflake for seamless integration and analysis.
  • Processed and optimized datasets in Snowflake with Matillion, ensuring data integrity and enabling complex analytical workflows.
  • Developed interactive visualizations in Power BI to extract actionable insights from Snowflake data, empowering stakeholders to make informed, data-driven decisions.
  • Tech Stack: Matillion, AWS S3, Snowflake, Power BI

    Data Warehousing Solution
    August 2021 – December 2021

    Designed and developed an ETL pipeline to export data from the MySQL transaction database to AWS Redshift for data analysis.

  • Engineered a publisher using PySpark to transmit data from MySQL to Kafka topics.
  • Developed a PySpark consumer to write data to an S3 bucket.
  • Scheduled PySpark jobs to transfer files from the S3 bucket to Redshift tables.
  • Tech Stack: Apache Airflow, PySpark, Amazon Redshift, S3, Apache Kafka
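The publisher/consumer handoff above can be sketched as follows; this is an illustrative stand-in, not the project code, with an in-memory queue playing the role of the Kafka topic and a list playing the role of the S3 staging bucket:

```python
from queue import Queue

def publish_rows(rows, topic: Queue):
    """Hypothetical publisher: push MySQL rows onto a Kafka-style topic
    (an in-memory Queue stands in for the broker)."""
    for row in rows:
        topic.put(row)
    topic.put(None)  # sentinel marking end of the batch

def consume_to_bucket(topic: Queue, bucket: list):
    """Hypothetical consumer: drain the topic and append records to a
    list that stands in for the S3 staging bucket."""
    while True:
        msg = topic.get()
        if msg is None:
            break
        bucket.append(msg)

# Example handoff: rows flow publisher -> topic -> consumer -> "bucket"
topic, bucket = Queue(), []
publish_rows([{"order_id": 1}, {"order_id": 2}], topic)
consume_to_bucket(topic, bucket)
```

In the real pipeline these roles were implemented with PySpark jobs against an actual Kafka cluster, with Airflow scheduling the S3-to-Redshift loads downstream.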

    Education
    Bachelor of Technology, Sri Mittapalli College of Engineering
    July 2017 – July 2021