Venkatesh Reddy Sure
Data Engineer | BI Developer
Profile

Data Engineer with 5 years of experience in cloud data engineering, ETL development, testing, and data analytics. Proficient in developing distributed data applications and BI solutions using the Spark and MSBI tech stacks. Experienced in architecting ETL workloads on AWS and Azure.

Professional Experience
TD Bank, Data Engineer
Jan 2023 – May 2024 | Toronto
  • Developed Spark applications in Azure Databricks to extract, transform, and aggregate data into daily and monthly extracts for reporting and analytics (a minimal sketch follows this list).
  • Applied database design principles to create efficient data models, optimize queries, manage indexes, and perform integrity checks.
  • Led data migration efforts using Azure Data Factory, Databricks, and ADLS Gen 2 by developing PySpark scripts for loading data into SQL Pools and Azure Delta Lake.
  • Built, orchestrated and optimized ADF data pipelines to extract, transform, and load (ETL/ELT) API, relational, and streaming sources into the Synapse data warehouse.
  • Deployed AMD-based Databricks job clusters to share resources among different tenants, balancing cost, computational requirements, and workload SLAs.
  • Automated data profiling and cleaning using Spark DataFrames, Spark SQL, and PySpark, reducing validation time by 30% and boosting productivity.
  • Collaborated with stakeholders to understand business requirements and translate them into technical solutions following Agile/Scrum methodologies.
  • Trained business users and developers on the ETL framework, analytics, and Synapse query optimization techniques to support migration of Tableau dashboards and ETL workloads.
  • Scheduled dedicated Synapse Data Warehouse Units (DWUs) based on ad-hoc and ETL workloads, resulting in 28% savings in data warehousing cost.
  • Developed, tested, and deployed a historical data movement pipeline and migrated customer operations data to the Synapse data warehouse.
  • Automated the transfer of flat files from Azure SQL Pool or ADLS Gen 2 to the TIBCO EBX mailbox.
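A minimal, illustrative PySpark sketch of the kind of Databricks extract-and-aggregate job described in the first bullet of this role; the storage account, container paths, and column names (txn_timestamp, branch_id, amount) are assumptions for illustration, not the actual pipeline.

    # Illustrative daily extract-and-aggregate job (assumed paths and columns).
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("daily_reporting_extract").getOrCreate()

    # Read raw transactions from an assumed ADLS Gen 2 Delta location.
    txns = spark.read.format("delta").load(
        "abfss://raw@examplestorage.dfs.core.windows.net/transactions"
    )

    # Aggregate to a daily grain for reporting (illustrative columns).
    daily = (
        txns.withColumn("txn_date", F.to_date("txn_timestamp"))
        .groupBy("txn_date", "branch_id")
        .agg(
            F.count("*").alias("txn_count"),
            F.sum("amount").alias("total_amount"),
        )
    )

    # Write the daily extract as a partitioned Delta table for downstream BI.
    (
        daily.write.format("delta")
        .mode("overwrite")
        .partitionBy("txn_date")
        .save("abfss://curated@examplestorage.dfs.core.windows.net/daily_extracts")
    )

Partitioning the extract by txn_date keeps downstream reporting reads incremental rather than full-table scans.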
Cyient, BI Developer
Oct 2017 – Mar 2021 | India
  • Designed, developed, tested, and maintained scalable ETL and BI solutions involving data warehousing, integration, migration, conversion, and analytics.
  • Designed automated, interactive Power BI and Tableau dashboards with KPI scorecards for non-technical users, improving decision-making efficiency by 30%.
  • Developed ETL solutions hands-on with Python, SQL, dbt, and AWS services including Glue, Lambda, Redshift, Athena, S3, and Step Functions, with monitoring and logging via AWS CloudWatch.
  • Scheduled and managed batch data-processing jobs using Apache Airflow, optimizing resource usage and processing time (see the DAG sketch after this list).
  • Defined and implemented SSIS packages for daily data filtering and import from the OLTP system to SQL Server, incorporating advanced features like Lookup Transformations, Merge Joins, Fuzzy Lookups, and Derived Columns.
  • Designed fact and dimension tables and established relationships between them using Star and Snowflake schemas.
  • Developed ETL jobs to feed the supply chain analytics dashboard and data warehouse using Azure Data Factory, Synapse, Data Lake, and Databricks.
  • Deployed and managed containerized microservice applications using Docker and Kubernetes, ensuring consistent environments across development, testing, and production stages.
  • Authored SQL stored procedures to implement SCD Type 2 functionality, capturing history for each batch run recorded in ETL control.
  • Developed advanced analytics solutions by building and optimizing data pipelines over e-commerce datasets to process inventory, procurement, and logistics data, leading to improved demand forecasting and reduced stockouts.
  • Optimized SQL queries and performed SQL Server tuning to enhance query performance and reduce execution time for business applications.
  • Created DDL, DML scripts, stored procedures, User Defined Functions (UDF), Common Table Expression (CTE), Views, Joins, and System Defined Functions, as well as Temp Tables and Table Variables, according to the requirement and designed ad-hoc queries on the database tables.
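As a companion to the Airflow bullet above, a minimal DAG sketch for nightly batch ETL scheduling; the DAG id, schedule, owner, and script paths are assumptions for illustration only.

    # Illustrative Airflow DAG: nightly extract -> transform -> load.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    default_args = {
        "owner": "data-engineering",          # assumed owner
        "retries": 2,
        "retry_delay": timedelta(minutes=10),
    }

    with DAG(
        dag_id="nightly_batch_etl",           # assumed name
        start_date=datetime(2020, 1, 1),
        schedule_interval="0 2 * * *",        # nightly at 02:00
        catchup=False,
        default_args=default_args,
    ) as dag:
        extract = BashOperator(
            task_id="extract",
            bash_command="python /opt/etl/extract.py",   # assumed script paths
        )
        transform = BashOperator(
            task_id="transform",
            bash_command="python /opt/etl/transform.py",
        )
        load = BashOperator(
            task_id="load",
            bash_command="python /opt/etl/load.py",
        )

        # Declare the run order so Airflow can retry or backfill each stage independently.
        extract >> transform >> load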
Skills
Programming Languages: PySpark | Java | Python | DAX | T-SQL | VBA | Shell Scripting
Cloud: Azure Databricks | Azure Data Factory | Azure Synapse Analytics | Azure DevOps | Snowflake | Microsoft Entra ID | Data Lake Gen 2 | AWS (EC2, S3, RDS, Glue, Lambda, Athena, Glue Crawlers, DynamoDB) | Airflow | Event Hub | Event Grid | Datadog
Databases: SQL Server 2019/2016 | PostgreSQL | MySQL | Azure SQL | MongoDB
Other Tools and Skills: SQL | NoSQL | Jenkins | Data Modeling | Apache Spark | Kafka | SSIS | ELT | GitLab | Business Intelligence | Communication | Power BI | Tableau | Database Optimization
Education
Concordia University, Master of Engineering
Jan 2021 – Oct 2022 | Montreal