Resume

Pavan Badempet - Data Engineer

Data Engineer with 2+ years of experience building scalable ETL/ELT pipelines using Python, SQL, and Spark across AWS and on-prem environments. Solid grounding in computer science fundamentals, including data structures, algorithms, and distributed systems, applied to batch processing, data quality, and fault-tolerant pipeline design. Experienced in large-scale Spark ETL systems in the FinTech and Automotive domains, with a focus on performance optimization and reliability.

What I Do
Data Pipeline Architecture
Designing and implementing scalable ETL/ELT workflows using Apache Spark, AWS Glue, and Lambda to process terabytes of data efficiently.
Cloud Data Platforms
Designing secure, serverless data lakes and warehouses on AWS (S3, Redshift, Athena) and Azure, ensuring high availability and cost optimization.
Big Data Processing
Optimizing complex SQL queries and Spark jobs to surface patterns in massive datasets, reducing processing time and improving data quality.
MLOps & AI Integration
Deploying machine learning models and GenAI agents into production environments using Docker, FastAPI, and CI/CD pipelines.
Experience
Nov 2023 – Present
Data Engineer - Tata Consultancy Services
  • Nomura Capital: Engineered and maintained large-scale Spark ETL pipelines using Spark SQL and complex analytical SQL queries for capital markets datasets (trades, reference, risk, and valuation feeds), involving multi-way joins and window functions to support downstream reporting and risk analytics.

  • Analyzed Spark execution metrics to reduce job runtime by 30% through broadcast joins, partition pruning, predicate pushdown, and optimized star-schema joins between fact and dimension tables.

  • Orchestrated Spark workflows using AutoSys, managing dependency chains, reruns, and recovery logic, improving batch completion reliability and reducing manual intervention by 25%.

  • Migrated Spark workloads from YARN to Kubernetes and transitioned storage from HDFS to MinIO (S3-compatible), resolving Spark connector and storage compatibility issues.

  • Nissan: Architected daily batch pipelines using AWS Lambda and Step Functions, and developed a Streamlit interface that lets business users perform ad-hoc file ingestion and idempotent re-runs.

  • Implemented schema validation, data quality checks, and incremental batch processing with idempotent re-runs, preparing curated datasets for Snowflake Data Warehouse ingestion and reducing manual effort by 60%.

  • Improved pipeline reliability with automatic retries and comprehensive CloudWatch monitoring and alerting, including automated notifications for pipeline failures, data freshness, and quality issues, reducing MTTR by 30%.

Mar 2023 – Apr 2023
Data Analytics Intern - TCS iON
  • Implemented an ensemble-based attrition prediction model, achieving 86% accuracy on validation data.

  • Designed a lightweight FastAPI service to expose the model for real-time inference and testing.

Education
Aug 2019 – Sep 2023
College: Guru Nanak Institutions Technical Campus

Degree: Bachelor of Technology in Computer Science & Engineering
Minor: AI/ML
Grade: 8.2 CGPA (First Class with Distinction)
Scholarships: Prime Minister’s Scholarship 2019–2023

Programming Languages
  • Python
  • SQL
  • Scala
  • Java
Data Processing
  • Apache Spark (PySpark, Spark SQL)
  • Pandas
  • NumPy
Cloud & Storage
  • AWS (Lambda, Glue, Step Functions, S3)
  • Snowflake
  • MinIO
Tools
  • AutoSys
  • Docker
  • Git
  • FastAPI
  • Streamlit
  • Pytest