Dirghraj Kushawaha — Senior Data Engineer | Databricks | PySpark | Delta Lake | AWS

Summary

Senior Data Engineer with 5+ years of experience building scalable batch and real-time data platforms using Databricks, PySpark, Delta Lake, SQL, AWS, and Redshift. Experienced in designing production-grade streaming pipelines and Delta table optimization frameworks, improving data quality, and building analytics-ready views for enterprise use cases.

Skills

Languages: Python, PySpark, SQL
Big Data & Processing: Apache Spark, Databricks, Structured Streaming, Delta Lake, Delta Live Tables, EMR
Cloud & Storage: Amazon S3, AWS Glue, Amazon Redshift, Amazon Kinesis, Kafka-style streaming ingestion
Data Engineering: Batch pipelines, real-time pipelines, CDC, SCD, merge/upsert logic, partitioning, file compaction, shuffle optimization
Delta Lake & Governance: Unity Catalog, external Delta tables, OPTIMIZE, VACUUM, retention configuration, Delta observability
Databases & Warehouses: Redshift, Snowflake, SQL Server, SQLite
Workflow & Platform: Airflow/MWAA, ADF, config-driven frameworks, pipeline monitoring, metadata-driven ingestion
Domains: Aviation analytics, onboard Wi-Fi telemetry, Starlink analytics, ServiceNow/ITSM data, candidate matching and AI automation

Experience

United Airlines / UDH-CBS Digital — Senior / Lead Data Engineer (Dates to confirm)

Built and enhanced onboard Wi-Fi and Starlink analytics pipelines supporting real-time and batch processing across raw, curated, and reporting layers.
Reworked flight boarding-time logic by incorporating additional source systems, including `aci_pax_ckin`, `wifi_flight_event`, and `aci_flt_leg`, raising 90-minute boarding-time alignment from roughly 44–46% to 88–95%.
Designed a config-driven Delta Lake maintenance framework for external Delta tables, supporting scheduled `OPTIMIZE` and `VACUUM` operations with daily, weekly, monthly, and specific-date execution patterns.
Supported migration from Delta Live Tables to Structured Streaming for real-time pipelines, improving flexibility for external Delta tables on S3 and future interoperability patterns.
Added merge/upsert capability to real-time S3 write pipelines using configurable controls such as `enable_merge`, `merge_days_back`, and `merge_date_column`.
Improved player/PDE event metric accuracy by fixing the deduplication logic to include `ops_flight_id`, which stopped events from being incorrectly dropped across flights and made `playback_attempt_no` reporting more reliable.
Created and updated analytics views such as flight report history, in-coverage reporting, Google heartbeat connectivity, and consolidated flight-level reporting views.
Worked on ServiceNow, ICON, Jira, and asset-management data pipelines involving Redshift, Databricks, S3 Parquet loads, deduplication, grants, and reporting views.
Led vendor/project delivery discussions, reviewed implementation quality, clarified requirements, and translated business issues into technical execution plans.

Education

B.Tech in Information Technology, Rajasthan Technical University (2020)

Highlights

Designed Delta Lake maintenance and observability patterns for production-scale Databricks tables.
Built data engineering solutions across streaming, batch, analytics, and operational reporting use cases.
Hands-on with Databricks, PySpark, Delta Lake, AWS, Redshift, SQL, and enterprise data platform design.
Explored AI-enabled recruiter scoring, LinkedIn automation, candidate matching, and agentic workflow design.