Dirghraj Kushawaha — Senior Data Engineer | Databricks | PySpark | Delta Lake | AWS
Summary
Senior Data Engineer with 5+ years of experience building scalable batch and real-time data platforms using Databricks, PySpark, Delta Lake, SQL, AWS, and Redshift. Strong in designing production-grade streaming pipelines, Delta table optimization frameworks, data quality improvements, and analytics-ready views for enterprise use cases.
Skills
- Languages: Python, PySpark, SQL
- Big Data & Processing: Apache Spark, Databricks, Structured Streaming, Delta Lake, Delta Live Tables, EMR
- Cloud & Storage: Amazon S3, AWS Glue, Amazon Redshift, Kinesis, Kafka-style streaming ingestion
- Data Engineering: Batch pipelines, real-time pipelines, CDC, SCD, merge/upsert logic, partitioning, file compaction, shuffle optimization
- Delta Lake & Governance: Unity Catalog, external Delta tables, OPTIMIZE, VACUUM, retention configuration, Delta observability
- Databases & Warehouses: Redshift, Snowflake, SQL Server, SQLite
- Workflow & Platform: Airflow/MWAA, ADF, config-driven frameworks, pipeline monitoring, metadata-driven ingestion
- Domains: Aviation analytics, onboard Wi-Fi telemetry, Starlink analytics, ServiceNow/ITSM data, candidate matching and AI automation
Experience
United Airlines / UDH-CBS Digital — Senior / Lead Data Engineer (Dates to confirm)
- Built and enhanced onboard Wi-Fi and Starlink analytics pipelines supporting real-time and batch processing across raw, curated, and reporting layers.
- Reworked flight boarding-time logic by incorporating additional source systems, including `aci_pax_ckin`, `wifi_flight_event`, and `aci_flt_leg`, raising 90-minute boarding-time alignment from roughly 44–46% to 88–95%.
- Designed a config-driven Delta Lake maintenance framework for external Delta tables, supporting scheduled `OPTIMIZE` and `VACUUM` operations with daily, weekly, monthly, and specific-date execution patterns.
- Supported migration from Delta Live Tables to Structured Streaming for real-time pipelines, improving flexibility for external Delta tables on S3 and future interoperability patterns.
- Added merge/upsert capability to real-time S3 write pipelines using configurable controls such as `enable_merge`, `merge_days_back`, and `merge_date_column`.
- Improved player/PDE event metric accuracy by fixing the deduplication logic to include `ops_flight_id`, so events from different flights are no longer incorrectly dropped, improving the reliability of `playback_attempt_no` reporting.
- Created and updated analytics views such as flight report history, in-coverage reporting, Google heartbeat connectivity, and consolidated flight-level reporting views.
- Worked on ServiceNow, ICON, Jira, and asset-management data pipelines involving Redshift, Databricks, S3 Parquet loads, deduplication, grants, and reporting views.
- Led vendor/project delivery discussions, reviewed implementation quality, clarified requirements, and translated business issues into technical execution plans.
Education
- B.Tech in Information Technology, Rajasthan Technical University (2020)
Highlights
- Designed Delta Lake maintenance and observability patterns for production-scale Databricks tables.
- Built data engineering solutions across streaming, batch, analytics, and operational reporting use cases.
- Hands-on with Databricks, PySpark, Delta Lake, AWS, Redshift, SQL, and enterprise data platform design.
- Explored AI-enabled recruiter scoring, LinkedIn automation, candidate matching, and agentic workflow design.