Course Outline

Introduction, Objectives, and Migration Strategy

  • Course goals, participant profile alignment, and success criteria
  • High-level migration approaches and risk considerations
  • Setting up workspaces, repositories, and lab datasets

Day 1 — Migration Fundamentals and Architecture

  • Lakehouse concepts, Delta Lake overview, and Databricks architecture
  • SMP vs MPP differences and implications for migration
  • Medallion (Bronze→Silver→Gold) design and Unity Catalog overview

Day 1 Lab — Translating a Stored Procedure

  • Hands-on migration of a sample stored procedure to a notebook
  • Mapping temp tables and cursors to DataFrame transformations
  • Validation and comparison with original output

Day 2 — Advanced Delta Lake & Incremental Loading

  • ACID transactions, commit logs, versioning, and time travel
  • Auto Loader, MERGE INTO patterns, upserts, and schema evolution
  • OPTIMIZE, VACUUM, Z-ORDER, partitioning, and storage tuning

Day 2 Lab — Incremental Ingestion & Optimization

  • Implementing Auto Loader ingestion and MERGE workflows
  • Applying OPTIMIZE, Z-ORDER, and VACUUM; validating results
  • Measuring read/write performance improvements

Day 3 — SQL in Databricks, Performance & Debugging

  • Analytical SQL features: window functions, higher-order functions, JSON/array handling
  • Reading the Spark UI, DAGs, shuffles, stages, tasks, and bottleneck diagnosis
  • Query tuning patterns: broadcast joins, hints, caching, and spill reduction

Day 3 Lab — SQL Refactoring & Performance Tuning

  • Refactor a heavy SQL process into optimized Spark SQL
  • Use Spark UI traces to identify and fix skew and shuffle issues
  • Benchmark before/after and document tuning steps

Day 4 — Tactical PySpark: Replacing Procedural Logic

  • Spark execution model: driver, executors, lazy evaluation, and partitioning strategies
  • Transforming loops and cursors into vectorized DataFrame operations
  • Modularization, UDFs/pandas UDFs, widgets, and reusable libraries

Day 4 Lab — Refactoring Procedural Scripts

  • Refactor a procedural ETL script into modular PySpark notebooks
  • Introduce parametrization, unit-style tests, and reusable functions
  • Code review and best-practice checklist application

Day 5 — Orchestration, End-to-End Pipeline & Best Practices

  • Databricks Workflows: job design, task dependencies, triggers, and error handling
  • Designing incremental Medallion pipelines with quality rules and schema validation
  • Integration with Git (GitHub/Azure DevOps), CI, and testing strategies for PySpark logic

Day 5 Lab — Build a Complete End-to-End Pipeline

  • Assemble Bronze→Silver→Gold pipeline orchestrated with Workflows
  • Implement logging, auditing, retries, and automated validations
  • Run full pipeline, validate outputs, and prepare deployment notes

Operationalization, Governance, and Production Readiness

  • Unity Catalog governance, lineage, and access-control best practices
  • Cost, cluster sizing, autoscaling, and job concurrency patterns
  • Deployment checklists, rollback strategies, and runbook creation

Final Review, Knowledge Transfer, and Next Steps

  • Participant presentations of migration work and lessons learned
  • Gap analysis, recommended follow-up activities, and training materials handoff
  • References, further learning paths, and support options

Requirements

  • An understanding of data engineering concepts
  • Experience with SQL and stored procedures (Synapse / SQL Server)
  • Familiarity with ETL orchestration concepts (ADF or similar)

Audience

  • Technology managers with a data engineering background
  • Data engineers transitioning procedural OLAP logic to Lakehouse patterns
  • Platform engineers responsible for Databricks adoption

Duration

  • 35 hours

Delivery Options

Private Group Training

Our identity is rooted in delivering exactly what our clients need.

  • Pre-course call with your trainer
  • Customisation of the learning experience to achieve your goals:
    • Bespoke outlines
    • Practical hands-on exercises using data and scenarios that learners will recognise
  • Training scheduled on a date of your choice
  • Delivered online, onsite (classroom), or hybrid by experts sharing real-world experience

Private group prices: RRP from €11400 for online delivery, based on a group of 2 delegates, plus €3600 per additional delegate (excluding any certification or exam costs). We recommend a maximum group size of 12 for most learning events.

Contact us for an exact quote and to hear about our latest promotions.


Public Training

Please see our public courses
