This Data Engineering with Databricks training course provides an overview of data architecture concepts, an introduction to the Lakehouse paradigm, and an in-depth look at Delta Lake features and functionality. You will learn to apply software engineering principles with Databricks as you build end-to-end OLAP data pipelines that use Delta Lake for both batch and streaming data. Considerations around normalization, change data capture, slowly changing dimensions, and regulatory compliance are also explored, along with serving data to end users through aggregate tables and Redash. Throughout, the emphasis is on data engineering best practices with Databricks.
By attending the Data Engineering with Databricks workshop, delegates will learn to:
- Build an end-to-end batch and streaming OLAP data pipeline using the Databricks Workspace.
- Make data available for consumption by downstream stakeholders using specified design patterns.
- Apply Databricks’ recommended best practices in engineering a single-source-of-truth Delta architecture.
To get the most out of this workshop, delegates should have:
- Intermediate to advanced programming skills in Python
- Intermediate to advanced SQL skills
- Beginner experience using the Spark DataFrames API
- Knowledge of general data engineering concepts
- Knowledge of the core features and use cases of Delta Lake
The Data Engineering with Databricks class is ideal for:
- Data Engineers and Machine Learning Engineers