#72 Prepare for Databricks Data Engineer Associate certification exam part #1: Basic Terminology
Oct 20, 2022
Before jumping into the actual preparation, let’s cover some of the most basic Databricks concepts!
Databricks Architecture and Services
The Databricks architecture has two main elements:
- Control plane: contains the backend services that Databricks manages in its own cloud account. The majority of your data does NOT reside here. Notebook commands and many other workspace configurations are stored in the control plane and encrypted at rest.
- Data plane: where your data is processed. In the classic setup, the data plane runs in your own cloud account, and cluster compute resources live there.
Clusters
A Databricks cluster is a set of computation resources and configurations on which you run data engineering, data science, and data analytics workloads, such as production ETL pipelines, streaming analytics, ad-hoc analytics, and machine learning.
Clusters are made up of one or more virtual machine (VM) instances. The driver coordinates the activities of the executors, and the executors run the tasks that make up a Spark job.
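To make the driver/executor split more concrete, here is a toy model in plain Python. It is only a sketch and does not use Spark or any Databricks API: a thread pool stands in for the executors, and the function names (`split_into_tasks`, `run_task`, `run_job`) are made up for illustration. The "driver" cuts a job into tasks, hands them to the "executors", and combines the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tasks(data, num_tasks):
    """Driver side: cut the input into roughly equal partitions, one per task."""
    size = max(1, -(-len(data) // num_tasks))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_task(partition):
    """Executor side: process one partition and return a partial result."""
    return sum(x * x for x in partition)


def run_job(data, num_executors=3, num_tasks=6):
    """Driver side: schedule tasks on the executors, then combine the results."""
    tasks = split_into_tasks(data, num_tasks)
    # The thread pool is a stand-in for executor processes on worker VMs.
    with ThreadPoolExecutor(max_workers=num_executors) as executors:
        partial_results = list(executors.map(run_task, tasks))
    return sum(partial_results)


total = run_job(list(range(1, 11)))
print(total)  # sum of squares of 1..10 -> 385
```

In a real cluster the executors are separate processes on worker VMs rather than threads in one process, but the division of labor is the same: the driver plans and coordinates, the executors do the work.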