Clusters
Let's understand how Spark works.
- We interact with the driver program, which creates the `SparkContext`.
- The `SparkContext` connects to a cluster manager, e.g., Mesos or YARN, which allocates the resources.
- Spark acquires nodes in the cluster. Each node runs an executor that executes code and stores data.
- The driver program sends your application code to the executors.
- The `SparkContext` sends tasks for the executors to run.
Driver program
      |
Cluster manager
   |      |      |
 node   node   node  ...
A transformation is only recorded on the driver program (evaluation is lazy), while an action triggers a job that actually runs on the executors.