Spark

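Spark was developed after MapReduce and improves on it in several ways, most notably by keeping intermediate data in memory instead of writing it to disk between steps.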

step-in-spark

Spark's core abstraction is the RDD (Resilient Distributed Dataset).

An RDD supports two kinds of operations:

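action: count, saveAsTextFile, etc. An action triggers the computation and returns a result to the driver or writes it to storage.
transformation: map, filter, reduceByKey, etc. A transformation is lazy: it only describes a new RDD and computes nothing until an action runs.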
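
To make the split between transformations and actions concrete, here is a minimal pure-Python sketch. ToyRDD is a hypothetical single-machine stand-in written for this note, not Spark's API or implementation; it only assumes that transformations are recorded lazily and that an action runs the whole chain.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD (illustration only, not Spark's API)."""

    def __init__(self, compute):
        # compute: zero-argument function that yields the elements on demand
        self._compute = compute

    @classmethod
    def from_list(cls, items):
        return cls(lambda: iter(items))

    # Transformations: return a new ToyRDD and do no work yet.
    def map(self, f):
        return ToyRDD(lambda: (f(x) for x in self._compute()))

    def filter(self, pred):
        return ToyRDD(lambda: (x for x in self._compute() if pred(x)))

    # Actions: run the recorded chain and return a plain Python value.
    def count(self):
        return sum(1 for _ in self._compute())

    def collect(self):
        return list(self._compute())


calls = []  # records every element that square() actually processes


def square(x):
    calls.append(x)
    return x * x


numbers = ToyRDD.from_list([1, 2, 3, 4])
squares = numbers.map(square)                        # transformation, lazy
even_squares = squares.filter(lambda x: x % 2 == 0)  # transformation, lazy
calls_before_action = len(calls)                     # nothing has run so far
result = even_squares.collect()                      # action, runs the chain
```

In this sketch square is not called until collect runs, which is the behavior the transformation/action split is meant to capture.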

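Transformations can be classified by whether they require a shuffle. Every transformation produces a new RDD, because RDDs are immutable. map does not shuffle: each output partition is computed from a single input partition, so it is fast and can be pipelined with neighboring steps. reduceByKey, however, needs a shuffle: records with the same key must be moved across the cluster into the same partition, which costs network and disk I/O.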
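
The difference between a shuffle-free transformation and a shuffling one can be sketched on plain Python lists, where each inner list plays the role of one partition. The helper names map_partitions and reduce_by_key, the crc32-based partitioner, and the sample data are all assumptions made for illustration; real Spark also combines values on the map side before shuffling, which this sketch leaves out.

```python
import zlib
from collections import defaultdict
from functools import reduce


def map_partitions(partitions, f):
    """No shuffle: each output partition is built from exactly one input partition."""
    return [[f(x) for x in part] for part in partitions]


def reduce_by_key(partitions, f, num_partitions):
    """Shuffle: every record is routed to the partition that owns its key."""
    buckets = [defaultdict(list) for _ in range(num_partitions)]
    for part in partitions:
        for key, value in part:
            # Deterministic stand-in for a hash partitioner (an assumption here).
            target = zlib.crc32(key.encode()) % num_partitions
            buckets[target][key].append(value)
    # After the shuffle, all values for a key sit together and can be reduced.
    return [{k: reduce(f, vs) for k, vs in bucket.items()} for bucket in buckets]


partitions = [["a", "b", "a"], ["b", "c", "a"]]           # two input partitions
pairs = map_partitions(partitions, lambda w: (w, 1))      # records stay in place
counts = reduce_by_key(pairs, lambda x, y: x + y, 2)      # records cross partitions
merged = {k: v for bucket in counts for k, v in bucket.items()}
```

Here map_partitions never looks outside the partition it is working on, while reduce_by_key has to read every input partition before any output partition is complete. That dependency is what makes the shuffle expensive.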