Apache Spark

Apache Spark is a cluster computing engine that is particularly well suited to iterative workloads.

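Apache Spark supports in-memory computation: data is loaded into the memory of the worker nodes and the computation runs there. Because intermediate results do not have to be written to disk between steps, Spark is often much faster than Hadoop MapReduce, especially for jobs that reuse the same data many times.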
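The effect of keeping data in memory can be sketched without Spark at all. The following is a toy, single-machine illustration in plain Python, not Spark code: `CountingSource`, `iterate_from_disk`, and `iterate_in_memory` are hypothetical names invented for this example. It only counts how many times the input is read from disk when an iterative job reloads its data on every pass versus loading it once and reusing it.

```python
import os
import tempfile


class CountingSource:
    """Hypothetical data source that counts how often it is read from disk."""

    def __init__(self, values):
        # Write the values to a temporary file, one per line.
        fd, self.path = tempfile.mkstemp(suffix=".txt")
        with os.fdopen(fd, "w") as handle:
            handle.write("\n".join(str(v) for v in values))
        self.disk_reads = 0

    def load(self):
        # Every call goes back to disk, like a job that rereads its input.
        self.disk_reads += 1
        with open(self.path) as handle:
            return [float(line) for line in handle]

    def close(self):
        os.remove(self.path)


def iterate_from_disk(source, steps):
    """Reload the data on every iteration (disk-bound pattern)."""
    estimate = 0.0
    for _ in range(steps):
        data = source.load()
        # Move the estimate halfway towards the mean of the data.
        estimate += 0.5 * (sum(data) / len(data) - estimate)
    return estimate


def iterate_in_memory(source, steps):
    """Load once, keep the data in memory, and reuse it on every iteration."""
    data = source.load()
    estimate = 0.0
    for _ in range(steps):
        estimate += 0.5 * (sum(data) / len(data) - estimate)
    return estimate
```

Both functions compute the same result, but for `steps` iterations the first performs `steps` disk reads and the second performs one. In Spark itself, the comparable step is marking an RDD for reuse with `cache()` or `persist()` so that later passes read it from memory.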

The fundamental data structure in Spark is the RDD (Resilient Distributed Dataset), an immutable, distributed collection of objects.
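To make "immutable" and "distributed" a little more concrete, here is a minimal sketch of the idea in plain Python. It is a toy model that runs in one process and is not the Spark API: the class `ToyRDD` and its internals are assumptions made for illustration, although the method names `parallelize`, `map`, `filter`, and `collect` are chosen to mirror the ones Spark uses. Data is split into partitions, a transformation never changes an existing collection but returns a new one that remembers its parent, and any partition can be rebuilt from that parent chain.

```python
class ToyRDD:
    """Toy, single-process stand-in for an RDD; not the Spark API."""

    def __init__(self, partitions=None, parent=None, per_partition=None):
        # Source data is frozen into tuples so it cannot be modified in place.
        self._partitions = (
            None if partitions is None else tuple(tuple(p) for p in partitions)
        )
        self._parent = parent                # the collection this one derives from
        self._per_partition = per_partition  # how to derive one partition

    @classmethod
    def parallelize(cls, items, num_partitions=2):
        items = list(items)
        # Deal the items round-robin into num_partitions chunks.
        chunks = [items[i::num_partitions] for i in range(num_partitions)]
        return cls(partitions=chunks)

    def map(self, fn):
        # Returns a new ToyRDD; self is left untouched.
        return ToyRDD(parent=self, per_partition=lambda part: [fn(x) for x in part])

    def filter(self, pred):
        return ToyRDD(
            parent=self, per_partition=lambda part: [x for x in part if pred(x)]
        )

    def num_partitions(self):
        if self._partitions is not None:
            return len(self._partitions)
        return self._parent.num_partitions()

    def compute_partition(self, index):
        # Rebuild one partition by walking the parent chain back to the source.
        if self._partitions is not None:
            return list(self._partitions[index])
        return self._per_partition(self._parent.compute_partition(index))

    def collect(self):
        result = []
        for i in range(self.num_partitions()):
            result.extend(self.compute_partition(i))
        return result
```

For example, `numbers = ToyRDD.parallelize(range(1, 7))` followed by `numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)` yields a new collection while `numbers` keeps its original contents. Because each derived collection only records how it was computed, a single partition can be recomputed on its own with `compute_partition`, which is the intuition behind the "resilient" part of the name.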

Spark does not have its own storage layer the way Hadoop has HDFS, so it can work with whatever underlying data storage is available, such as cloud storage, HDFS, or NFS.

Spark is not a replacement for Hadoop as a whole, but it is a better replacement for Hadoop MapReduce jobs.

A Spark job is much easier to write than the equivalent Hadoop MapReduce job.

Spark Streaming is a good alternative to Storm for stream processing.

MLlib is likewise a good alternative to Mahout for machine learning.

