Apache Spark

Apache Spark is a cluster computing engine, and it is a particularly good fit for iterative workloads.

Apache Spark supports in-memory computation: data is loaded into each node's memory and processed there. Because of this, Spark is much faster than Hadoop MapReduce for many workloads.

The fundamental data structure in Spark is the RDD (Resilient Distributed Dataset), an immutable distributed collection of objects.

Spark does not come with its own storage layer the way Hadoop does with HDFS, so it can work on top of any underlying data store, such as cloud storage, HDFS, or NFS.

Spark is not a replacement for Hadoop as a whole, but it is a better alternative to Hadoop MapReduce jobs.

A Spark job is much easier to write than an equivalent Hadoop MapReduce job.

Spark Streaming is a strong alternative to Apache Storm for stream processing.

MLlib is likewise a strong alternative to Apache Mahout for machine learning.
