What is the Difference between Apache Hadoop and Apache Spark – PART I
What is Apache Hadoop?
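Hadoop is an open-source framework for storing and processing very large data sets across clusters of commodity machines. Storage is handled by HDFS (the Hadoop Distributed File System) and batch processing by MapReduce. A common use is turning raw, unstructured data into structured data that can be queried and analyzed.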
As per Cloudera – “Hadoop is an ecosystem of open source components that fundamentally changes the way enterprises store, process, and analyze data. Unlike traditional systems, Hadoop enables multiple types of analytic workloads to run on the same data, at the same time. CDH, Cloudera's open source platform, is the most popular distribution of Hadoop and related projects in the world (with support available via a Cloudera Enterprise subscription).”
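To make the "unstructured to structured" idea concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce steps that Hadoop's MapReduce model is built on. It is plain Python, not Hadoop code: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample lines are assumptions made up for illustration, and a real job would run these phases in parallel across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of raw text."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single structured record."""
    return (key, sum(values))


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical raw input: free text with no schema.
    raw_lines = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(raw_lines))
```

The input is free text with no schema, and the output is a table of (word, count) records, which is the kind of structured result the definition above describes.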
What is Apache Spark?
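Spark is also a framework for turning unstructured data into structured data, with more focus on very high data volumes and processing speed. It gets much of that speed by keeping intermediate results in memory instead of writing them to disk between steps, and it can run on a Hadoop cluster and read data stored in HDFS.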
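Spark programs are usually written as a chain of transformations (such as map and filter) that are only recorded, followed by an action (such as collect) that actually runs them. The toy class below imitates that style on a single machine so the idea can be seen without a cluster. It is an assumption-laden sketch, not the Spark API: LazyDataset, parse, and the sample "user,amount" lines are hypothetical names invented for this example, and real Spark would split the data into partitions and process them in parallel.

```python
class LazyDataset:
    """Toy, single-machine stand-in for a Spark-style dataset (not the Spark API)."""

    def __init__(self, source, steps=()):
        self._source = source        # any re-iterable collection of raw records
        self._steps = tuple(steps)   # recorded transformations, not yet executed

    def map(self, fn):
        """Transformation: record the step and return a new dataset."""
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        """Transformation: record the step and return a new dataset."""
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def collect(self):
        """Action: run every recorded step in a single in-memory pass."""
        results = []
        for record in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    record = fn(record)
                elif not fn(record):
                    keep = False
                    break
            if keep:
                results.append(record)
        return results


def parse(line):
    """Turn a raw 'user,amount' line into a structured record."""
    user, amount = line.split(",")
    return {"user": user.strip(), "amount": float(amount)}


if __name__ == "__main__":
    # Hypothetical raw input lines.
    raw = ["alice, 12.5", "bob, 3.0", "carol, 40"]
    big_spenders = LazyDataset(raw).map(parse).filter(lambda r: r["amount"] > 10)
    print(big_spenders.collect())
```

Nothing is computed when map and filter are called. The whole chain runs in one pass when collect is called, with no intermediate result written to disk, which is a small-scale picture of where Spark's performance focus comes from.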