Getting started with Spark (part 2)

After covering some hardware background in part 1, let’s move on to Hadoop.

Parallel computing

In general, parallel computing means that multiple CPUs share the same memory, whereas in distributed computing each CPU has its own memory and is connected to the other machines over a network.
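To make the difference concrete, here is a small Python sketch that runs on a single machine. It is a toy under stated assumptions, not how Hadoop or Spark work internally: threads stand in for CPUs that share memory, and separate Python processes talking over pipes stand in for machines talking over a network. The names `shared_memory_sum`, `message_passing_sum`, and `WORKER_SOURCE` are made up for this illustration. Note also that in standard CPython the threads mostly take turns rather than running on several CPUs at once, so the first function only illustrates the memory model.

```python
import subprocess
import sys
import threading


def shared_memory_sum(chunks):
    """Shared-memory style: every worker writes into the same list."""
    partials = [0] * len(chunks)  # one object, visible to all threads

    def work(i):
        partials[i] = sum(chunks[i])

    threads = [threading.Thread(target=work, args=(i,)) for i in range(len(chunks))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)


# Hypothetical worker program: reads integers from stdin, prints their sum.
WORKER_SOURCE = "import sys; print(sum(int(x) for x in sys.stdin.read().split()))"


def message_passing_sum(chunks):
    """Distributed style: each worker has its own memory, so data must be sent to it."""
    workers = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in chunks
    ]
    partials = []
    for worker, chunk in zip(workers, chunks):
        # The pipe plays the role of the network: send the chunk, receive the result.
        out, _ = worker.communicate(" ".join(str(x) for x in chunk))
        partials.append(int(out))
    return sum(partials)


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5], [6]]
    print(shared_memory_sum(data))
    print(message_passing_sum(data))
```

Both functions compute the same total. What changes is how the workers see the data: in the first case they read and write one shared list, and in the second each worker only knows what was explicitly sent to it.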

Hadoop Vocabulary

  • Hadoop — an ecosystem of tools for big data storage and data analysis. Hadoop is an older system than Spark but is still used by many companies. The major difference between…
