#61 Big data technology (part 1): The very basics of Hadoop

Hang Nguyen
4 min read · Jun 28, 2022

Big data

Big data refers to data sets that are too large or complex for traditional data-processing software to handle. It includes structured data (highly organized, with a fixed schema, such as relational tables), unstructured data (free-form content such as text, photos, and videos), and semi-structured data (self-describing formats such as JSON, XML, and log files).
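The three kinds of data can be contrasted with a small sketch. The specific values below are made up for illustration; the point is only how each shape of data is accessed:

```python
import csv
import io
import json

# Structured: rows that follow a fixed schema, e.g. a CSV table.
structured = io.StringIO("id,name,age\n1,Alice,30\n2,Bob,25\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing but flexible, e.g. a JSON document.
semi_structured = json.loads('{"user": "Alice", "tags": ["admin", "dev"]}')

# Unstructured: free-form text with no predefined schema;
# it must be parsed or mined before it yields structured facts.
unstructured = "Alice wrote that the quarterly report looks promising."

print(rows[0]["name"])            # field access via the schema
print(semi_structured["tags"])    # nested, optional fields
print(len(unstructured.split()))  # only crude operations work directly
```

Structured data can be queried by column, semi-structured data by key, while unstructured data needs extra processing before analysis.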

Life-cycle of big data

The life cycle of big data can be illustrated as follows.

  • Business Case: Define what kind of data should be collected to serve business and operational purposes.
  • Data Collection: Data is then collected and stored in Hadoop's primary storage system, the Hadoop Distributed File System (HDFS).
  • Data Modelling: To make sure data is stored completely, we create data models that define all relationships (handled with MapReduce and YARN).
  • Data Processing: Once data is stored, we extract it for different analysis purposes.
  • Data Visualization: Data is then visualized for presentation and plays an important role in the decision-making process.
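The MapReduce model mentioned in the modelling and processing steps can be sketched locally in plain Python. This is not Hadoop code, only a simulation of its three phases (map, shuffle, reduce) using the classic word-count example:

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in an input line.
    for word in line.lower().split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as Hadoop does between
    # the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each group; here, sum the counts per word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data needs big tools", "hadoop stores big data"]
pairs = chain.from_iterable(map_phase(line) for line in lines)
counts = reduce_phase(shuffle(pairs))
print(counts["big"])   # 3
print(counts["data"])  # 2
```

In a real cluster the map and reduce tasks run in parallel on many nodes, YARN schedules them, and HDFS supplies the input splits; the shape of the computation, however, is exactly this.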

Characteristics of big data

  • Volume: Data gets bigger all the time.
  • Velocity: Tools used to process, store and…

Written by Hang Nguyen

A Data Engineer with a passion for technology, literature, and philosophy.