Hadoop Characteristics

Summary -

In this topic, we described about the Hadoop Characteristics in detail.

Hadoop can handle large volumes of structured and unstructured data more efficiently than the traditional enterprise data warehouse. Hadoop provides most reliable storage layer – HDFS, a batch Processing engine – MapReduce and a Resource Management Layer – YARN.

Below are the list of Hadoop features -

Open Source

Hadoop is an open source project and its code can be modified according to business requirements.

Distributed Processing

As data is stored in a distributed manner in HDFS across the cluster and data is processed in parallel on a cluster of nodes.

Faster

Hadoop is extremely good at high-volume batch processing because of its ability to do parallel processing. Hadoop can perform batch processes multiple times faster than on single thread server or on the mainframe.

Fault Tolerance

The data sent to one individual node and the same data also replicates on other nodes in the same cluster. If the individual node failed to process the data, the other nodes in the same cluster available to process the data.

Reliability

Due to data replication in the cluster, data is reliably stored on the cluster of machine despite machine failures. If the node failed to process the data, the data will be stored reliably due to this characteristic of Hadoop.

High Availability

Data is highly available and accessible despite hardware failure due to multiple copies of data. If the machine or hardware crashes, then data will be accessed from another path.

Scalability

Hadoop is a highly scalable storage platform as it can store and distribute very large data sets across hundreds of systems/servers that operate in parallel. Hadoop enables businesses to run applications on thousands of nodes involving thousands of terabytes of data processing.

And also supports hardware horizontal scalability which can add the nodes during the processing without system downtime.

Flexibility

Hadoop manages data whether structured or unstructured, encoded or formatted, or any other type of data. Businesses can use Hadoop to derive valuable business insights from data sources such as social media, email conversations. Hadoop brings the value to the table where unstructured data can be useful in decision making process.

Economic/Cost effective

Hadoop offers a cost-effective storage solution for businesses exploding data sets. Hadoop is not very expensive as it runs on a cluster of commodity hardware.

Easy to use

No need of client to deal with distributed computing, the framework takes care of all the things. So, Hadoop is easy to use.

Data Locality

This one is a unique feature of Hadoop that made it easily handle the Big Data. When a client submits the MapReduce algorithm, this algorithm is moved to data in the cluster rather than bringing data to the location where the algorithm is submitted and then processing it.