Many developers and other proponents of Big Data assume that the Big Data technology stack is the same thing as the Hadoop technology stack, since many people treat Hadoop as synonymous with Big Data.

This is not necessarily the case, particularly at top companies, because the Big Data technology stack spans multiple layers. Before looking at the tools and technologies used to execute projects, it is important to understand the different layers of the Big Data technology stack.

Different layers in Big Data Technology

Big Data is not limited to a particular technology; rather, an end-to-end Big Data architecture comprises four layers, which are described below.

The data layer is the backend of the entire system. It stores all the raw data that comes in from different sources, including transactional systems, sensors, archives, analytics data, and so on.

The ingestion (or integration) layer is needed because the raw data stored in the data layer often cannot be consumed directly by the processing layer. The ingestion layer therefore transforms the data into a form that the tools and technologies in the processing layer can work with.

The processing layer is arguably the most important layer in the end-to-end Big Data technology stack, as the actual number crunching happens here. In this layer, analysts process large volumes of data into relevant data marts, which finally go to the presentation layer (also known as the business intelligence layer).

The presentation layer, or BI layer, is the topmost layer in the technology stack, and it is where the actual analysis and insight generation happen.
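The flow through the four layers can be sketched in a few lines of Python. This is only a toy illustration, not how a production stack is built: the sample records, the field names, and the functions `ingest`, `process`, and `present` are all hypothetical, and each function stands in for an entire layer.

```python
from collections import defaultdict

# Data layer: raw records as they might arrive from different sources.
# The records and field names are made up for illustration.
RAW_RECORDS = [
    {"source": "transactions", "region": " North ", "amount": "120.50"},
    {"source": "transactions", "region": "south", "amount": "80"},
    {"source": "sensors", "region": "NORTH", "amount": "n/a"},  # unusable value
    {"source": "archives", "region": "South", "amount": "19.50"},
]


def ingest(raw_records):
    """Ingestion layer: normalize raw records and drop the ones that cannot be parsed."""
    cleaned = []
    for record in raw_records:
        try:
            amount = float(record["amount"])
        except ValueError:
            continue  # skip records whose amount is not numeric
        cleaned.append({"region": record["region"].strip().lower(), "amount": amount})
    return cleaned


def process(cleaned_records):
    """Processing layer: aggregate cleaned records into a small 'data mart' (totals per region)."""
    totals = defaultdict(float)
    for record in cleaned_records:
        totals[record["region"]] += record["amount"]
    return dict(totals)


def present(data_mart):
    """Presentation (BI) layer: render the aggregated data as report lines."""
    return [f"{region}: {total:.2f}" for region, total in sorted(data_mart.items())]


report = present(process(ingest(RAW_RECORDS)))
print("\n".join(report))
```

Each function only consumes the output of the layer beneath it, which is the main point of the layered view: the presentation layer never touches raw data directly.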

Technology Stack for each of these Big Data layers

The technologies used in each of the four layers mentioned above are described below:

1) Data layer

The technologies most commonly used in this layer are storage and database technologies such as Amazon S3, Hadoop HDFS, and MongoDB.

2) Ingestion layer

The technologies used in the ingestion (or integration) layer include Blendo, Stitch, Apache Kafka, and so on.

3) Processing layer

Common tools and technologies used in the processing layer include PostgreSQL, Apache Spark, Amazon Redshift, etc.

4) Presentation (BI) layer

This layer is primarily for visualization and presentation. The tools used in this layer include Power BI, QlikView, Tableau, etc.
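As a compact summary, the layer-to-tool mapping above could be captured in a simple lookup structure. The sketch below contains only the tools named in this article; the names `STACK` and `layer_of` are hypothetical, and a real stack would typically include many more tools per layer.

```python
# Tools per layer, limited to the examples named in this article.
STACK = {
    "data": ["Amazon S3", "Hadoop HDFS", "MongoDB"],
    "ingestion": ["Blendo", "Stitch", "Apache Kafka"],
    "processing": ["PostgreSQL", "Apache Spark", "Amazon Redshift"],
    "presentation": ["Power BI", "QlikView", "Tableau"],
}


def layer_of(tool):
    """Return the layer a tool is listed under, or None if it is not listed."""
    wanted = tool.casefold()
    for layer, tools in STACK.items():
        if any(wanted == name.casefold() for name in tools):
            return layer
    return None


print(layer_of("tableau"))
print(layer_of("Apache Spark"))
```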