Many developers and other Big Data proponents assume that the Big Data technology stack is the same thing as the Hadoop technology stack, since many treat Hadoop as synonymous with Big Data. This is not necessarily the case, particularly at top companies, because Big Data technology spans multiple layers. Before looking at the technology stack and the tools and technologies used to execute projects, it is important to understand the different layers of Big Data technology.
Different layers in Big Data Technology
Big Data, in its true essence, is not limited to a particular technology; rather, the end-to-end architecture comprises four different layers, described below.
The data layer is the backend of the entire system. It stores all the raw data that comes in from different sources, including transactional systems, sensors, archives, and analytics data.
The ingestion (or integration) layer is needed because the raw data stored in the data layer often cannot be consumed directly by the processing layer. The ingestion layer therefore transforms the data into a form that the tools and technologies in the processing layer can handle.
The processing layer is arguably the most important layer in the end-to-end Big Data technology stack, as the actual number crunching happens here. In this layer, analysts process large volumes of data into relevant data marts, which then feed the presentation layer (also known as the business intelligence layer).
The presentation layer, or BI layer, is the topmost layer in the technology stack, and it is where the actual analysis and insight generation happens.
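To make the flow between the four layers concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the record format, the field names, and the functions `ingest`, `process`, and `present` are all hypothetical, and a real stack would use dedicated tools at each layer rather than in-memory lists.

```python
from collections import defaultdict

# Data layer: raw records as they might arrive from different sources.
# The sources, payload format, and values are invented for illustration.
RAW_STORE = [
    {"source": "transactions", "payload": "store=A;amount=120.50"},
    {"source": "archives", "payload": "store=A;amount=3.25"},
    {"source": "transactions", "payload": "store=B;amount=80.00"},
]


def ingest(raw_records):
    """Ingestion layer: parse raw payloads into uniform, typed records."""
    cleaned = []
    for record in raw_records:
        pairs = record["payload"].split(";")
        fields = dict(pair.split("=", 1) for pair in pairs)
        cleaned.append(
            {
                "source": record["source"],
                "store": fields["store"],
                "amount": float(fields["amount"]),
            }
        )
    return cleaned


def process(records):
    """Processing layer: aggregate records into a small 'data mart'."""
    totals = defaultdict(float)
    for record in records:
        totals[record["store"]] += record["amount"]
    return dict(totals)


def present(data_mart):
    """Presentation (BI) layer: turn the data mart into report lines."""
    return [f"{store}: {total:.2f}" for store, total in sorted(data_mart.items())]


report = present(process(ingest(RAW_STORE)))
print("\n".join(report))
# A: 123.75
# B: 80.00
```

Each function only consumes the output of the layer beneath it, which is the property that lets the tools in one layer be swapped without touching the others.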
Technology Stack for each of these Big Data layers
The technology stack for each of the four layers mentioned above is described below:
1) Data layer: The technologies most commonly used in this layer are Amazon S3, Hadoop HDFS, MongoDB, etc. (that is, storage and database technologies).
2) Ingestion layer: The technologies used in the integration or ingestion layer include Blendo, Stitch, Apache Kafka, and so on.
3) Processing layer: Common tools and technologies used in the processing layer include PostgreSQL, Apache Spark, Amazon Redshift, etc.
4) Presentation (analysis) layer: This layer is primarily concerned with visualization and presentation, and the tools used in this layer include Power BI, QlikView, Tableau, etc.
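The layer-to-tool mapping above can also be written down as a small lookup table. The sketch below is illustrative only: the dictionary `LAYER_TOOLS` and the helper `layer_of` are hypothetical names, and the table contains just the tools listed in this article, not an exhaustive catalogue.

```python
# Tools per layer, exactly as listed in this article (not exhaustive).
LAYER_TOOLS = {
    "data": ["Amazon S3", "Hadoop HDFS", "MongoDB"],
    "ingestion": ["Blendo", "Stitch", "Apache Kafka"],
    "processing": ["PostgreSQL", "Apache Spark", "Amazon Redshift"],
    "presentation": ["Power BI", "QlikView", "Tableau"],
}


def layer_of(tool):
    """Return the layer a tool is listed under, or None if it is not listed."""
    wanted = tool.lower()
    for layer, tools in LAYER_TOOLS.items():
        if any(wanted == name.lower() for name in tools):
            return layer
    return None


print(layer_of("Apache Spark"))  # processing
print(layer_of("Hadoop HDFS"))   # data
```

Seen this way, Hadoop HDFS is a single entry in one of four layers, which is why the Big Data stack should not be equated with the Hadoop stack.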