How do MNCs manage humongous data with high speed and efficiency?
We all upload and view massive amounts of data on social networking websites like Instagram, Facebook and Twitter in the form of stories, images, tweets and much more. But how do they make it available to us in a jiffy?
The data runs into petabytes and beyond, which is far more than the hardware used in our desktops and laptops can hold.
So let's check out some of the larger storage devices available.
Buying this expensive storage won't be a big deal for an MNC, but it still won't suffice for the amount of data stored and processed even in a single day.
They need more storage, and the cost keeps climbing steeply as the infrastructure grows.
Installing enormous data centres at various locations solves the problem of storing the data. Now let's come to processing it.
Just connecting wires between the different pieces of hardware in a data centre won't store and process the data automatically.
We need some algorithm or methodology, along with software to implement it. Otherwise the data centres would be like a train parked in the yard, just occupying space 😅
So the methodology used to make the process fast is:
Distributed storage system
The open source software used to implement the distributed storage methodology is Apache Hadoop, a collection of open source utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data.
It follows a topology in which the resources of many systems, the data nodes, are connected over the network to a master system, the name node. This arrangement is also known as a master-slave cluster. The data is split into blocks and distributed in parallel across the data nodes (the slaves) in the cluster. This makes the task efficient and thereby solves the problem of storing and processing big data.
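To make the idea concrete, here is a minimal Python sketch of how a name node could split a file into blocks and spread them over data nodes. This is only a toy model and not Hadoop's actual code: the names `NameNode`, `put`, `get` and `split_into_blocks` are made up for illustration, the block size is tiny (HDFS uses 128 MB by default), blocks are placed in simple round-robin order, and the replication that real HDFS performs is left out.

```python
BLOCK_SIZE = 8  # bytes; deliberately tiny for the demo (assumption, not HDFS's real size)


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class NameNode:
    """Toy master: stores only metadata, i.e. which data node holds which block."""

    def __init__(self, data_nodes: dict[str, dict[str, bytes]]):
        self.data_nodes = data_nodes  # node name -> {block id -> block bytes}
        self.block_map: dict[str, list[tuple[str, str]]] = {}  # file -> [(block id, node)]

    def put(self, filename: str, data: bytes) -> None:
        """Split the file and hand its blocks to the data nodes in round-robin order."""
        node_names = list(self.data_nodes)
        placements = []
        for index, block in enumerate(split_into_blocks(data)):
            node = node_names[index % len(node_names)]
            block_id = f"{filename}#{index}"
            self.data_nodes[node][block_id] = block  # the data node stores the bytes
            placements.append((block_id, node))      # the name node remembers where
        self.block_map[filename] = placements

    def get(self, filename: str) -> bytes:
        """Look up the block locations and stitch the file back together."""
        return b"".join(
            self.data_nodes[node][block_id]
            for block_id, node in self.block_map[filename]
        )


# Three hypothetical data nodes (the "slaves") and one name node (the "master").
cluster = {"datanode1": {}, "datanode2": {}, "datanode3": {}}
name_node = NameNode(cluster)
name_node.put("feed.txt", b"stories, images and tweets")

for block_id, node in name_node.block_map["feed.txt"]:
    print(block_id, "->", node)

print(name_node.get("feed.txt"))
```

Notice that the name node never holds the file itself, only the map of where each block lives. That is why adding more data nodes adds more storage without making the master any bigger.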
In a nutshell
A distributed storage system works much like a shopping mall (the name node) that has many departments such as grocery, food and clothing (the data nodes), with the customers visiting the mall playing the role of big data. Rather than cramming all the visitors, i.e. storing and processing all the data, into one small general store (a single piece of commodity hardware), the mall distributes the people (data) across its departments (data nodes). This brings efficiency and keeps the data from piling up haphazardly.
Well, this is just Day 2 of the ARTH journey.