How do MNCs manage humongous data with high speed and efficiency?

We all upload and view massive amounts of data on social networking websites like Instagram, Facebook, Twitter, etc., in the form of stories, images, tweets and much more. But how do they make it available to us in a jiffy?

The data runs into petabytes and beyond, which is far more than the hardware in our desktops and laptops can hold.

So let's check out some of the larger storage devices available.

Buying this expensive storage won't be a big deal for an MNC, but it won't suffice for the amount of data stored and processed in even a single day.

They need more storage, and the cost rises steeply as the infrastructure grows.

This is just a glimpse of a data centre located in India.

The problem of storing data is solved by installing such enormous data centres at various locations. Now we come to processing the data.

But just connecting wires between the different hardware in a data centre won't store and process the data automatically.

We need some algorithm or methodology, along with software to implement it. Otherwise the data centres would be like a train parked in the yard, just occupying space 😅

So the methodology used to make this process fast is

Distributed storage system

The open-source software commonly used to implement the distributed storage methodology is Hadoop, a collection of open-source utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data.

It follows a topology in which the resources of many systems, in the form of data nodes, are connected over the network to a master system called the name node. This arrangement is also known as a master-slave cluster. The data is split into blocks and distributed in parallel across the data nodes (the slaves) in the cluster. This makes the task efficient and thereby helps solve the problem of storing and processing big data.
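To make the idea concrete, here is a minimal sketch in Python of how a master could split a file into blocks and spread them over data nodes. It is only a toy model and not how Hadoop is actually implemented: the class `ToyNameNode`, its `put` and `get` methods, the node names and the tiny block size are all assumptions made up for illustration, and real HDFS also replicates each block, which is skipped here.

```python
from collections import defaultdict

# Tiny block size so the example is easy to follow.
# (HDFS uses far larger blocks; 128 MB is the default in recent versions.)
BLOCK_SIZE = 8


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class ToyNameNode:
    """Hypothetical master: remembers which data node holds which block."""

    def __init__(self, data_nodes: list[str]):
        self.data_nodes = data_nodes
        # filename -> list of (block index, data node name)
        self.block_map: dict[str, list[tuple[int, str]]] = {}
        # Stand-in for the disks of the data nodes: node -> {(file, index): block}
        self.storage: dict[str, dict[tuple[str, int], bytes]] = defaultdict(dict)

    def put(self, filename: str, data: bytes) -> None:
        """Split the file and hand out blocks to data nodes in round-robin order."""
        placements = []
        for index, block in enumerate(split_into_blocks(data)):
            node = self.data_nodes[index % len(self.data_nodes)]
            self.storage[node][(filename, index)] = block
            placements.append((index, node))
        self.block_map[filename] = placements

    def get(self, filename: str) -> bytes:
        """Look up every block's location and stitch the file back together."""
        return b"".join(
            self.storage[node][(filename, index)]
            for index, node in self.block_map[filename]
        )


cluster = ToyNameNode(["datanode1", "datanode2", "datanode3"])
cluster.put("story.txt", b"abcdefghijklmnopqrstuvwxyz")
print(cluster.block_map["story.txt"])
print(cluster.get("story.txt"))
```

The point of the sketch is that the name node only keeps the map of where things are, while the blocks themselves sit on the data nodes, so no single machine has to hold the whole file.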

In a nutshell

A distributed storage system works much like a shopping mall (the name node) with many departments such as grocery, food, clothing, etc. (the data nodes), where the customers visiting the mall are the big data. Rather than storing and processing all the data, i.e. the visitors, in a small general store (a single piece of commodity hardware), a shopping mall brings efficiency by distributing the people (data) across its various departments (data nodes), and thus keeps the data from piling up haphazardly.
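The mall analogy can also be sketched in a few lines of Python. Here each "department" counts the words in its own share of the data at the same time, and the partial results are merged at the end. This is only an illustration under loud assumptions: the department names and sample text are invented, the function names are hypothetical, and threads on one machine stand in for separate data nodes. Hadoop's own processing layer (MapReduce) is far more involved.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Made-up sample data: each department (data node) holds its own chunks.
departments = {
    "grocery": ["rice milk rice", "milk bread"],
    "food": ["pizza pizza burger"],
    "clothing": ["shirt jeans shirt"],
}


def count_words_on_node(chunks: list[str]) -> Counter:
    """Work done locally by one node: count words in its own chunks only."""
    counts = Counter()
    for chunk in chunks:
        counts.update(chunk.split())
    return counts


def count_words_in_cluster(nodes: dict[str, list[str]]) -> Counter:
    """Run every node's local count in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        partial_counts = list(pool.map(count_words_on_node, nodes.values()))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


totals = count_words_in_cluster(departments)
print(totals.most_common())
```

No department ever looks at another department's data; only the small per-node summaries travel back to be combined, which is what keeps the work fast as the data grows.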

Well, this is just Day 2 of the ARTH journey.

Technology Enthusiast | Cloud & DevOps Engineer | Cyber Security Researcher

MUBIN GIRACH
