BIG DATA

In today’s world, data is invaluable. Businesses use it to improve their products and services, which leads to a better customer experience and higher sales, and it is also used to train state-of-the-art AI models.

credits: tdwi.org
But for this, we need a lot of it. Having plenty of data means we can thoroughly analyze something using the past data associated with it and identify the common patterns or trends it follows in order to make predictions about its future. This is a very common use of data for decision-making, by both businesses and machine learning models. Of course, there are many challenges too, such as where and how to store such a large amount of data for processing. But is volume the only factor that makes Big Data so difficult to handle? The answer is no.

THE THREE V’s

A huge amount of data keeps coming in exponentially from a variety of sources

The above statement sums up the definition of Big Data concisely.

Let’s find the THREE Vs in that statement:

“huge amount of data” = Volume

“coming in exponentially” = Velocity

“variety of sources” = Variety

Volume, Velocity, and Variety are the three factors that make Big Data so complex to store and process.

So, back to our earlier question: where and how are we supposed to store such a large amount of data for processing?

How to store Big Data

Data Compression and encoding

“Reduce the size of data without changing its fundamental properties”

Encode the data using fewer bits than its original representation; it can then be decoded and retrieved at access time.

 


The above image is an example of run-length encoding, which replaces consecutive occurrences of the same data value with a single copy of that value and a count of how many times it repeats. It is one of the many encoding techniques that are used to reduce storage space and improve data transfer efficiency.
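To make the idea concrete, here is a minimal Python sketch of run-length encoding on a string. The function names rle_encode and rle_decode and the sample input are hypothetical, chosen only for illustration; real storage systems typically apply this kind of encoding to columns or byte streams rather than to short strings.

```python
from itertools import groupby


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Collapse each run of identical characters into a (character, run_length) pair."""
    return [(char, len(list(run))) for char, run in groupby(text)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Rebuild the original string by repeating each character by its run length."""
    return "".join(char * count for char, count in pairs)


sample = "AAAABBBCCD"  # hypothetical input, chosen for illustration
encoded = rle_encode(sample)
print(encoded)  # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]

# Decoding returns exactly the original data, so nothing is lost.
assert rle_decode(encoded) == sample
```

Note that run-length encoding only saves space when the data contains long runs of repeated values. For data with few repeats, the encoded form can end up larger than the original.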

Where to store Big Data?


Big data is stored in specialized data storage systems and platforms designed to handle the volume, variety, and velocity of large datasets. Selecting a storage system depends on the type of data you’re dealing with. Examples are:

Distributed File Systems like HDFS, which store and manage data across multiple machines in a distributed environment.

NoSQL Databases, which are highly scalable and can handle diverse data (text, video, audio, images).

Data Warehouses, which primarily store structured data in centralized repositories.

Object Storage primarily stores unstructured data such as images, videos, documents, and log files in a highly scalable and durable environment.

Data Lake stores raw and unprocessed data in centralized repositories. It contains structured, semi-structured, and unstructured data.

Each of these storage systems suits a different purpose: distributed file systems for batch processing and analytics; NoSQL databases for storing and retrieving user-generated content; data warehouses for business intelligence and analytics; object storage for data archiving and backup (long-term storage and disaster recovery); and data lakes for data exploration and advanced analytics (exploratory analysis, machine learning, and predictive modeling).

Conclusion
In this article, we learned about the importance of Big Data, what exactly it is, the factors that make it so complex, and how and where it is stored and used.