How Data Compression Works

We often use .zip or .rar files to reduce the size of files and folders before transferring them over a network. These are compressed files. Data compression is the process of encoding data such as text, audio, graphics, or images so that fewer bits are needed to represent it. In lossless compression, the kind used by .zip and .rar files, the actual information does not change; only its internal representation does.

Data compression relies on the fact that most types of files contain redundant data. To compress a file, its data bits are re-encoded into a smaller, more compact form. The many different procedures used to do this re-encoding are known as compression algorithms.
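To see how much redundancy matters, here is a minimal sketch using Python's standard zlib module. The sample data and variable names are our own choices for illustration, not part of any particular archive tool. It compresses a highly repetitive input and a random input of the same length.

```python
import os
import zlib

# Highly redundant input: the same 10-byte phrase repeated 1000 times.
redundant = b"abcabcabc " * 1000

# Low-redundancy input of the same length, taken from the OS random source.
random_like = os.urandom(len(redundant))

redundant_packed = zlib.compress(redundant)
random_packed = zlib.compress(random_like)

# Print original size -> compressed size for each input.
print(len(redundant), "->", len(redundant_packed))
print(len(random_like), "->", len(random_packed))
```

The repetitive input should shrink to a small fraction of its size, while the random input barely shrinks at all (it may even grow slightly), because there is almost no redundancy for the algorithm to remove.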

There are two processes: data compression and data retrieval. In order to retrieve the actual information from a compressed file, the decompression algorithm must match the algorithm that was used for compression. Let’s take popular compression software such as WinZip or WinRAR as an example. When we choose a file or folder and compress it into a .zip or .rar file, we have a compressed file within a few moments. Behind the scenes, an algorithm reads the bit representation of the information in the file or folder and looks for redundancy; what counts as redundancy depends on the algorithm used. It then rewrites the bit representation of the file with the redundancies removed. For example, instead of listing the same information again and again, it stores a piece of information once, and wherever that information occurs again, the repeat is removed and a reference to the earlier location is put in its place. Hence fewer bits are needed to represent the information in the compressed file.
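The idea of replacing a repeat with a reference to its earlier location can be sketched in a few lines of Python. This is a simplified illustration, not the actual algorithm inside WinZip or any other tool: the function name compress, the token format (a single character for literal text, a (distance, length) tuple for a reference) and the parameters min_match and window are all assumptions made for this example.

```python
def compress(text, min_match=3, window=255):
    """Replace repeated substrings with (distance, length) back-references."""
    tokens = []
    i = 0
    while i < len(text):
        best_len, best_dist = 0, 0
        # Search the text already seen for the longest match starting at i.
        for j in range(max(0, i - window), i):
            length = 0
            while (i + length < len(text)
                   and j + length < i  # the match must lie in text already seen
                   and text[j + length] == text[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            # Repeat found: store a reference instead of the text itself.
            tokens.append((best_dist, best_len))
            i += best_len
        else:
            # No useful repeat: store the character as it is.
            tokens.append(text[i])
            i += 1
    return tokens


text = "the cat and the hat and the bat"
tokens = compress(text)
print(len(text), "characters ->", len(tokens), "tokens")
print(tokens)
```

In this sample, the second "the " becomes a reference meaning "go back 12 characters and copy 4". Real compressors then pack such tokens into bits very compactly, which this sketch does not attempt.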

To retrieve the information, i.e. to decompress the file into its original form, the matching algorithm is used: it finds all the references in the encoded file and replaces each one with the original information it points to.
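Decompression under such a scheme can be sketched as follows. The token format here is a hypothetical one chosen for illustration: a single character stands for literal text, and a (distance, length) tuple means "go back distance characters in the output and copy length characters". The function name decompress and the hand-written token list are likewise assumptions, not the format of real .zip or .rar files.

```python
def decompress(tokens):
    """Rebuild the original text by resolving every back-reference."""
    out = []
    for token in tokens:
        if isinstance(token, tuple):
            distance, length = token
            start = len(out) - distance
            # Copy the referenced characters from the output produced so far.
            for k in range(length):
                out.append(out[start + k])
        else:
            # A literal character is copied through unchanged.
            out.append(token)
    return "".join(out)


# Hand-written encoded form: 12 literal characters, then references and literals.
tokens = list("the cat and ") + [(12, 4), "h", (12, 11), "b", "a", "t"]
restored = decompress(tokens)
print(restored)
```

Every reference is replaced by the text it points to, so the output is the full original sentence with nothing lost.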