MP3 Format: Understanding the basics of digital music

In today’s world, the word MP3 has become synonymous with music. Almost everyone has experienced MP3 in some way, be it through listening to favourite songs on a music player or phone, the internet, a podcast, or something similar. MP3 revolutionized the digital music world and, even though it has been around for quite some time, it remains among the most popular audio formats used across the globe.

Now, even though it’s something that we use on a daily basis, have you ever wondered how it works? Well, this article aims to inform you about that as well as the basic principles involved in the process.

But first, what exactly is MP3? The MP3 format is an audio-specific format which uses a compression system to reduce the size of music files. MP3 stands for MPEG-1 Audio Layer 3, where MPEG refers to the Moving Picture Experts Group, the working group behind a family of standards for encoding video and audio using lossy compression. A ‘lossy’ compression implies that some of the audio data is lost during the compression process, so the resulting file is not identical to its original. A simple schematic of the lossy compression algorithm is shown below:

Fig. 1: Simple Schematic of the Lossy Compression Algorithm

Layer 3 is one of three coding schemes (Layers 1, 2 and 3) for the compression of audio data. It uses perceptual audio coding and psychoacoustic compression to remove information in the signal that listeners are unlikely to perceive. It also adds an MDCT (Modified Discrete Cosine Transform) stage after the filter bank, giving a frequency resolution 18 times higher than that of Layer 2. This results in a file reduced in size with minimal audio degradation. MP3 files also commonly carry ID3 tags, which store details about an audio file’s ownership, production and contents; this tagging system can be used to catalogue and manage collections of MP3 files.

Now, let’s go back: who created the MP3 and what was the need for it? MP3 technology was developed between 1987 and 1991 by engineers at the German research organisation Fraunhofer-Gesellschaft as an attempt to reduce digital audio file size with the minimum degradation of perceived audio quality. The inventors named on the MP3 patent are Bernhard Grill, Karl-Heinz Brandenburg, Thomas Sporer, Bernd Kurten, and Ernst Eberlein.

Uncompressed audio files are rather large, as sound is very complex and translating it into a digital format that a computer can understand requires a lot of data. MP3 makes files smaller by using what are called psychoacoustic models. Under such a model, the parts of the audio signal that most people would not hear, for example because they are too low or too high in frequency, are eliminated. By doing this, file sizes can be greatly reduced. A 128 kbit/s MP3 file is about 1/11th the size of the corresponding uncompressed CD audio. This smaller size enables faster delivery via the internet, easier sharing and portability, and reduced storage requirements.

Principle – compression algorithm and psychoacoustics

Two kinds of compression are used to reduce the size of music files in MP3. First, the encoder filters out what is inaudible to the human ear (for example, signal frequencies that are too high or too low). Next, it encodes the remaining data by more traditional, lossless means (similar in spirit to the ‘zip’ compression method) to compress the files further. Because the first step discards audio signal data, the technique as a whole is termed lossy compression.
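
As an illustration of the second, lossless stage, the sketch below builds a Huffman code (the method this article names later in the building process) for a short list of made-up quantized values. It is only a minimal sketch: a real MP3 encoder picks from fixed Huffman tables defined in the standard rather than building a new code for each file, and the names huffman_code, encode and decode are hypothetical helpers chosen for this example.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from a sequence of symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(symbols, code):
    """Concatenate the code words for each symbol."""
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    """Walk the bit string, emitting a symbol whenever a code word completes."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


# Made-up quantized values: mostly zeros, as is typical after quantization.
values = [0, 0, 0, 0, 0, 0, 1, 1, -1, 0, 0, 2, 0, 0, 1, 0]
code = huffman_code(values)
bits = encode(values, code)
print(len(bits), "bits instead of", 2 * len(values), "with a fixed 2-bit code")
```

Frequent values (here, zero) receive the shortest code words, which is where the saving comes from, and decoding recovers the values exactly.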

Consider these two scenarios: 1. You hear two similar notes one after the other, very close together in time; the result is that your brain may perceive only one of them. 2. You hear two different sounds but one is much louder than the other; the result is that your brain may never perceive the quieter signal. The study of these auditory phenomena is called psychoacoustics. MP3 coding takes advantage of these psychoacoustic phenomena to simplify the signal, which reduces the amount of information needed to express it in digital form and decreases its file size considerably.

The MP3 format is often termed a perceptual codec, as it relies on a mathematical description of the limitations of auditory perception. The basic principle of any perceptual codec is that there’s little point in storing information that can’t be perceived by humans. MP3 encoding tools analyze the incoming source signal, break it down into mathematical patterns, and compare these patterns to psychoacoustic models stored in the encoder itself. The encoder can then throw away most of the data that the models indicate would not be perceived, while retaining the rest.
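
The idea of discarding what a louder sound hides can be sketched with a toy masking rule. This is not the psychoacoustic model of any real encoder: the function name audible_components, the 10 dB offset and the 30 dB-per-octave slope are all assumptions invented for illustration.

```python
import math


def audible_components(components, slope_db_per_octave=30.0, offset_db=10.0):
    """Return the components that no other component masks.

    components: list of (frequency_hz, level_db) tuples.
    Toy rule (an assumption, not from the MP3 standard): a masker at level L
    hides anything quieter than
        L - offset_db - slope_db_per_octave * (distance in octaves).
    """
    kept = []
    for i, (freq, level) in enumerate(components):
        masked = False
        for j, (masker_freq, masker_level) in enumerate(components):
            if i == j:
                continue
            octaves = abs(math.log2(freq / masker_freq))
            threshold = masker_level - offset_db - slope_db_per_octave * octaves
            if level < threshold:
                masked = True
                break
        if not masked:
            kept.append((freq, level))
    return kept


# A loud 1 kHz tone, a quiet tone right next to it, and a quiet tone far away.
tones = [(1000, 80), (1100, 40), (4000, 35)]
print(audible_components(tones))
```

Under these made-up parameters the quiet 1100 Hz tone is dropped because it sits close to the loud 1000 Hz tone, while the equally quiet 4000 Hz tone survives because it is two octaves away.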

Process Description

The key to audio compression in MP3 lies in the bit rate – the number of bits per second encoded in the audio file. The lower the bit rate, the more data the encoder discards, and vice versa. In outline, an MP3 encoder splits the signal into frequency bands (the standard’s filter bank produces 32 sub-bands) and then processes each band separately for storage. On playback, these bands are decoded and recombined.

Fig. 2: Image Showing Mp3 Building Process

As shown above, if the bitrate is high, the signal is conveyed with better resolution but at the cost of a larger file. With a lower bitrate, the file is smaller but the audio resolution is reduced accordingly.
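
The trade-off between bitrate and file size is simple arithmetic. The sketch below assumes a constant-bitrate stream, ignores any tags stored alongside the audio, and uses a hypothetical four-minute track; mp3_size_megabytes is a name made up for this example.

```python
def mp3_size_megabytes(bitrate_kbps, duration_s):
    """Approximate size of a constant-bitrate stream in megabytes (10**6 bytes)."""
    bits = bitrate_kbps * 1000 * duration_s  # kbit/s uses 1000 bits per kilobit
    return bits / 8 / 1_000_000


# Hypothetical four-minute (240 s) track at three common bitrates.
for kbps in (64, 128, 320):
    print(kbps, "kbit/s:", round(mp3_size_megabytes(kbps, 240), 2), "MB")
```

For this example the sizes come out at roughly 1.92 MB, 3.84 MB and 9.6 MB respectively.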

Let’s break down the MP3 building process:

· The first step is to divide the source audio into components called ‘frames’, each of which contains a fraction of a second of audio data. At a 44.1 kHz sample rate, each frame covers about 26 ms (0.026 seconds), giving approximately 38 frames per second.

· The signal is analyzed to determine how bits should be distributed to give the best possible account of the audio across the entire spectrum. This involves splitting the signal into different bands based on frequency.

· The audio in these frames is then compressed to a target number of bits using psychoacoustic modelling. The bitrate determines how many bits can be allocated to each frame, and hence how much audio data can be stored. The frequency bands of the signal are compared to the reference models in the encoder itself, and the ones that do not match are discarded.

· The remaining data is then compressed losslessly using Huffman coding, which removes redundancy by giving shorter codes to the values that occur most often.
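
The frame figures in the steps above can be checked with a short calculation. The sketch assumes MPEG-1 Layer 3, where each frame holds 1152 samples per channel and the frame length in bytes is 144 × bitrate / sample rate, plus one byte when the padding bit is set; neither number is stated in this article, and the function names are made up for the example.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer 3 (assumption stated in the lead-in)


def frame_duration_ms(sample_rate_hz):
    """Playing time covered by one frame, in milliseconds."""
    return 1000 * SAMPLES_PER_FRAME / sample_rate_hz


def frames_per_second(sample_rate_hz):
    """Number of frames needed for one second of audio."""
    return sample_rate_hz / SAMPLES_PER_FRAME


def frame_length_bytes(bitrate_bps, sample_rate_hz, padding=0):
    """Frame length (header included) for MPEG-1 Layer 3."""
    return 144 * bitrate_bps // sample_rate_hz + padding


print(round(frame_duration_ms(44_100), 2), "ms per frame")
print(round(frames_per_second(44_100), 2), "frames per second")
print(frame_length_bytes(128_000, 44_100), "bytes per unpadded frame at 128 kbit/s")
```

At 44.1 kHz this gives about 26.12 ms per frame and about 38.28 frames per second, matching the figures quoted above. At 128 kbit/s each frame is 417 bytes, or 418 when padded.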

The collection of frames is assembled into a serial bit-stream, with header information preceding each data frame. The headers contain instructional “meta-data” specific to that frame. Each frame header contains 32 bits, made up of a synchronisation pattern and various other identifiers of the frame’s contents (bitrate, sample rate, etc.). The header is then followed by the frame’s audio data. This series of frames constitutes the standard MP3 file.

MP3 Header, Decoding

The MP3 frame header is as depicted below:

Fig. 3: Diagram Showing Structure of MP3 Frame Header

Frame sync reference – 11 bits. All of these bits are set to 1 so that a decoder can locate the start of a frame.
MPEG audio version – 2 bits. This specifies whether the frame has been coded in MPEG-1 or MPEG-2.
MPEG layer – 2 bits. This specifies the particular layer of the frame.
Protection on/off – 1 bit. If protection is on, a checksum follows the header.
Bitrate – 4 bits. This is an index into a lookup table that gives the bitrate of the current frame.
Sampling rate – 2 bits. This is an index into a lookup table that gives the sampling frequency (e.g. 44.1 kHz).
Padding bit – 1 bit. This marks a frame that carries an extra slot, which compensates for frames that would otherwise fall short of the target bitrate.
Application-specific reserved (private) bit – 1 bit. This allows for application-specific triggers.
Channel mode – 2 bits. This specifies the channel mode, which can be stereo, joint stereo, dual channel or mono.
Mode extension (for joint stereo mode) – 2 bits. Used to describe how the channel data is combined.
Copyright (on/off) – 1 bit. This flags the audio as copyrighted; it is informational and does not by itself prevent copying.
Original (on/off) – 1 bit. This flags whether the file is an original or a copy.
Emphasis – 2 bits. This flag indicates whether emphasis was applied in the original recording.
Audio data – the decoder moves on through the checksum (if it exists) and on to the actual audio data frame.
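
The field layout listed above can be unpacked with a few shifts and masks. The following is a minimal sketch, not a complete parser: it only accepts MPEG-1 Layer 3 frames, the lookup tables are the MPEG-1 Layer 3 values as commonly documented rather than anything given in this article, and parse_frame_header and the dictionary keys are hypothetical names.

```python
import struct

# Lookup tables for MPEG-1 Layer 3 (index 0 of the bitrate table means
# "free format" and index 15 is invalid, so both map to None here).
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]


def parse_frame_header(header_bytes):
    """Decode the 32-bit header of an MPEG-1 Layer 3 frame into a dict."""
    if len(header_bytes) != 4:
        raise ValueError("a frame header is exactly 4 bytes")
    (word,) = struct.unpack(">I", header_bytes)  # big-endian 32-bit integer
    if word >> 21 != 0x7FF:
        raise ValueError("frame sync not found")
    version_bits = (word >> 19) & 0b11
    layer_bits = (word >> 17) & 0b11
    if version_bits != 0b11 or layer_bits != 0b01:
        raise ValueError("this sketch only handles MPEG-1 Layer 3")
    return {
        # In the bit-stream, a protection bit of 0 means a checksum follows.
        "has_checksum": ((word >> 16) & 1) == 0,
        "bitrate_kbps": BITRATES_KBPS[(word >> 12) & 0b1111],
        "sample_rate_hz": SAMPLE_RATES_HZ[(word >> 10) & 0b11],
        "padding": (word >> 9) & 1,
        "private": (word >> 8) & 1,
        "channel_mode": CHANNEL_MODES[(word >> 6) & 0b11],
        "mode_extension": (word >> 4) & 0b11,
        "copyright": bool((word >> 3) & 1),
        "original": bool((word >> 2) & 1),
        "emphasis": word & 0b11,
    }


# Example header bytes chosen for illustration.
info = parse_frame_header(bytes.fromhex("fffb9064"))
print(info["bitrate_kbps"], "kbit/s,", info["sample_rate_hz"], "Hz,", info["channel_mode"])
```

For the example bytes FF FB 90 64, the fields decode to 128 kbit/s, 44100 Hz and joint stereo, with no checksum and the original flag set.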

ID3 tags in the newer ID3v2 format appear at the beginning of the bit-stream, rather than at the end, where the older ID3v1 tag is stored. This allows the track details to be displayed throughout the length of the track, and not just at the end, when an MP3 file is being broadcast or streamed rather than simply downloaded.
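
A player can tell where the tags end and the audio frames begin by inspecting a few bytes. The sketch below relies on details of the ID3 formats that this article does not spell out: an ID3v2 tag starts with the bytes "ID3" and stores its length in four 7-bit ("synchsafe") bytes, while an ID3v1 tag is a 128-byte block at the end of the file starting with "TAG". The function names are hypothetical, and the optional ID3v2 footer is ignored.

```python
def id3v2_tag_size(data):
    """Return the bytes taken by a leading ID3v2 tag (header included), or 0."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    size = 0
    for byte in data[6:10]:
        # Each size byte contributes only its low 7 bits.
        size = (size << 7) | (byte & 0x7F)
    return size + 10  # the stored size excludes the 10-byte tag header


def has_id3v1_tag(data):
    """Return True if the last 128 bytes look like an ID3v1 tag."""
    return len(data) >= 128 and data[-128:-125] == b"TAG"


# Synthetic example: a 10-byte ID3v2 header declaring a 257-byte tag body.
example = b"ID3\x03\x00\x00\x00\x00\x02\x01" + bytes(257)
print(id3v2_tag_size(example))
```

In this synthetic example the first audio frame would be expected at byte offset 267, immediately after the tag.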

The following schematic shows the working model in brief:

Fig. 4: Block Diagram Showing Working of Mp3 Header

Decoding

Most MP3 encoder software allows you to start with any type of audio file (including another MP3), specify encoding or import options, and then play the compressed MP3 file. Unlike encoding, MP3 decoding (playback) is a standardised process, and part of MP3’s official definition, so different players should not give significantly different results when decoding. Players do more than decode the MP3 data, however: they also convert the digital bit-stream into analogue sound and output it to headphones or an external amplifier, and that part of the chain will have an effect on playback quality.

CD vs. MP3

Since the compression scheme is lossy, the general view is that MP3 is inferior to CD quality. The CD-audio format is fixed at 16-bit audio with a 44.1 kHz sample rate, whereas an MP3 encoder can be given higher-resolution source files (for example, 24-bit audio); the MP3 stream itself, however, supports sample rates only up to 48 kHz. Layer 3 shrinks the original sound data from a CD, which has a bit rate of 1411.2 kilobits per second, down to 112-128 kbps. Thus by using MPEG coding, you may shrink the original sound data from a CD by a factor of roughly 11 to 12.6 with little perceived loss of audio quality.
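
The CD bit rate and the resulting compression factors follow directly from the CD format’s parameters. The sketch below assumes two-channel (stereo) CD audio, which is what yields the 1411.2 kbps figure; the constant and function names are made up for this example.

```python
CD_SAMPLE_RATE_HZ = 44_100
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2  # stereo (assumption stated in the lead-in)


def cd_bitrate_kbps():
    """Bit rate of uncompressed CD audio in kilobits per second."""
    return CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE * CD_CHANNELS / 1000


def compression_ratio(mp3_kbps):
    """How many times smaller the MP3 stream is than the CD stream."""
    return cd_bitrate_kbps() / mp3_kbps


print(cd_bitrate_kbps(), "kbps for CD audio")
for kbps in (128, 112):
    print(kbps, "kbps MP3: about", round(compression_ratio(kbps), 1), "times smaller")
```

The ratios come out at about 11 for 128 kbps and about 12.6 for 112 kbps.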

Down the line, MP3 has evolved impressively along with the drastic growth of the internet and digital audio. In that sense, it has had a huge influence on the commercial, educational and creative audio industries, and continues to dominate the market for delivering and sharing digital audio.