Huffman coding in JPEG

JPEG (Joint Photographic Experts Group) compression is a widely used method for compressing digital images. It utilizes a lossy compression technique to reduce the file size of images while attempting to preserve visual quality. Here are the general steps involved in JPEG compression:

Color Space Conversion:
- The first step is often to convert the image from its original color space (usually RGB for most images) to the YCbCr color space. This separates the image into luminance (Y) and chrominance (Cb and Cr) components.
Subsampling:
- In the YCbCr color space, chrominance information (Cb and Cr) can be subsampled. This means that some color information is reduced, which can result in significant compression without much visible loss in image quality. Common subsampling ratios are 4:4:4 (no subsampling), 4:2:2, and 4:2:0.
Block Splitting:
- The image is divided into small blocks (typically 8x8 pixels). These blocks are then processed independently.
Discrete Cosine Transform (DCT):
- Each 8x8 block is transformed from the spatial domain to the frequency domain using DCT. This process converts the image data from its original pixel values to a set of coefficients that represent the different frequencies present in the block.
Quantization:
- After the DCT, quantization is applied to the DCT coefficients. Quantization involves dividing each coefficient by a quantization matrix. This step is where significant loss of data and quality occurs. Higher quantization values result in more compression but also more loss of quality.
Entropy Encoding:
- The quantized DCT coefficients are then entropy-encoded. Huffman coding is often used to represent the coefficients more efficiently. Run-length encoding can also be applied to exploit the run of zeros in the quantized coefficients.
Header Information:
- Various header information, such as the quantization tables, subsampling information, and image dimensions, is stored along with the compressed data to aid in decoding.
Write to File:
- The compressed data, along with the header information, is written to a file. The file is typically given a “.jpg” extension.
Optional Post-processing:
- Some post-processing may be applied to the compressed image, such as smoothing or color correction, to improve the visual quality to some extent.
Decompression (Reconstruction):
- To view the compressed image, it must be decompressed. The decompression process reverses the steps outlined above, starting with entropy decoding, dequantization, inverse DCT, and color space conversion.

Huffman coding post DCT

Huffman coding comes into play once we have the matrix of DCT coefficients.

In this blog, we will use the huffman library. (Refer to my blog on Huffman coding for generating huffman codes from scratch.)

import huffman
import numpy as np

# Create a 2D array (8x8) with DCT coefficients (replace with your actual coefficients)
dct_coefficients = np.array([
    [20, 30, 10, 25, 30, 0, 0, 0],
    [0, 5, 0, 0, 0, 0, 5, 0],
    [0, 0, 15, 0, 0, 0, 20, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0]
], dtype=np.int32)
print("Initial DCT matrix")
print(dct_coefficients)

Initial DCT matrix
[[20 30 10 25 30  0  0  0]
 [ 0  5  0  0  0  0  5  0]
 [ 0  0 15  0  0  0 20  0]
 [ 0  0  0  0  1  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]]

Encoding in the form: ((r,s), c)

Iterate through the matrix row by row.

c is a non-zero element in the dct_coefficients matrix.

Re calculate the number of zero elements before every non zero element (r) and the number of bits needed to represent the non-zero element (s).

Thus,

c = Non-zero element.

r = Number of zeros before the non-zero element.

s = Number of bits used to represent the non-zero element.

In practice, we don’t iterate row by row but along diagonals in a zig-zag pattern, but the logic of generating ((r,s),c) remains the same.

def calculate_rs_pairs(dct_coefficients):
    rs_pairs = []
    run_length = 0

    for row in dct_coefficients:
        for element in row:
            if element == 0:
                run_length += 1
            else:
                bits_needed = len(bin(abs(element))) - 2  # Calculate the number of bits needed to represent c
                rs_pairs.append(((run_length, bits_needed), element))
                run_length = 0

    return rs_pairs

# Calculate (r, s) pairs
rs_pairs = calculate_rs_pairs(dct_coefficients)

print("(r,s) pairs and c values:")
for i in range(len(rs_pairs)):
  print(rs_pairs[i])
print()

(r,s) pairs and c values:
((0, 5), 20)
((0, 5), 30)
((0, 4), 10)
((0, 5), 25)
((0, 5), 30)
((4, 3), 5)
((4, 3), 5)
((3, 4), 15)
((3, 5), 20)
((5, 1), 1)

Huffman coding of (r,s) pairs

These (r,s) pairs repeat several times, thus, we use Huffman coding to represent them optimally.

Assigning a unique symbol to all distinct (r,s) pairs

This is needed to pass the pair (in the form of the symbol) to the Huffman library for Huffman coding.

d = {} # maps (r,s) pair to a unique symbol (for passing in huffman library)
d2 = {} # Inverse of d
alph = "abcdefghijklmnopqrstuvwxyz"
pos=0

for (r, s), _ in rs_pairs:
  if (r,s) in list(d.keys()):
    continue
  else:
    d[(r,s)] = alph[pos]
    d2[alph[pos]] = (r,s)
    pos+=1

Calculating the frequency of these (r,s) pairs

frequency_rs = {}
for (r,s),_ in rs_pairs:
    if d[(r, s)] in frequency_rs:
        frequency_rs[d[(r, s)]] += 1
    else:
        frequency_rs[d[(r, s)]] = 1

Performing Huffman coding

The code of a DCT matrix consists of a string of codes of every non-zero element (c), which has 2 parts:

Huffman code of the (r,s) pair associated with c and
s bits (as s is the number of bits used to represent c in binary form)

So, we know the non-zero digit c and the number of zeros before it in the dct_coefficients matrix.

for (r,s),_ in rs_pairs:
    if d[(r, s)] in frequency_rs:
        frequency_rs[d[(r, s)]] += 1
    else:
        frequency_rs[d[(r, s)]] = 1

# Build Huffman codes for (r, s) pairs
huffman_rs = huffman.codebook([(x,frequency_rs[x]) for x in list(frequency_rs.keys())])

# Create a dictionary to map (r, s) pairs to their Huffman codes
rs_to_huffman = {pair: huffman_rs[pair] for pair in frequency_rs.keys()}


encoded_data = ""
for (r, s), c in rs_pairs:
    encoded_data += rs_to_huffman[d[(r, s)]] + format(c, f'0{s}b')  # Encode (r, s) and c
    print("((",r,",",s,")",c,")   Huffman code for (r,s) pair: ",rs_to_huffman[d[(r, s)]]," Binary representation of c: ", format(c, f'0{s}b'))

print()
print("Encoded Data:")
print(encoded_data)

(( 0 , 5 ) 20 )   Huffman code for (r,s) pair:  11  Binary representation of c:  10100
(( 0 , 5 ) 30 )   Huffman code for (r,s) pair:  11  Binary representation of c:  11110
(( 0 , 4 ) 10 )   Huffman code for (r,s) pair:  100  Binary representation of c:  1010
(( 0 , 5 ) 25 )   Huffman code for (r,s) pair:  11  Binary representation of c:  11001
(( 0 , 5 ) 30 )   Huffman code for (r,s) pair:  11  Binary representation of c:  11110
(( 4 , 3 ) 5 )   Huffman code for (r,s) pair:  00  Binary representation of c:  101
(( 4 , 3 ) 5 )   Huffman code for (r,s) pair:  00  Binary representation of c:  101
(( 3 , 4 ) 15 )   Huffman code for (r,s) pair:  101  Binary representation of c:  1111
(( 3 , 5 ) 20 )   Huffman code for (r,s) pair:  011  Binary representation of c:  10100
(( 5 , 1 ) 1 )   Huffman code for (r,s) pair:  010  Binary representation of c:  1

Encoded Data:
1110100111111010010101111001111111000101001011011111011101000101

Isn’t that amazing! Using such a small number of bits, we are able to transmit the entire DCT matrix.

Decoding - Finding the DCT Matrix

decoded_data = []
current_bits = ""
i=0
while i<len(encoded_data):
    # print(i)
    current_bits += encoded_data[i]
    i+=1
    for ch, huffman_code in rs_to_huffman.items():
        if current_bits == huffman_code:
            current_bits = ""
            c_bits = ""
            r,s = d2[ch]
            # print(s)
            for j in range(s):
                c_bits += encoded_data[i+j]
            decoded_data.append((ch,int(c_bits, 2)))
            # print(ch,int(c_bits,2))
            i+=s
            # print(i)
            c_bits = ""

print("\nDecoded Data:")
dl = []
for ch,c in decoded_data:
  print(d2[ch],c)
  dl.append((d2[ch],c))

newl = []
for ele in dl:
  r,s = ele[0]
  c = ele[1]
  for i in range(r):
    newl.append(0)
  newl.append(c)

ll = len(newl)
for i in range(ll+1,64+1):
  newl.append(0)

import numpy as np
newl = np.array(newl)
newl = np.reshape(newl,(8,8))
print()
print("The decoded matrix is:")
print(newl)


Decoded Data:
(0, 5) 20
(0, 5) 30
(0, 4) 10
(0, 5) 25
(0, 5) 30
(4, 3) 5
(4, 3) 5
(3, 4) 15
(3, 5) 20
(5, 1) 1

The decoded matrix is:
[[20 30 10 25 30  0  0  0]
 [ 0  5  0  0  0  0  5  0]
 [ 0  0 15  0  0  0 20  0]
 [ 0  0  0  0  1  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]]

Summary

print("Initial matrix")
print(dct_coefficients)
print()

print("(r,s) pairs and c values:")
for i in range(len(rs_pairs)):
  print(rs_pairs[i])
print()

print("Encoded Data:")
print(encoded_data)

print("\nDecoded Data:")
print()
print("The decoded matrix is:")
print(newl)

Initial matrix
[[20 30 10 25 30  0  0  0]
 [ 0  5  0  0  0  0  5  0]
 [ 0  0 15  0  0  0 20  0]
 [ 0  0  0  0  1  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]]

(r,s) pairs and c values:
((0, 5), 20)
((0, 5), 30)
((0, 4), 10)
((0, 5), 25)
((0, 5), 30)
((4, 3), 5)
((4, 3), 5)
((3, 4), 15)
((3, 5), 20)
((5, 1), 1)

Encoded Data:
1110100111111010010101111001111111000101001011011111011101000101

Decoded Data:

The decoded matrix is:
[[20 30 10 25 30  0  0  0]
 [ 0  5  0  0  0  0  5  0]
 [ 0  0 15  0  0  0 20  0]
 [ 0  0  0  0  1  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]]