Essence of Digital Images: Building blocks

In the era of digital transformation, images are more than mere visual artifacts. They are not just pictures but technological constructs that blend data structures, mathematics, and algorithms. This blend gives rise to the visual experiences we encounter every day in the form of digital images.

Pixels - The smallest building block

At the core of every digital image lies the pixel. An image is made of tiny pixels arranged in a grid, and that grid forms a matrix whose dimensions define the image's width, height, and resolution. Consider a 3x3 8-bit image in the grayscale color space (grayscale is used here for simplicity). Each pixel carries a numerical value representing intensity, from 0 (black) to 255 (white), which corresponds to 8 bits of data.
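As a minimal sketch of the grid idea, a 3x3 8-bit grayscale image can be written as a list of rows in Python. The intensity values below are made up for illustration and do not come from the figure.

```python
# A 3x3 8-bit grayscale image as a grid (list of rows) of intensities.
# The values are illustrative only; 0 is black and 255 is white.
image = [
    [0, 128, 255],
    [64, 192, 32],
    [255, 0, 100],
]

height = len(image)      # number of rows
width = len(image[0])    # number of columns

# Every value must fit in 8 bits.
assert all(0 <= value <= 255 for row in image for value in row)

print(f"{width}x{height} image, {width * height} pixels")
print("center pixel intensity:", image[1][1])
```

Indexing is row first, then column, so `image[1][1]` is the pixel in the middle of the grid.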

Figure: (Left) 3x3 8-bit grayscale image | (Right) Color intensity representation

Color intensity numbers in red are just for representation purposes.

In the RGB color space, each pixel is represented by three values, one per channel. For example, (255, 0, 0) means a red channel intensity of 255, a green channel intensity of 0, and a blue channel intensity of 0, so the pixel is pure red with no green or blue. Each channel intensity lies on a scale from 0 to 255, giving 256 possible values per channel. Consider the 3x3 8-bit image in RGB color space shown below.

Note: Each RGB channel is 8-bit, so the color depth is 24 bits per pixel (8 R + 8 G + 8 B = 24 bits), also known as true color. The 8-bit scale provides 2⁸ = 256 possible values, ranging from 0 to 255, for each channel of each pixel. Hence there are 256 x 256 x 256 = 16,777,216 (over 16 million) possible colors.

The alpha channel is not included for simplicity.
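The arithmetic in the note can be checked with a few lines of Python. This is only a sketch: the constant names and the 3x3 RGB grid are illustrative choices, not taken from the figure.

```python
BITS_PER_CHANNEL = 8
CHANNELS = 3  # red, green, blue (no alpha channel)

levels_per_channel = 2 ** BITS_PER_CHANNEL       # values available per channel
bits_per_pixel = BITS_PER_CHANNEL * CHANNELS     # bits add up across channels
total_colors = levels_per_channel ** CHANNELS    # combinations multiply

print("levels per channel:", levels_per_channel)
print("bits per pixel:", bits_per_pixel)
print("possible colors:", total_colors)

# An illustrative 3x3 RGB image: each pixel is an (R, G, B) tuple.
RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)
WHITE, BLACK = (255, 255, 255), (0, 0, 0)
rgb_image = [
    [RED, GREEN, BLUE],
    [WHITE, BLACK, RED],
    [BLUE, GREEN, WHITE],
]

r, g, b = rgb_image[0][0]
print("top-left pixel:", r, g, b)
```

Note that the bits are added (8 + 8 + 8) while the number of colors is multiplied (256 x 256 x 256).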

Color encoding & representation

The previous section touched upon how single & multi-channel (RGB) colors can be represented for a given pixel. This section will cover the two different approaches to representing and encoding colors in digital images.

Indexed color: Indexed color, also known as palette-based color, uses a lookup table with a limited number of colors. Each pixel stores an index into that color lookup table rather than the color itself. This makes it efficient in memory, storage, and transmission.

Direct color: Direct color, often called true color or 24-bit color, is the other method of representing colors in digital images. Unlike indexed color, direct color has no color lookup table; instead, each pixel stores its own color value as a combination of the primary colors (RGB). Each RGB channel is represented using 8 bits, resulting in 24 bits per pixel.

File formats like JPEG, PNG, and TIFF support direct color images, whereas GIF and PNG-8 use indexed color.

The color palette (above image, right) used in the indexed color example plays the role of the lookup table described above.
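To make the difference between the two approaches concrete, here is a small sketch in Python. The four-color palette and the index grid are hypothetical, and the storage estimate ignores file headers and compression.

```python
# Hypothetical 4-color palette (the lookup table): index -> (R, G, B).
palette = [
    (0, 0, 0),        # 0: black
    (255, 0, 0),      # 1: red
    (0, 255, 0),      # 2: green
    (255, 255, 255),  # 3: white
]

# Indexed color: each pixel stores only an index into the palette.
indexed_image = [
    [1, 1, 3],
    [0, 2, 3],
    [0, 2, 1],
]

# Direct color: each pixel stores its own (R, G, B) value.
direct_image = [[palette[index] for index in row] for row in indexed_image]

# Rough storage comparison in bits.
pixel_count = sum(len(row) for row in indexed_image)
index_bits = max(1, (len(palette) - 1).bit_length())  # bits needed per index
indexed_bits = pixel_count * index_bits + len(palette) * 24  # pixels + palette
direct_bits = pixel_count * 24

print("top-left pixel as direct color:", direct_image[0][0])
print("indexed storage (bits):", indexed_bits)
print("direct storage (bits):", direct_bits)
```

For an image this small the palette itself dominates the indexed cost; the saving grows with the number of pixels, because each additional pixel costs 2 bits instead of 24.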

Note: Bit depth (8, 16, 32) is the number of bits used to store each value; it determines how many distinct values a pixel (or each of its channels) can take, since n bits give 2ⁿ values. The higher the bit depth, the finer the color gradations, which influences the richness and accuracy of the image.
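The effect of bit depth on gradation can be sketched by reducing an 8-bit gradient to 2 bits. The helper names `levels` and `reduce_bit_depth` are illustrative, and dropping the low-order bits is just one simple way to reduce depth.

```python
def levels(bit_depth):
    """Number of distinct values available at a given bit depth."""
    return 2 ** bit_depth

def reduce_bit_depth(value, from_bits=8, to_bits=2):
    """Keep only the top `to_bits` bits of a `from_bits`-bit value."""
    return value >> (from_bits - to_bits)

# An 8-bit gradient from black to white (illustrative sample values).
gradient = [0, 36, 73, 109, 146, 182, 219, 255]

# The same gradient at 2 bits: only 4 distinct levels survive.
coarse = [reduce_bit_depth(value, 8, 2) for value in gradient]

print("8-bit levels:", levels(8))
print("16-bit levels:", levels(16))
print("8-bit gradient:", gradient)
print("2-bit gradient:", coarse)
```

Eight distinct shades collapse into four, which is the banding you see when a smooth gradient is stored at too low a bit depth.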

Color spaces

Imagine you want to tell someone about the colors of the breathtaking sunrise image below. You can use words like red, blue, orange, and grey, but these words cannot capture all the details. Hence, there is a need for a different language that can capture the details of colors: color spaces.

A color space is a system, or mathematical model, that defines how visual colors are represented as numerical values. It is a standardized and structured way to describe colors across various applications. There are multiple color spaces, each with its own properties. The most common ones include, but are not limited to:

RGB (Red, Green, Blue): RGB represents colors as a combination of red, green, and blue primary colors. Each channel intensity ranges from 0 to 255 in 8-bit systems; in 16-bit systems (2¹⁶ values), each channel intensity ranges from 0 to 65,535. RGB is an additive color model, where the primary colors are mixed at different intensities to create a wide range of colors.

HSV (Hue, Saturation, Value): HSV represents colors as Hue (which color), Saturation (how intense the color is, from grey to fully saturated), and Value (brightness of the color). Hue is expressed in degrees from 0° to 360°, whereas Saturation and Value are expressed as percentages. The HSV color space is closer to how humans think about color than RGB is.

Lab: The Lab color space is used where color precision is important. The L channel represents lightness, the a channel represents the color's position on the green-to-red axis (horizontal), and the b channel represents its position on the blue-to-yellow axis (vertical).

CMY/CMYK (Cyan, Magenta, Yellow, Key/Black): CMY uses cyan, magenta, and yellow as primary colors. CMYK is an extension of CMY that adds black as a fourth channel. This color space is commonly used in printing. It is a subtractive color model, where inks are layered to create a wide spectrum of colors.

There are other color spaces like YUV/YCbCr, YIQ, YDbDr, sRGB, and more.
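The same color gets different numbers in different color spaces. The sketch below converts 8-bit RGB to HSV with Python's standard `colorsys` module and to CMYK with a naive textbook formula. The helper names are illustrative, and the CMYK conversion is an assumption for demonstration only; real print workflows rely on color profiles rather than this formula.

```python
import colorsys

def rgb8_to_hsv(r, g, b):
    """8-bit RGB -> (hue in degrees, saturation %, value %)."""
    # colorsys works with floats in the range 0.0 to 1.0.
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return round(h * 360), round(s * 100), round(v * 100)

def rgb8_to_cmyk(r, g, b):
    """8-bit RGB -> naive (C, M, Y, K), each in the range 0.0 to 1.0."""
    r_, g_, b_ = r / 255, g / 255, b / 255
    k = 1 - max(r_, g_, b_)
    if k == 1:
        # Pure black: only the key (black) channel is used.
        return (0.0, 0.0, 0.0, 1.0)
    c = (1 - r_ - k) / (1 - k)
    m = (1 - g_ - k) / (1 - k)
    y = (1 - b_ - k) / (1 - k)
    return (c, m, y, k)

print("red in HSV:", rgb8_to_hsv(255, 0, 0))
print("blue in HSV:", rgb8_to_hsv(0, 0, 255))
print("red in CMYK:", rgb8_to_cmyk(255, 0, 0))
print("black in CMYK:", rgb8_to_cmyk(0, 0, 0))
```

Pure red has no cyan but full magenta and yellow, which reflects the subtractive nature of CMYK: the inks remove everything except red from white light.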

Image formats & compression

Image formats, usually identified by the file extension (jpg/jpeg, png, tiff, bmp, raw, and more), carry varying representations of the same visual content. Different file formats use compression algorithms that fall under two categories: lossless and lossy. Compression algorithms optimize storage and transmission by reducing file size while retaining as much of the intricate detail as possible.

JPEG uses lossy compression, which selectively discards some data to reduce size, while the PNG format uses a lossless data compression algorithm. The choice is a trade-off between compression ratio and image quality, and the right balance between size and quality varies across applications.
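The lossless versus lossy distinction can be illustrated with Python's standard `zlib` module, which implements DEFLATE, the compression method PNG uses. The lossy step here is a crude bit-dropping quantization chosen for brevity; it is not the JPEG algorithm, which works on frequency coefficients. The pixel data is synthetic.

```python
import zlib

WIDTH, HEIGHT = 64, 64

# Synthetic 8-bit grayscale pixel data (a diagonal gradient), stored row by row.
pixels = bytes((x * 4 + y) % 256 for y in range(HEIGHT) for x in range(WIDTH))

# Lossless: compress, then recover every byte exactly.
lossless = zlib.compress(pixels)
restored = zlib.decompress(lossless)
print("lossless round trip exact:", restored == pixels)

# Crude lossy step: discard the 4 low-order bits of each pixel, then compress.
quantized = bytes((value >> 4) << 4 for value in pixels)
lossy = zlib.compress(quantized)
approx = zlib.decompress(lossy)
max_error = max(abs(a - b) for a, b in zip(pixels, approx))

print("raw bytes:", len(pixels))
print("lossless bytes:", len(lossless))
print("lossy bytes:", len(lossy))
print("lossy round trip exact:", approx == pixels)
print("largest per-pixel error:", max_error)
```

The lossless path returns the original bytes unchanged, while the lossy path returns an approximation whose per-pixel error is bounded by the discarded bits. How much extra size the lossy step saves depends on the data.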

This section mainly touched upon raster formats like JPEG and PNG, which deal with pixel data. There are also vector formats, including SVG, EPS, and AI, which are defined by points, curves, and lines rather than pixels and are therefore resolution independent.

Digital image manipulation

Consider a photo that you want to crop, rotate, filter, or otherwise transform. This is where digital image processing techniques like filtering, image enhancement, noise reduction, and blurring come in handy. For example, to smooth out noise in a photograph, blurring techniques like Gaussian blur or median blur can be applied. These smooth the image by replacing each pixel with a value computed from its neighbors within a kernel: a weighted average for Gaussian blur, the median for median blur. This computational process fine-tunes images for aesthetics, clarity, and analysis.
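Kernel-based smoothing can be sketched in plain Python with a 3x3 window. The example uses a box (mean) blur as a simpler stand-in for Gaussian blur, which would weight neighbors by distance, alongside a median blur. The function names and the border handling (clipping the window at the edges) are assumptions made for this sketch, and the test image is made up.

```python
from statistics import median

def neighborhood(image, row, col):
    """Values in the 3x3 window around (row, col), clipped at the borders."""
    height, width = len(image), len(image[0])
    return [
        image[r][c]
        for r in range(max(0, row - 1), min(height, row + 2))
        for c in range(max(0, col - 1), min(width, col + 2))
    ]

def apply_filter(image, reducer):
    """Replace every pixel with reducer(values in its 3x3 neighborhood)."""
    return [
        [reducer(neighborhood(image, r, c)) for c in range(len(image[0]))]
        for r in range(len(image))
    ]

def mean_value(values):
    return round(sum(values) / len(values))

def median_value(values):
    return int(median(values))

# A flat grey patch with one noisy bright pixel in the middle.
noisy = [
    [100, 100, 100],
    [100, 255, 100],
    [100, 100, 100],
]

blurred = apply_filter(noisy, mean_value)      # box blur spreads the spike
denoised = apply_filter(noisy, median_value)   # median blur removes it

print("box blur:   ", blurred)
print("median blur:", denoised)
```

The mean spreads the outlier into its neighbors, while the median discards it entirely, which is why median blur is often preferred for isolated speckle noise.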

Images are something we interact with daily for a variety of purposes, and visual data is growing at an exponential rate. Hence, it becomes crucial for a new dimension of visual analysis to evolve, one that can interpret and understand images. Advances in AI/ML have already revolutionized computer vision, enabling machines to understand images.

Conclusion

Digital images are a combination of mathematics, algorithms, and engineering: pixels coming together to form grids, color spaces lighting up those pixels with colors, compression striking a balance between size and quality for better storage and transmission, filters and manipulation offering a different perspective, and AI/ML opening up a whole new space of visual analysis.

In this article, I have barely scratched the surface of digital images, but I hope it gives you a holistic, high-level view.

Thank you for reading!