Summary
I needed to understand how to put together and take apart a JPEG file. Strangely enough, detailed, complete information on the popular JPEG file format is hard to find. Perhaps because of how common libjpeg happens to be, people don't generally roll their own JPEG implementation. This post documents what I discovered.
I remember playing with TIFF files in the 1980s. JPEG uses similar constructs. The file is a sequence of tagged blocks, called markers or segments: each one starts with a tag, followed by a size, and then the data specific to that tag. This way, a parser that doesn't know how to interpret a certain marker can skip ahead in the file to the next one, completely ignoring structures that it doesn't understand.
All tags in JPEG files start with the value 0xFF. If the value 0xFF is ever needed in the compressed image data, it must be escaped by immediately following it with 0x00. This is called "byte stuffing".
So knowing that markers are 0xFF followed by anything other than 0x00, it becomes easy to start pulling apart JPEG files.
xxd -c16 -g1 -u testimg.jpg | grep --color=always -C999 FF
00000000: FF D8 FF E0 00 10 4A 46 49 46 00 01 01 00 00 01 ......JFIF......
00000010: 00 01 00 00 FF DB 00 43 00 08 06 06 07 06 05 08 .......C........
00000020: 07 07 07 09 09 08 0A 0C 14 0D 0C 0B 0B 0C 19 12 ................
00000030: 13 0F 14 1D 1A 1F 1E 1D 1A 1C 1C 20 24 2E 27 20 ........... $.'
00000040: 22 2C 23 1C 1C 28 37 29 2C 30 31 34 34 34 1F 27 ",#..(7),01444.'
00000050: 39 3D 38 32 3C 2E 33 34 32 FF DB 00 43 01 09 09 9=82<.342...C...
00000060: 09 0C 0B 0C 18 0D 0D 18 32 21 1C 21 32 32 32 32 ........2!.!2222
00000070: 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 2222222222222222
00000080: 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 2222222222222222
00000090: 32 32 32 32 32 32 32 32 32 32 32 32 32 32 FF C0 22222222222222..
000000a0: 00 11 08 00 95 00 E3 03 01 22 00 02 11 01 03 11 ........."......
000000b0: 01 FF C4 00 1F 00 00 01 05 01 01 01 01 01 01 00 ................
000000c0: 00 00 00 00 00 00 00 01 02 03 04 05 06 07 08 09 ................
...
Of the markers described here, all but two (SOI and EOI) are immediately followed by a 2-byte, big-endian size. The size never includes the 2-byte marker itself, but always includes the 2 bytes of the size field. Since the largest 16-bit value is 0xFFFF (65,535), the data in a segment is limited to 65,533 bytes. However, a marker may appear multiple times, and one particular marker -- the one with the actual image data -- works slightly differently to accommodate a payload of any size.
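The rules above are enough to sketch a segment walker. The Python below is a minimal illustration rather than a full parser: the name walk_segments and the sample bytes (an SOI, the APP0 segment from the dump above, and an EOI placed directly after it) are invented for the example, and it assumes a well-formed file with no fill bytes between segments.

```python
import struct

# Markers that stand alone, with no 2-byte size after them:
# SOI (0xD8), EOI (0xD9), and the restart markers RST0-RST7 (0xD0-0xD7).
NO_SIZE = {0xD8, 0xD9} | set(range(0xD0, 0xD8))


def walk_segments(data: bytes):
    """Yield (offset, marker, payload) for each segment, stopping after SOS."""
    pos = 0
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected 0xFF at offset {pos:#x}")
        marker = data[pos + 1]
        if marker in NO_SIZE:
            yield pos, marker, b""
            pos += 2
            continue
        if pos + 4 > len(data):
            raise ValueError(f"truncated segment at offset {pos:#x}")
        # Big-endian size; it counts its own 2 bytes but not the marker.
        (size,) = struct.unpack_from(">H", data, pos + 2)
        yield pos, marker, data[pos + 4 : pos + 2 + size]
        if marker == 0xDA:
            return  # SOS: compressed data follows and needs different handling
        pos += 2 + size


# Hypothetical sample: SOI + the APP0 segment from the dump + EOI.
sample = bytes.fromhex(
    "FFD8" "FFE0" "0010" "4A46494600" "0101" "00" "0001" "0001" "0000" "FFD9"
)
for offset, marker, payload in walk_segments(sample):
    print(f"{offset:#06x}  FF{marker:02X}  {len(payload)} payload bytes")
```

Stopping at SOS is a deliberate choice here: everything after it is compressed data, where the size-based skipping no longer applies.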
JPEG segments, tags, markers
Some information on JPEG segments/tags/markers:
TLA | Name | Hex | Size | Required | Special Notes |
SOI | start of image | 0xFF 0xD8 | This tag does not have a size. | Yes | This tag must be the first one in the file. |
APP0 | application data | 0xFF 0xE0 | 0x00 0x10 (16 bytes) for a standard image without a thumbnail. | Yes | This tag must come immediately after the SOI. |
DQT | define quantization table | 0xFF 0xDB | Variable size. Typically 0x00 0x43 (67 bytes) per table if this tag appears multiple times in the file. 0x00 0x84 (132 bytes) if two tables have been combined into a single tag. More if there are more than two tables, or if the tables are 16-bit instead of 8-bit. | Yes | The standard allows for multiple tables to be combined into a single DQT tag. I've seen both in use: JPEG files with multiple DQT segments, and JPEG files where the tables have been combined. |
DHT | define Huffman table | 0xFF 0xC4 | Variable, depending on the number and the size of the tables. | Yes | The standard allows for multiple tables to be combined into a single DHT tag. I've seen both in use: JPEG files with multiple DHT segments, and JPEG files where the tables have been combined. |
SOF0 | start of frame (baseline DCT) | 0xFF 0xC0 | Variable size. Typically 0x00 0x11 (17 bytes) for images with 3 components (e.g., YCbCr). | Yes, but see "special notes". | SOF0 can be replaced with SOF1 (0xFFC1, extended sequential DCT), SOF2 (0xFFC2, progressive DCT), etc... |
COM | comment | 0xFF 0xFE | Variable size. | No | |
SOS | start of scan | 0xFF 0xDA | Complicated. See below for details. | Yes | The compressed image data comes immediately after the SOS tag. |
EOI | end of image | 0xFF 0xD9 | This tag does not have a size. | Yes | This tag must be the last one in the image. |
SOI (start of image) - 0xFFD8
This simple tag indicates the start of an image. It appears at the very start of the file. It has no size, and is immediately followed by the APP0 tag.
xxd -c16 -g1 -u testimg.jpg | grep --color=always "FF D8"
00000000: FF D8 FF E0 00 10 4A 46 49 46 00 01 01 00 00 01 ......JFIF......
00000010: 00 01 00 00 FF DB 00 43 00 08 06 06 07 06 05 08 .......C........
...
APP0 (application data) - 0xFFE0
xxd -c16 -g1 -u testimg.jpg | grep --color=always "FF E0"
00000000: FF D8 FF E0 00 10 4A 46 49 46 00 01 01 00 00 01 ......JFIF......
00000010: 00 01 00 00 FF DB 00 43 00 08 06 06 07 06 05 08 .......C........
...
Description:
0xFF, 0xE0, // APP0 segment
0x00, 0x10, // size of segment, including these 2 bytes; 0x10 = 16 bytes
0x4A, 0x46, 0x49, 0x46, 0x00, // identifier string: "JFIF"
0x01, 0x01, // JFIF version 1.01
0x00, // density units (0=no units)
0x00, 0x01, // horizontal density
0x00, 0x01, // vertical density
0x00, // X thumbnail size
0x00 // Y thumbnail size
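The layout above maps directly onto a fixed-width unpack. This is a small sketch using Python's struct module; the function name parse_jfif_app0 and the dictionary keys are made up for the example, and the input is the 14 payload bytes that follow the size field in the dump above.

```python
import struct


def parse_jfif_app0(payload: bytes) -> dict:
    """Decode the fixed 14-byte part of a JFIF APP0 payload (the bytes after the size)."""
    ident, major, minor, units, x_density, y_density, x_thumb, y_thumb = (
        struct.unpack_from(">5sBBBHHBB", payload)
    )
    if ident != b"JFIF\x00":
        raise ValueError("not a JFIF APP0 segment")
    return {
        "version": f"{major}.{minor:02d}",
        "units": units,  # 0 = no units, 1 = dots per inch, 2 = dots per cm
        "x_density": x_density,
        "y_density": y_density,
        "thumbnail": (x_thumb, y_thumb),
        # An uncompressed RGB thumbnail, if any, follows the fixed fields.
        "thumbnail_bytes": 3 * x_thumb * y_thumb,
    }


# The 14 payload bytes from the example above.
app0 = bytes.fromhex("4A46494600 0101 00 0001 0001 00 00")
print(parse_jfif_app0(app0))
```

With no thumbnail, the 14 payload bytes plus the 2 size bytes give the 0x0010 size seen in the dump.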
DQT (define quantization table) - 0xFFDB
xxd -c16 -g1 -u testimg.jpg | grep --color=always -C2 "FF DB"
00000000: FF D8 FF E0 00 10 4A 46 49 46 00 01 01 00 00 01 ......JFIF......
00000010: 00 01 00 00 FF DB 00 43 00 08 06 06 07 06 05 08 .......C........
00000020: 07 07 07 09 09 08 0A 0C 14 0D 0C 0B 0B 0C 19 12 ................
00000030: 13 0F 14 1D 1A 1F 1E 1D 1A 1C 1C 20 24 2E 27 20 ........... $.'
00000040: 22 2C 23 1C 1C 28 37 29 2C 30 31 34 34 34 1F 27 ",#..(7),01444.'
00000050: 39 3D 38 32 3C 2E 33 34 32 FF DB 00 43 01 09 09 9=82<.342...C...
00000060: 09 0C 0B 0C 18 0D 0D 18 32 21 1C 21 32 32 32 32 ........2!.!2222
Multiple quantization tables can be stored within a single DQT tag, or the JPEG file may have multiple DQT tags.
Description:
0xFF, 0xDB, // DQT segment
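0x00, 0x43, // length of segment depends on the number of tables (0x43 = 67 = 2 size bytes + 1 table byte + 64 values)
0x00, // high nibble 0 = 8-bit values, low nibble 0 = table #0
// followed by the 64 byte quantization table (stored in zigzag order)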
0x08, 0x06, 0x06, 0x07, 0x06, 0x05, 0x08, 0x07, 0x07, 0x07, 0x09, 0x09, 0x08, 0x0A, 0x0C, 0x14,
0x0D, 0x0C, 0x0B, 0x0B, 0x0C, 0x19, 0x12, 0x13, 0x0F, 0x14, 0x1D, 0x1A, 0x1F, 0x1E, 0x1D, 0x1A,
0x1C, 0x1C, 0x20, 0x24, 0x2E, 0x27, 0x20, 0x22, 0x2C, 0x23, 0x1C, 0x1C, 0x28, 0x37, 0x29, 0x2C,
0x30, 0x31, 0x34, 0x34, 0x34, 0x1F, 0x27, 0x39, 0x3D, 0x38, 0x32, 0x3C, 0x2E, 0x33, 0x34, 0x32
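Because a single DQT segment may hold one table or several, a reader has to loop until the payload is used up. The following sketch shows one way to do that; parse_dqt is a hypothetical name, and the input is the payload of the first DQT segment from the dump (the bytes after the size field).

```python
import struct


def parse_dqt(payload: bytes) -> dict:
    """Return {table_id: [64 values]} for every table in one DQT payload."""
    tables = {}
    pos = 0
    while pos < len(payload):
        # High nibble: precision (0 = 8-bit, 1 = 16-bit). Low nibble: table number.
        precision, table_id = payload[pos] >> 4, payload[pos] & 0x0F
        pos += 1
        if precision == 0:  # 8-bit values, 64 bytes
            values = list(payload[pos : pos + 64])
            pos += 64
        else:  # 16-bit big-endian values, 128 bytes
            values = list(struct.unpack_from(">64H", payload, pos))
            pos += 128
        if len(values) != 64:
            raise ValueError("truncated quantization table")
        tables[table_id] = values  # kept in file (zigzag) order
    return tables


# Payload of the first DQT segment in the dump: table byte + 64 values.
dqt = bytes.fromhex(
    "00"
    "08 06 06 07 06 05 08 07 07 07 09 09 08 0A 0C 14"
    "0D 0C 0B 0B 0C 19 12 13 0F 14 1D 1A 1F 1E 1D 1A"
    "1C 1C 20 24 2E 27 20 22 2C 23 1C 1C 28 37 29 2C"
    "30 31 34 34 34 1F 27 39 3D 38 32 3C 2E 33 34 32"
)
for table_id, values in parse_dqt(dqt).items():
    print(f"table #{table_id}: {len(values)} values, first row {values[:8]}")
```

The same loop handles the combined case: two 8-bit tables in one segment are simply two 65-byte records back to back, which is where the 0x0084 size comes from.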
DHT (define Huffman table) - 0xFFC4
xxd -c16 -g1 -u testimg.jpg | grep --color=always -C1 "FF C4"
000000b0: 01 FF C4 00 1F 00 00 01 05 01 01 01 01 01 01 00 ................
000000c0: 00 00 00 00 00 00 00 01 02 03 04 05 06 07 08 09 ................
000000d0: 0A 0B FF C4 00 B5 10 00 02 01 03 03 02 04 03 05 ................
000000e0: 05 04 04 00 00 01 7D 01 02 03 00 04 11 05 12 21 ......}........!
Multiple Huffman tables can be stored within a single DHT tag, or the JPEG file may have multiple DHT tags.
Description (of the second DHT segment in the dump above, at offset 0xD2):
0xFF, 0xC4, // DHT segment
0x00, 0xB5, // length of segment depends on the size of the table (0xB5 = 181 = 2 + 1 + 16 + 162)
0x10, // high nibble 1 = AC table (0 = DC), low nibble 0 = table #0
// the next 16 bytes give the number of codes of each bit length, from 1 to 16 bits
// (in this example, the sum of 0+2+1+...+1+0x7D is 0xA2 or 162 decimal)
0x00, 0x02, 0x01, 0x03, 0x03, 0x02, 0x04, 0x03, 0x05, 0x05, 0x04, 0x04, 0x00, 0x00, 0x01, 0x7D,
// table starts here -- this example has 162 table entries
0x01, 0x02, 0x03, 0x00, 0x04, 0x11, 0x05, 0x12, 0x21, 0x31, 0x41, 0x06, 0x13, 0x51, 0x61, 0x07,
0x22, 0x71, 0x14, 0x32, 0x81, 0x91, 0xA1, 0x08, 0x23, 0x42, 0xB1, 0xC1, 0x15, 0x52, 0xD1, 0xF0,
...
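The same structure can be read programmatically: one table byte, 16 counts, then as many entries as the counts add up to. The sketch below uses the first, shorter DHT segment from the dump (size 0x001F), because its 12 entries are fully visible there. The name parse_dht and the dictionary keys are invented for the example, and the code only splits the segment into its parts; it does not build the actual Huffman codes.

```python
def parse_dht(payload: bytes) -> list:
    """Return one dict per Huffman table found in a DHT payload."""
    tables = []
    pos = 0
    while pos < len(payload):
        # High nibble: table class (0 = DC, 1 = AC). Low nibble: table number.
        table_class, table_id = payload[pos] >> 4, payload[pos] & 0x0F
        counts = list(payload[pos + 1 : pos + 17])  # codes of length 1..16 bits
        total = sum(counts)
        symbols = list(payload[pos + 17 : pos + 17 + total])
        if len(counts) != 16 or len(symbols) != total:
            raise ValueError("truncated Huffman table")
        tables.append(
            {
                "class": "AC" if table_class else "DC",
                "id": table_id,
                "counts": counts,
                "symbols": symbols,
            }
        )
        pos += 17 + total
    return tables


# Payload of the first DHT segment in the dump (the bytes after 0x00 0x1F).
dht = bytes.fromhex(
    "00"
    "00 01 05 01 01 01 01 01 01 00 00 00 00 00 00 00"
    "00 01 02 03 04 05 06 07 08 09 0A 0B"
)
for table in parse_dht(dht):
    print(table["class"], table["id"], len(table["symbols"]), "entries")
```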
SOF0 (start of frame) - 0xFFC0
xxd -c16 -g1 -u testimg.jpg | grep --color=always -C2 "FF C0"
00000090: 32 32 32 32 32 32 32 32 32 32 32 32 32 32 FF C0 22222222222222..
000000a0: 00 11 08 00 95 00 E3 03 01 22 00 02 11 01 03 11 ........."......
000000b0: 01 FF C4 00 1F 00 00 01 05 01 01 01 01 01 01 00 ................
Description:
0xFF, 0xC0, // SOF0 segment
0x00, 0x11, // length of segment depends on the number of components
0x08, // sample precision (bits per sample, per component)
0x00, 0x95, // image height (0x0095 = 149)
0x00, 0xE3, // image width (0x00E3 = 227)
0x03, // number of components (typically 1 or 3)
0x01, 0x22, 0x00, // 0x01=Y component, 0x22=sampling factors (2 horizontal, 2 vertical), 0x00=quantization table number
0x02, 0x11, 0x01, // 0x02=Cb component, ...
0x03, 0x11, 0x01 // 0x03=Cr component, ...
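Reading the frame header is a matter of unpacking six fixed bytes and then three bytes per component. The following is a rough sketch; parse_sof and its dictionary keys are hypothetical names, and the input is the SOF0 payload from the dump (the bytes after the size field).

```python
import struct


def parse_sof(payload: bytes) -> dict:
    """Decode a start-of-frame payload: precision, dimensions, components."""
    precision, height, width, count = struct.unpack_from(">BHHB", payload)
    components = []
    for i in range(count):
        comp_id, sampling, qtable = struct.unpack_from(">BBB", payload, 6 + 3 * i)
        components.append(
            {
                "id": comp_id,
                "h": sampling >> 4,  # horizontal sampling factor (high nibble)
                "v": sampling & 0x0F,  # vertical sampling factor (low nibble)
                "qtable": qtable,
            }
        )
    return {
        "precision": precision,
        "height": height,
        "width": width,
        "components": components,
    }


# Payload of the SOF0 segment in the dump.
sof0 = bytes.fromhex("08 0095 00E3 03 012200 021101 031101")
frame = parse_sof(sof0)
print(frame["width"], "x", frame["height"], "at", frame["precision"], "bits")
for comp in frame["components"]:
    print(comp)
```

Note that the height comes before the width in the file. In this image the Y component is sampled 2x2 while Cb and Cr are sampled 1x1, which is the common 4:2:0 chroma subsampling.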
SOS (start of scan) - 0xFFDA
xxd -c16 -g1 -u testimg.jpg | grep --color=always -A4 "FF DA"
00000260: FA FF DA 00 0C 03 01 00 02 11 03 11 00 3F 00 F2 .............?..
00000270: E5 6A 76 FA 66 29 08 34 9A 1A 63 F7 F3 46 F1 51 .jv.f).4..c..F.Q
...
For a 3-component image, the size will be 0x000C (12 bytes). However, the actual compressed image data comes immediately after the SOS segment, and isn't accounted for in the SOS size. This is how JPEG files can be larger than the usual 64KiB segment size limitation. So when reading through the file looking for segments, the SOS must be treated differently.
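To find the next segment after the SOS, you must keep reading until you find a 0xFF byte which is not immediately followed by 0x00 (see "byte stuffing"). Restart markers (0xFFD0 through 0xFFD7), if the encoder emitted any, also appear inside the compressed data and must be skipped. Normally, the next real marker will be the EOI segment that comes at the end of the file.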
0xFF, 0xDA, // SOS segment
0x00, 0x0C, // length of segment depends on the number of components
0x03, // number of components (1=monochrome, 3=colour)
// 2 bytes for each component:
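0x01, 0x00, // 0x01=Y, 0x00=Huffman tables to use (high nibble = DC table, low nibble = AC table)
0x02, 0x11, // 0x02=Cb, 0x11=Huffman tables to use (DC table 1, AC table 1)
0x03, 0x11, // 0x03=Cr, 0x11=Huffman tables to use (DC table 1, AC table 1)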
// I never figured out the actual meaning of these next 3 bytes
0x00, // start of spectral selection or predictor selection
0x3F, // end of spectral selection
0x00, // successive approximation bit position or point transform
// image data starts here
0xF2, 0xE5, 0x6A, ...
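Putting the two SOS rules together (a short header with a normal size, then unsized compressed data that ends at the next real marker), a reader might look like the sketch below. The names parse_sos and find_scan_end are invented for the example. The header bytes come from the dump above, but the scan bytes are synthetic, made up to exercise a stuffed 0xFF and a restart marker; they are not real image data.

```python
def parse_sos(payload: bytes) -> dict:
    """Decode the SOS header (the bytes between the size and the image data)."""
    count = payload[0]
    components = []
    for i in range(count):
        comp_id, tables = payload[1 + 2 * i], payload[2 + 2 * i]
        components.append(
            {"id": comp_id, "dc_table": tables >> 4, "ac_table": tables & 0x0F}
        )
    spectral_start, spectral_end, approx = payload[1 + 2 * count : 4 + 2 * count]
    return {
        "components": components,
        "spectral_start": spectral_start,
        "spectral_end": spectral_end,
        "approx": approx,
    }


def find_scan_end(data: bytes, start: int) -> int:
    """Return the offset of the first real marker at or after `start`."""
    pos = start
    while True:
        pos = data.find(b"\xFF", pos)
        if pos < 0 or pos + 1 >= len(data):
            raise ValueError("no marker found after the scan data")
        following = data[pos + 1]
        if following == 0x00 or 0xD0 <= following <= 0xD7:
            pos += 2  # stuffed 0xFF or restart marker: still inside the scan
        elif following == 0xFF:
            pos += 1  # fill byte: the marker, if any, starts at the next 0xFF
        else:
            return pos


# SOS header payload from the dump (the bytes after 0x00 0x0C).
sos = bytes.fromhex("03 0100 0211 0311 00 3F 00")
print(parse_sos(sos))

# Synthetic scan data: 3 bytes, a stuffed 0xFF, 1 byte, RST0, 1 byte, then EOI.
scan = bytes.fromhex("F2E56A FF00 12 FFD0 34 FFD9")
print(find_scan_end(scan, 0))
```

Using bytes.find to jump between 0xFF bytes avoids inspecting every byte in Python, which matters because the scan is usually the bulk of the file.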