A JPEG file is a collection of markers which define blocks of data in the file. See Wikipedia to get an overview of the file format.
This vignette parses the chunk headers of a JPEG file (up to the start of the compressed image data) and prints a description of each one.
This vignette uses an included table of JPEG markers, `jpeg_markers`; only the first 15 rows are displayed here. The table has the following columns:

- `Hex`: the hexadecimal code of the marker
- `Marker`: the codename for this marker
- `Name` / `Description`: more information about this marker

Hex | Marker | Name | Description |
---|---|---|---|
ffc0 | SOF0 | Start of Frame 0 | Baseline DCT |
ffc1 | SOF1 | Start of Frame 1 | Extended Sequential DCT |
ffc2 | SOF2 | Start of Frame 2 | Progressive DCT |
ffc3 | SOF3 | Start of Frame 3 | Lossless (sequential) |
ffc4 | DHT | Define Huffman Table | NA |
ffc5 | SOF5 | Start of Frame 5 | Differential sequential DCT |
ffc6 | SOF6 | Start of Frame 6 | Differential progressive DCT |
ffc7 | SOF7 | Start of Frame 7 | Differential lossless (sequential) |
ffc8 | JPG | JPEG Extensions | NA |
ffc9 | SOF9 | Start of Frame 9 | Extended sequential DCT, Arithmetic coding |
ffca | SOF10 | Start of Frame 10 | Progressive DCT, Arithmetic coding |
ffcb | SOF11 | Start of Frame 11 | Lossless (sequential), Arithmetic coding |
ffcc | DAC | Define Arithmetic Coding | NA |
ffcd | SOF13 | Start of Frame 13 | Differential sequential DCT, Arithmetic coding |
ffce | SOF14 | Start of Frame 14 | Differential progressive DCT, Arithmetic coding |
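Looking up a marker amounts to filtering on the `Hex` column. The following is a minimal base-R sketch that assumes a cut-down three-row table in place of the full `jpeg_markers` data; `marker_table` and `describe_marker()` are hypothetical names, not part of the package.

```r
# A few rows copied from the table above; a hypothetical stand-in for 'jpeg_markers'
marker_table <- data.frame(
  Hex    = c("ffc0", "ffc2", "ffc4"),
  Marker = c("SOF0", "SOF2", "DHT"),
  Name   = c("Start of Frame 0", "Start of Frame 2", "Define Huffman Table"),
  stringsAsFactors = FALSE
)

# Describe a marker code, falling back to "Unknown marker" for codes
# that are not in the table (e.g. custom or vendor-specific markers)
describe_marker <- function(hex, tbl = marker_table) {
  row <- tbl[tbl$Hex == tolower(hex), , drop = FALSE]
  if (nrow(row) == 0) {
    return(paste0(hex, ": Unknown marker"))
  }
  paste0(hex, ": ", row$Marker, " (", row$Name, ")")
}

describe_marker("ffc2")  # "ffc2: SOF2 (Start of Frame 2)"
describe_marker("ffdb")  # "ffdb: Unknown marker" (not in this cut-down table)
```

Returning a fallback string rather than an error mirrors the vignette's own approach of reporting unrecognised markers and carrying on.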
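```
#> [1] 76 100 3
```

The first bytes of the example file, in hexadecimal. The file opens with the Start of Image marker `ff d8`, immediately followed by an `ff e0` (APP0) chunk:

```
#> [1] ff d8 ff e0 00 10 4a 46 49 46 00 01 01 01 01 2c 01 2c 00 00 ff e1 00 80 45
#> [26] 78 69 66 00 00 4d 4d 00 2a 00 00 00 08 00 05 01 12 00 03 00 00 00 01 00 01
#> [51] 00 00 01 1a 00 05 00 00 00 01 00 00 00 4a 01 1b 00 05 00 00 00 01 00 00 00
#> [76] 52 01 28 00 03 00 00 00 01 00 02 00 00 87 69 00 04 00 00 00 01 00 00 00 5a
```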
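These bytes can be decoded by hand to check the chunk layout. The sketch below uses only base R (`readBin()` on a raw vector) and assumes the standard JFIF layout for the APP0 payload: a `JFIF` identifier plus a `0x00` terminator, a 2-byte version, a 1-byte density-units field, then 2-byte X and Y densities. The names `u16()` and `app0` are illustrative only. It also shows why the connection must be read big endian: the same two length bytes give 4096 when read little endian.

```r
# First 20 bytes from the dump above: SOI marker, then the APP0 (JFIF) chunk
bytes <- as.raw(c(0xff, 0xd8, 0xff, 0xe0, 0x00, 0x10, 0x4a, 0x46, 0x49, 0x46,
                  0x00, 0x01, 0x01, 0x01, 0x01, 0x2c, 0x01, 0x2c, 0x00, 0x00))

# Read an unsigned 16-bit integer from 2 raw bytes in the given byte order
u16 <- function(b, endian = "big") {
  readBin(b, "integer", size = 2, signed = FALSE, endian = endian)
}

stopifnot(identical(bytes[1:2], as.raw(c(0xff, 0xd8))))  # Start of Image

u16(bytes[5:6])                     # 16   (correct: JPEG lengths are big endian)
u16(bytes[5:6], endian = "little")  # 4096 (wrong byte order)

# Decode the APP0 chunk that starts at byte 3, assuming the JFIF layout
app0 <- list(
  marker     = paste(as.character(bytes[3:4]), collapse = ""),  # "ffe0"
  length     = u16(bytes[5:6]),        # counts its own 2 bytes, not the marker
  identifier = rawToChar(bytes[7:10]), # "JFIF" (byte 11 is the 0x00 terminator)
  version    = sprintf("%d.%02d", as.integer(bytes[12]), as.integer(bytes[13])),
  units      = as.integer(bytes[14]),  # 1 = dots per inch in JFIF
  x_density  = u16(bytes[15:16]),      # 0x012c = 300
  y_density  = u16(bytes[17:18])       # 0x012c = 300
)

# The next marker starts 2 (marker) + length bytes after this chunk's marker
next_marker_pos <- 3 + 2 + app0$length  # 21
```

Applying the same rule to the next four bytes, `ff e1 00 80`, gives an APP1 chunk of length `0x0080` = 128, which matches the length printed by the loop for `ffe1`.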
A JPEG file is just a sequence of chunks. Each chunk consists of:

- a 2-byte marker (e.g. `ffe0`)
- a 2-byte length, stored big endian, which counts the 2 length bytes themselves but not the marker
- `length - 2` bytes of chunk data

The following code first asserts that the JPEG file starts with the "Start of Image" marker `ffd8`, then repeatedly reads a marker, looks it up in `jpeg_markers`, prints the chunk length and description, and skips over the chunk data, until the "Start of Scan" marker (`ffda`) is found.

```r
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Open a connection, and tag the connection such that
# values are read in **big endian** by default.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
con <- file(jpeg_file, 'rb') |>
  set_endian('big')

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Read the first 2 bytes as HEX
# For regular JPEG files, this should be the "Start of Image (SOI)" marker
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
soi <- read_hex(con, n = 1, size = 2) # ffd8: SOI
stopifnot(soi == 'ffd8')

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Keep reading markers and the chunk data until we reach
# the 'Start of Scan' marker (ffda)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
marker <- read_hex(con, n = 1, size = 2)
while (length(marker) > 0 && nchar(marker) > 0) {
  # The relevant row from the 'jpeg_markers' data.frame
  info <- subset(jpeg_markers, Hex == marker)

  # Read the length of data in this chunk. In JPEG this length includes
  # the 2 bytes which specify the chunk length (but not the marker)
  len <- read_uint16(con)

  if (nrow(info) == 0) {
    # There may be custom markers which aren't included in 'jpeg_markers'.
    # Report them, but still skip their data below so that the next read
    # lands on a marker rather than in the middle of a chunk
    cat("Unknown marker: ", marker, "\n")
  } else {
    msg <- sprintf("%s [%5i] [%s] [%s] [%s]\n",
                   marker, len, info$Marker, info$Name, info$Description)
    cat(msg)
  }

  # Check if we've reached the Start of Scan marker
  if (marker == 'ffda') {
    cat("Compressed image data until end of file\n")
    break
  }

  # Read (i.e. skip over) the chunk data: len - 2 bytes from the current position
  chunk_data <- read_uint8(con, n = len - 2)
  # Process chunk data here

  # Read the next marker and continue
  marker <- read_hex(con, n = 1, size = 2)
}

close(con)
```
```
#> ffe0 [ 16] [APP0] [Application Segment 0] [JFIF – JFIF JPEG image AVI1 – Motion JPEG (MJPG)]
#> ffe1 [ 128] [APP1] [Application Segment 1] [EXIF Metadata, TIFF IFD format, JPEG Thumbnail (160×120) Adobe XMP]
#> ffdb [ 67] [DQT] [Define Quantization Table] [NA]
#> ffdb [ 67] [DQT] [Define Quantization Table] [NA]
#> ffc0 [ 17] [SOF0] [Start of Frame 0] [Baseline DCT]
#> ffc4 [ 31] [DHT] [Define Huffman Table] [NA]
#> ffc4 [ 181] [DHT] [Define Huffman Table] [NA]
#> ffc4 [ 31] [DHT] [Define Huffman Table] [NA]
#> ffc4 [ 181] [DHT] [Define Huffman Table] [NA]
#> ffda [ 12] [SOS] [Start of Scan] [NA]
#> Compressed image data until end of file
```
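For readers without `ctypesio`, the same walk can be sketched with base R connections. This is an illustrative sketch, not the package's API: `walk_jpeg_chunks()` is a hypothetical helper, and it runs on a tiny synthetic byte stream rather than a real JPEG file. Like the loop above, it assumes every marker before Start of Scan is followed by a 2-byte length.

```r
# Walk the chunks of a JPEG byte stream up to Start of Scan (hypothetical helper).
# Returns a data.frame with one row per chunk: its marker and its length field.
walk_jpeg_chunks <- function(con) {
  # Read 2 bytes and format them as a 4-character hex string, e.g. "ffd8"
  read_marker <- function() {
    b <- readBin(con, "raw", n = 2)
    if (length(b) < 2) return(NA_character_)
    paste(as.character(b), collapse = "")
  }
  # Read a big-endian unsigned 16-bit chunk length
  read_len <- function() {
    readBin(con, "integer", n = 1, size = 2, signed = FALSE, endian = "big")
  }

  stopifnot(identical(read_marker(), "ffd8"))  # must start with Start of Image

  markers <- character(0)
  lens    <- integer(0)
  repeat {
    marker <- read_marker()
    if (is.na(marker)) break          # ran out of data
    len <- read_len()
    if (length(len) == 0) break       # truncated chunk header
    markers <- c(markers, marker)
    lens    <- c(lens, len)
    if (marker == "ffda") break       # compressed image data follows
    readBin(con, "raw", n = len - 2)  # skip the payload (length counts its own 2 bytes)
  }
  data.frame(marker = markers, len = lens, stringsAsFactors = FALSE)
}

# Synthetic stream: SOI, an APP0 with a 2-byte payload, a DQT with a
# 1-byte payload, then SOS followed by some "compressed" bytes
demo <- as.raw(c(0xff, 0xd8,
                 0xff, 0xe0, 0x00, 0x04, 0xaa, 0xbb,
                 0xff, 0xdb, 0x00, 0x03, 0x01,
                 0xff, 0xda, 0x00, 0x02,
                 0x12, 0x34))
con <- rawConnection(demo, "r")
chunks <- walk_jpeg_chunks(con)
close(con)
chunks
```

Returning a data.frame rather than printing keeps the walk separate from the description lookup, so the `marker` column could afterwards be matched against `jpeg_markers$Hex`.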