# Explanation of Descriptors in the ITU-T Publication on H.264 Coding Standard / Recommendation (with example)


I started learning H.264 a week ago. My job requires me to decode H.264 video clips streamed from the Hi3518, a media processor manufactured by Hisilicon. I have no background in video coding, so it is a hard time for me right now. I will keep writing up and organizing the related materials for my own revision and for your reference. In this section, I explain the descriptors used in the official ITU-T publication of the H.264 Recommendation, *Advanced video coding for generic audiovisual services*, and illustrate them with some examples. They are the first thing you need to understand before you can work out how to decode an H.264 video clip. There are 10 main descriptors, as follows (extracted from the 2013 edition of the document):

The following descriptors specify the parsing process of each syntax element. For some syntax elements, two descriptors, separated by a vertical bar, are used. In these cases, the left descriptors apply when entropy_coding_mode_flag is equal to 0 and the right descriptor applies when entropy_coding_mode_flag is equal to 1.

– ae(v): context-adaptive arithmetic entropy-coded syntax element. The parsing process for this descriptor is specified in clause 9.3.

– b(8): byte having any pattern of bit string (8 bits). The parsing process for this descriptor is specified by the return value of the function read_bits( 8 ).

– ce(v): context-adaptive variable-length entropy-coded syntax element with the left bit first. The parsing process for this descriptor is specified in clause 9.2.

– f(n): fixed-pattern bit string using n bits written (from left to right) with the left bit first. The parsing process for this descriptor is specified by the return value of the function read_bits( n ).

– i(n): signed integer using n bits. When n is “v” in the syntax table, the number of bits varies in a manner dependent on the value of other syntax elements. The parsing process for this descriptor is specified by the return value of the function read_bits( n ) interpreted as a two’s complement integer representation with most significant bit written first.

– me(v): mapped Exp-Golomb-coded syntax element with the left bit first. The parsing process for this descriptor is specified in clause 9.1.

– se(v): signed integer Exp-Golomb-coded syntax element with the left bit first. The parsing process for this descriptor is specified in clause 9.1.

– te(v): truncated Exp-Golomb-coded syntax element with left bit first. The parsing process for this descriptor is specified in clause 9.1.

– u(n): unsigned integer using n bits. When n is “v” in the syntax table, the number of bits varies in a manner dependent on the value of other syntax elements. The parsing process for this descriptor is specified by the return value of the function read_bits( n ) interpreted as a binary representation of an unsigned integer with most significant bit written first.

– ue(v): unsigned integer Exp-Golomb-coded syntax element with the left bit first. The parsing process for this descriptor is specified in clause 9.1.

### 1. b(8), f(n)

These two are the easiest to understand, so I will discuss them first. To begin with, you need to know the meaning of read_bits(n) [1], where n is the number of bits. As the name suggests, it reads the next n bits of the bit string. Both b(8) and f(n) are based on read_bits(n). **b(8)** is the same as read_bits(8), which reads the next 8 bits (1 byte), and **f(n)** is exactly the same as read_bits(n). For example, suppose we have the bit string 0x197E5A01 and the following syntax table:

| Hex | Binary form |
| --- | --- |
| 0x197E5A01 | 0001 1001 0111 1110 0101 1010 0000 0001 |

| Syntax | Descriptor |
| --- | --- |
| syntax_1 | b(8) |
| syntax_2 | f(8) |
| syntax_3 | f(1) |
| syntax_4 | f(5) |
| syntax_5 | b(8) |
| syntax_6 | f(2) |

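Then for syntax_1, b(8) returns the first byte, 0001 1001. For syntax_2, f(8) returns the second byte, 0111 1110. For syntax_3, f(1) returns the value of the next bit, which is 0. The remaining elements simply keep consuming bits from where the previous one stopped, regardless of byte boundaries. Continuing with the example, we have: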

syntax_1 = 25 (0001 1001)

syntax_2 = 126 (0111 1110)

syntax_3 = 0 (0)

syntax_4 = 22 (10110)

syntax_5 = 128 (1000 0000)

syntax_6 = 1 (01)

[1] read_bits( n ) reads the next n bits from the bitstream and advances the bitstream pointer by n bit positions. When n is equal to 0, read_bits( n ) is specified to return a value equal to 0 and to not advance the bitstream pointer.
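The behaviour of read_bits( n ) and the worked example in this section can be sketched in Python. The `BitReader` class and its attribute names are made up for illustration (they do not come from the Recommendation), and the sketch assumes the whole bitstream is already in memory as bytes.

```python
class BitReader:
    """Minimal MSB-first bit reader (hypothetical helper, not from the spec)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # bitstream pointer, counted in bits

    def read_bits(self, n: int) -> int:
        """Read the next n bits as an unsigned value and advance the pointer."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1  # leftmost bit of a byte first
            value = (value << 1) | bit
            self.pos += 1
        return value  # for n == 0 this is 0 and the pointer has not moved


reader = BitReader(bytes.fromhex("197E5A01"))
widths = [8, 8, 1, 5, 8, 2]  # b(8), f(8), f(1), f(5), b(8), f(2)
values = [reader.read_bits(n) for n in widths]
print(values)  # [25, 126, 0, 22, 128, 1]
```

Since b(8) and f(n) are both plain read_bits calls, the sketch only needs the bit widths to reproduce syntax_1 to syntax_6.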

### 2. i(n), u(n)

**i(n)** and **u(n)** are also based on read_bits(n), but in the syntax tables of the document they usually appear as i(v) or u(v). So, what does **“v”** mean? “v” means variable: the number of bits to parse is not fixed and may depend on syntax elements obtained earlier in the parsing process. Once you know the exact value of v, you can parse with read_bits(v). The difference between **i(n)** and **u(n)** is that **i(n)** interprets the bits as a **signed** integer (**2’s complement**) while **u(n)** interprets them as an **unsigned** integer.

*Example for u(v)*:

The pair of syntax elements **log2_max_frame_num_minus4** [ue(v)], obtained in seq_parameter_set_data( ), and **frame_num** [u(v)], obtained in slice_header( ), is a good example: the number of bits used to represent **frame_num** is **log2_max_frame_num_minus4** + 4.

Suppose **log2_max_frame_num_minus4** = 4 (refer to the next section about parsing ue(v)); then **frame_num** is 4 + 4 = 8 bits long. So, to get the value of **frame_num**, you parse the bitstream with u(8), which reads the same bits as `b(8) = f(8) = read_bits( 8 )`.

i(v) works the same way, except that the bits read are interpreted as a signed (two’s complement) integer.
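To make the difference between the two interpretations concrete, here is a small Python sketch. The helper names `u` and `i`, the choice of a '0'/'1' string as input, and the bit pattern `10010110` are all assumptions made for illustration; only the rule that frame_num occupies log2_max_frame_num_minus4 + 4 bits comes from the Recommendation.

```python
def u(bits: str, n: int) -> int:
    """u(n): the first n bits of a '0'/'1' string as an unsigned integer."""
    return int(bits[:n], 2) if n > 0 else 0


def i(bits: str, n: int) -> int:
    """i(n): the same n bits read as a two's complement signed integer."""
    value = u(bits, n)
    if n > 0 and bits[0] == "1":  # sign bit set: subtract 2**n
        value -= 1 << n
    return value


log2_max_frame_num_minus4 = 4      # assumed value, parsed earlier as ue(v)
v = log2_max_frame_num_minus4 + 4  # frame_num is u(v), so v = 8 bits here
print(u("10010110", v))  # 150
print(i("10010110", v))  # -106
```

The same 8 bits give 150 when read as u(8) and -106 when read as i(8), because the leading 1 is treated as the sign bit.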

### 3. ue(v), te(v), se(v), me(v)

These are all Exp-Golomb coded. To decode them, you first need a codeNum, which the document derives as follows:

The parsing process for these syntax elements begins with reading the bits starting at the current location in the bitstream up to and including the first non-zero bit, and counting the number of leading bits that are equal to 0. This process is specified as follows:

    leadingZeroBits = −1
    for( b = 0; !b; leadingZeroBits++ )
        b = read_bits( 1 )

The variable codeNum is then assigned as follows:

    codeNum = 2^leadingZeroBits − 1 + read_bits( leadingZeroBits )

Here 2^leadingZeroBits means 2 raised to the power leadingZeroBits.

codeNum is very important. Let me illustrate with some examples.

Suppose we have a bitstream: **1000 1101 0000 0011 1001 0100**…

If we apply ue(v), there is no leading zero, so leadingZeroBits = 0 and only the 1st bit is read (in the for-loop).

**1**000 1101 0000 0011 1001 0100…

codeNum = 2^0 − 1 + read_bits( 0 ) = 1 − 1 + 0 = 0

Apply ue(v) again: leadingZeroBits = 3 [1**000 1**101 0000 0011 1001 0100…]; again, that “1” is actually read by read_bits( 1 ) in the for-loop. Then,

codeNum = 2^3 − 1 + **read_bits( 3 )** = 8 − 1 + **5** = 12 [1000 1**101** 0000 0011 1001 0100…]

Continuing, leadingZeroBits = 6 [1000 1101 **0000 001**1 1001 0100…]. Then,

codeNum = 2^6 − 1 + **read_bits( 6 )** = 64 − 1 + **50** = 113 [1000 1101 0000 001**1 1001 0**100…]
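The three decoding steps above can be reproduced with a short Python sketch. The function `read_code_num` is a hypothetical helper that works on a '0'/'1' string instead of a real bitstream. It counts the leading zeros, skips the terminating 1 bit, and then adds the suffix to 2^leadingZeroBits − 1, which is my reading of the pseudocode in clause 9.1 rather than an official implementation.

```python
def read_code_num(bits: str, pos: int = 0):
    """Decode one Exp-Golomb codeNum from a '0'/'1' string.

    Returns (codeNum, new_pos) so that calls can be chained.
    """
    leading_zero_bits = 0
    while bits[pos] == "0":  # count zeros up to the first 1 bit
        leading_zero_bits += 1
        pos += 1
    pos += 1  # the terminating 1 bit is consumed as well
    suffix = bits[pos:pos + leading_zero_bits]
    pos += leading_zero_bits
    code_num = (1 << leading_zero_bits) - 1 + (int(suffix, 2) if suffix else 0)
    return code_num, pos


stream = "100011010000001110010100"  # 1000 1101 0000 0011 1001 0100
pos = 0
code_nums = []
for _ in range(3):
    code_num, pos = read_code_num(stream, pos)
    code_nums.append(code_num)
print(code_nums)  # [0, 12, 113]
```

After the third call the position is 21, so the last three bits of the example stream (100) are still unread.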

To obtain the final value of the syntax element, some more rules are applied.

**ue(v)**: It is equal to codeNum.

**te(v)**: Suppose x is the largest possible value of the syntax element s, so that s is an integer with `s ∈ [0, x]`.

– If x > 1, it is parsed the same way as ue(v).

– If x = 1, the parsing process is given by a process equivalent to:

`b = read_bits(1)`

`codeNum = !b`
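The two te(v) cases can be sketched in Python as follows. The function `te` and its '0'/'1' string input are assumptions for illustration, and the caller is expected to know the range limit x from previously parsed syntax elements.

```python
def te(bits: str, x: int) -> int:
    """te(v) for a syntax element whose range is 0..x, with x >= 1."""
    if x == 1:
        return 0 if bits[0] == "1" else 1  # codeNum = !b, a single bit
    # x > 1: decode exactly like ue(v)
    zeros = len(bits) - len(bits.lstrip("0"))  # number of leading zero bits
    suffix = bits[zeros + 1:2 * zeros + 1]     # bits after the first 1
    return (1 << zeros) - 1 + (int(suffix, 2) if suffix else 0)


print(te("1", 1))    # 0
print(te("0", 1))    # 1
print(te("011", 3))  # 2
```

With x = 1 a single bit is enough, and it is inverted: a 1 bit means the value 0 and a 0 bit means the value 1.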

**se(v)**: It is signed Exp-Golomb coding. The value is obtained from codeNum by applying `(−1)^(codeNum+1) × Ceil( codeNum ÷ 2 )`. For example,

codeNum = 0 → value = 0

codeNum = 1 → value = 1

codeNum = 2 → value = -1

codeNum = 3 → value = 2

codeNum = 4 → value = -2
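The pattern in this list is that odd codeNum values map to positive numbers, even ones map to negative numbers, and the magnitude is codeNum divided by 2, rounded up. A minimal Python sketch of that mapping (the function name `se_from_code_num` is made up for illustration):

```python
def se_from_code_num(code_num: int) -> int:
    """Map an Exp-Golomb codeNum to the signed value used by se(v)."""
    magnitude = (code_num + 1) // 2  # Ceil(codeNum / 2) for codeNum >= 0
    return magnitude if code_num % 2 else -magnitude  # odd -> positive


print([se_from_code_num(k) for k in range(5)])  # [0, 1, -1, 2, -2]
```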

**me(v)**: codeNum is mapped to a value using the tables in clause 9.1.2 (p.209). The tables are too long, so I won’t paste them here.

### 4. ae(v), ce(v)

I haven’t studied these yet, but you can refer to clause 9.2 for ce(v) and clause 9.3 for ae(v).

### Other Notes

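When a syntax element lists two descriptors separated by a vertical bar, be careful to pick the correct one: the left descriptor applies when entropy_coding_mode_flag is equal to 0 and the right descriptor applies when it is equal to 1 (see the note quoted at the beginning).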