# Explanation of Descriptors in the ITU-T Publication on H.264 Coding Standard / Recommendation (with example)


I started learning H.264 a week ago. My job requires me to decode H.264 video clips streamed from the Hi3518, a media processor manufactured by Hisilicon. I have no background in video coding, so it has been a hard time for me. I will keep writing up and organizing the related materials, both for my own revision and for your reference. In this section, I explain the descriptors used in the official ITU-T publication, Recommendation H.264: Advanced video coding for generic audiovisual services, and illustrate them with some examples. You need to understand them before you can understand how to decode an H.264 video clip. There are 10 main descriptors, which are as follows (extracted from the 2013 edition of the document):

The following descriptors specify the parsing process of each syntax element. For some syntax elements, two descriptors, separated by a vertical bar, are used. In these cases, the left descriptors apply when entropy_coding_mode_flag is equal to 0 and the right descriptor applies when entropy_coding_mode_flag is equal to 1.

ae(v): context-adaptive arithmetic entropy-coded syntax element. The parsing process for this descriptor is specified in clause 9.3.

b(8): byte having any pattern of bit string (8 bits). The parsing process for this descriptor is specified by the return value of the function read_bits( 8 ).

ce(v): context-adaptive variable-length entropy-coded syntax element with the left bit first. The parsing process for this descriptor is specified in clause 9.2.

f(n): fixed-pattern bit string using n bits written (from left to right) with the left bit first. The parsing process for this descriptor is specified by the return value of the function read_bits( n ).

i(n): signed integer using n bits. When n is “v” in the syntax table, the number of bits varies in a manner dependent on the value of other syntax elements. The parsing process for this descriptor is specified by the return value of the function read_bits( n ) interpreted as a two’s complement integer representation with most significant bit written first.

me(v): mapped Exp-Golomb-coded syntax element with the left bit first. The parsing process for this descriptor is specified in clause 9.1.

se(v): signed integer Exp-Golomb-coded syntax element with the left bit first. The parsing process for this descriptor is specified in clause 9.1.

te(v): truncated Exp-Golomb-coded syntax element with left bit first. The parsing process for this descriptor is specified in clause 9.1.

u(n): unsigned integer using n bits. When n is “v” in the syntax table, the number of bits varies in a manner dependent on the value of other syntax elements. The parsing process for this descriptor is specified by the return value of the function read_bits( n ) interpreted as a binary representation of an unsigned integer with most significant bit written first.

ue(v): unsigned integer Exp-Golomb-coded syntax element with the left bit first. The parsing process for this descriptor is specified in clause 9.1.

### 1. b(8), f(n)

These two are the easiest to understand, so I will discuss them first. To begin with, you need to know the meaning of read_bits(n) [1], where n is the number of bits. As the name suggests, it reads bits from the bit string, and n states how many of the following bits to read. Both b(8) and f(n) are based on read_bits(n). b(8) is the same as read_bits(8), which reads the following 8 bits (1 byte), and f(n) is the same as read_bits(n). For example, suppose we have the bit string 0x197E5A01:

| Hex | Binary form |
| --- | --- |
| 0x197E5A01 | 0001 1001 0111 1110 0101 1010 0000 0001 |

| Syntax | Descriptor |
| --- | --- |
| syntax_1 | b(8) |
| syntax_2 | f(8) |
| syntax_3 | f(1) |
| syntax_4 | f(5) |
| syntax_5 | b(8) |
| syntax_6 | f(2) |

Then for syntax_1, b(8) returns the first byte, 0001 1001. For syntax_2, f(8) returns the second byte, 0111 1110. For syntax_3, f(1) returns the value of the next bit, which is 0. Continuing with the example, we have:
syntax_1 = 25 (0001 1001)
syntax_2 = 126 (0111 1110)
syntax_3 = 0 (0)
syntax_4 = 22 (10110)
syntax_5 = 128 (1000 0000)
syntax_6 = 1 (01)

[1] description of read_bits(n) from the document

read_bits( n ) reads the next n bits from the bitstream and advances the bitstream pointer by n bit positions. When n is equal to 0, read_bits( n ) is specified to return a value equal to 0 and to not advance the bitstream pointer.
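
To make read_bits(n) concrete, here is a minimal Python sketch of the example above. The BitReader class and its names are my own illustration, not something defined in the standard; it assumes the bitstream is given as bytes and is read most significant bit first.

```python
class BitReader:
    """Minimal MSB-first bit reader (hypothetical helper, not from the standard)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # bitstream pointer, counted in bits

    def read_bits(self, n: int) -> int:
        # With n == 0 the loop does not run: returns 0, pointer unchanged.
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


reader = BitReader(bytes.fromhex("197E5A01"))
widths = [8, 8, 1, 5, 8, 2]  # b(8), f(8), f(1), f(5), b(8), f(2)
values = [reader.read_bits(n) for n in widths]
print(values)  # [25, 126, 0, 22, 128, 1]
```

The six widths add up to 32 bits, so the reader ends exactly at the end of the 4-byte string.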

### 2. i(n), u(n)

i(n) and u(n) are also based on read_bits(n), but in the syntax tables of the document they usually appear as i(v) or u(v), especially i(v). So, what does “v” mean? “v” means variable: the number of bits to be parsed is not fixed. It may depend on syntax elements obtained earlier in the parsing process. Once you know the exact value of v, you can parse with read_bits(v). The difference between i(n) and u(n) is that i(n) interprets the bits as a signed integer (two’s complement) while u(n) interprets them as an unsigned integer.

Example for u(v):
The pair of syntax elements log2_max_frame_num_minus4 [ue(v)], obtained in seq_parameter_set_data( ), and frame_num [u(v)], obtained in slice_header( ), is a good example: the number of bits used to represent frame_num is log2_max_frame_num_minus4 + 4.
Suppose log2_max_frame_num_minus4 = 4 (refer to the next section about parsing ue(v)); then frame_num is coded with 4 + 4 = 8 bits. So, to get the value of frame_num, you parse the bitstream with u(8), that is, read_bits( 8 ) interpreted as an unsigned integer.

i(v) works the same way, except that the bits read are interpreted as a signed (two’s complement) integer.
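
The difference between u(n) and i(n) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the bitstream is written as a string of '0' and '1' characters, the helper names u and i are mine, and the 8-bit pattern 1111 1010 is a made-up value for frame_num. The bit length follows the standard: frame_num is coded with log2_max_frame_num_minus4 + 4 bits.

```python
def read_bits(bits, pos, n):
    """Return (value, new_pos) for the next n bits of a '0'/'1' string."""
    value = int(bits[pos:pos + n], 2) if n else 0
    return value, pos + n


def u(bits, pos, n):
    # u(n): the n bits are taken as an unsigned integer, MSB first
    return read_bits(bits, pos, n)


def i(bits, pos, n):
    # i(n): the same n bits, interpreted as two's complement
    value, pos = read_bits(bits, pos, n)
    if n and value >= 1 << (n - 1):  # sign bit is set
        value -= 1 << n
    return value, pos


log2_max_frame_num_minus4 = 4
v = log2_max_frame_num_minus4 + 4  # number of bits of frame_num, here 8

frame_num, _ = u("11111010", 0, v)
signed, _ = i("11111010", 0, v)
print(frame_num)  # 250
print(signed)     # -6
```

The same 8 bits give 250 when read as u(8) and -6 when read as i(8).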

### 3. ue(v), te(v), se(v), me(v)

These are all Exp-Golomb coded. To decode them, you first need a codeNum, which is calculated as follows (quoted from the document):

The parsing process for these syntax elements begins with reading the bits starting at the current location in the bitstream up to and including the first non-zero bit, and counting the number of leading bits that are equal to 0. This process is specified as follows:

leadingZeroBits = −1
for( b = 0; !b; leadingZeroBits++ )
b = read_bits( 1 )

The variable codeNum is then assigned as follows:
codeNum = $2^{leadingZeroBits}$ − 1 + read_bits( leadingZeroBits )

codeNum is very important. Let me illustrate with some examples.
Suppose we have a bitstream: 1000 1101 0000 0011 1001 0100
If we apply ue(v), there is no leading zero, so leadingZeroBits = 0 and only the 1st bit (1) is read, in the for-loop.
codeNum = $2^{0}$ − 1 + read_bits( 0 ) = 1 − 1 + 0 = 0

Apply ue(v) again to the remaining bits, 000 1101 0000 0011 1001 0100. There are three zeros before the first 1, so leadingZeroBits = 3; again, that “1” is read by read_bits( 1 ) in the for-loop. The next 3 bits are 101 = 5. Then,
codeNum = $2^{3}$ − 1 + read_bits( 3 ) = 8 − 1 + 5 = 12

Continue with the remaining bits, 0000 0011 1001 0100. There are six zeros before the first 1, so leadingZeroBits = 6, and the next 6 bits are 110010 = 50. Then,
codeNum = $2^{6}$ − 1 + read_bits( 6 ) = 64 − 1 + 50 = 113
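
The three steps above can be checked with a short Python sketch. It is a simplified illustration, not the standard's own code: the function name read_code_num is mine, and the bitstream is assumed to be a string of '0' and '1' characters.

```python
def read_code_num(bits, pos):
    """Decode one Exp-Golomb codeNum from a '0'/'1' string starting at pos."""
    leading_zero_bits = 0
    while bits[pos] == "0":  # count the zeros before the first 1
        leading_zero_bits += 1
        pos += 1
    pos += 1  # consume the terminating 1 bit
    suffix = bits[pos:pos + leading_zero_bits]  # read_bits( leadingZeroBits )
    pos += leading_zero_bits
    code_num = (1 << leading_zero_bits) - 1 + (int(suffix, 2) if suffix else 0)
    return code_num, pos


bits = "1000 1101 0000 0011 1001 0100".replace(" ", "")
pos = 0
results = []
for _ in range(3):
    code_num, pos = read_code_num(bits, pos)
    results.append(code_num)

print(results)  # [0, 12, 113]
print(pos)      # 21
```

After the three calls, 21 of the 24 bits have been consumed (1 + 7 + 13), which matches the walk-through above.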

To obtain the final value of the syntax, some more rules are applied.
ue(v): It is equal to codeNum.
te(v): Suppose x is the largest possible value of the syntax element s, so that s is an integer with s ∈ [0, x].
– If x > 1, it is same as ue(v).
– If x = 1, the parsing process is given by a process equivalent to:

b = read_bits( 1 )
codeNum = !b

se(v): The signed Exp-Golomb value is obtained by applying $(-1)^{codeNum + 1}\cdot Ceil( codeNum \div 2 )$ to codeNum. For example,

codeNum = 0 >>> $(-1)^{0 + 1}\cdot Ceil( 0 \div 2 )$ >>> value = 0
codeNum = 1 >>> $(-1)^{1 + 1}\cdot Ceil( 1 \div 2 )$ >>> value = 1
codeNum = 2 >>> $(-1)^{2 + 1}\cdot Ceil( 2 \div 2 )$ >>> value = -1
codeNum = 3 >>> $(-1)^{3 + 1}\cdot Ceil( 3 \div 2 )$ >>> value = 2
codeNum = 4 >>> $(-1)^{4 + 1}\cdot Ceil( 4 \div 2 )$ >>> value = -2

me(v): map codeNum using the tables in clause 9.1.2 (p.209). The tables are too long, so I have not pasted them here.
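
The se(v) mapping and the one-bit case of te(v) are small enough to sketch in Python. The function names are my own, and the sketch assumes codeNum (or the single bit b) has already been read from the bitstream; me(v) is left out because it needs the tables from the document.

```python
def se_from_code_num(code_num):
    """se(v): apply (-1)^(codeNum + 1) * Ceil(codeNum / 2) to codeNum."""
    magnitude = (code_num + 1) // 2  # Ceil(codeNum / 2) for codeNum >= 0
    # odd codeNum gives a positive value, even codeNum a negative one (or 0)
    return magnitude if code_num % 2 == 1 else -magnitude


def te_one_bit(b):
    """te(v) when the largest possible value x is 1: codeNum = !b."""
    return 0 if b else 1


print([se_from_code_num(k) for k in range(5)])  # [0, 1, -1, 2, -2]
print(te_one_bit(1), te_one_bit(0))             # 0 1
```

Note that in the x = 1 case of te(v), a single bit 1 means the value 0 and a single bit 0 means the value 1.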

### 4. ae(v), ce(v)

I haven’t studied these yet, but you can refer to clauses 9.2 (for ce(v)) and 9.3 (for ae(v)).

### Other Notes

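Be careful to use the correct descriptor when two are given, separated by a vertical bar: the left one applies when entropy_coding_mode_flag is equal to 0 and the right one when it is equal to 1 (see the paragraph quoted at the beginning).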