Introduction to H.264: (1) NAL Unit

H.264 / MPEG-4 Part 10, Advanced Video Coding (MPEG-4 AVC) is a common video compression format developed by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC JTC1 Moving Picture Experts Group (MPEG). The Network Abstraction Layer (NAL) and the Video Coding Layer (VCL) are the two main concepts in H.264. An H.264 file consists of a number of NAL units (NALUs), and each NALU is classified as either VCL or non-VCL. Video data is processed by the codec and packed into NAL units.

NALU in Packet-Transport Protocol vs. Byte-Stream Format

There are two ways to pack a NAL unit for different systems: packet-transport systems and byte-stream format. In packet-transport systems such as RTP, the transport protocol itself frames the coded data into separate pieces. The system can therefore identify the boundaries of NAL units easily, and adding a start code would only waste resources. This method is usually used in streaming.

However, other systems have no such protocol to separate NAL units. For example, suppose you want to store an H.264 file and decode it on another computer. The decoder has no way of finding the boundaries of the NAL units on its own. So a three-byte or four-byte start code, 0x000001 or 0x00000001, is added at the beginning of each NAL unit. This is called byte-stream format, and it lets the decoder identify the boundaries easily. According to the ITU-T Recommendation, this format is usually used in Rec. ITU-T H.222.0 | ISO/IEC 13818-1 systems or Rec. ITU-T H.320 systems.
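As a rough illustration of how a decoder might locate those boundaries, here is a minimal Python sketch. The function name `split_nal_units` is made up for this example. It assumes the whole stream fits in memory and that zero bytes found right before a start code are padding (the extra zero of a four-byte start code, or trailing zeros) rather than NAL unit data.

```python
def split_nal_units(stream: bytes) -> list[bytes]:
    """Split a byte stream into NAL units at 0x000001 start code prefixes."""
    prefix = b"\x00\x00\x01"
    units = []
    pos = stream.find(prefix)
    while pos != -1:
        start = pos + len(prefix)          # first byte after the prefix
        nxt = stream.find(prefix, start)   # next prefix, or -1 at the end
        end = len(stream) if nxt == -1 else nxt
        # Zero bytes before the next prefix belong to a four-byte start
        # code or to trailing padding, so they are stripped here.
        units.append(stream[start:end].rstrip(b"\x00"))
        pos = nxt
    return units


sample = b"\x00\x00\x00\x01\x67\x42\x00\x00\x00\x01\x68\xce\x00\x00\x01\x65\x88"
for unit in split_nal_units(sample):
    print(unit.hex())
```

A four-byte start code is handled without special casing, because it is just a three-byte prefix with one more zero byte in front of it.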

In this series of articles, I will use byte-stream format as an example to introduce H.264. Here is a sample of an H.264 byte stream. You can see several 0x00000001 start codes, which means there are several NAL units in this image.

Figure 1: A sample of raw H.264 in byte stream format

However, 0x000001 or 0x00000001 may also occur inside the data of a NAL unit. So an emulation prevention byte, 0x03, is inserted whenever 0x000000, 0x000001, 0x000002 or 0x000003 would appear, turning them into 0x00000300, 0x00000301, 0x00000302 and 0x00000303 respectively. This ensures that no sequence of consecutive byte-aligned bytes in the NAL unit contains a start code prefix. Lines 1 and 2 in the image show two examples. You may also read the next article for details.
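The rule can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names are hypothetical, the input is the payload after the one-byte NAL header, and special cases such as a payload that ends in a zero byte are not handled.

```python
def add_emulation_prevention(payload: bytes) -> bytes:
    """Insert 0x03 after two zero bytes when the next byte is 0x00..0x03."""
    out = bytearray()
    zeros = 0  # consecutive zero bytes written so far
    for b in payload:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)  # emulation prevention byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def remove_emulation_prevention(payload: bytes) -> bytes:
    """Drop every 0x03 that directly follows two zero bytes."""
    out = bytearray()
    zeros = 0  # consecutive zero bytes read so far
    for b in payload:
        if zeros >= 2 and b == 0x03:
            zeros = 0
            continue  # skip the emulation prevention byte
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


print(add_emulation_prevention(b"\x00\x00\x01").hex())  # 00000301
```

Running the two functions back to back should return the original payload, which is a handy sanity check when experimenting with real streams.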

First Byte of NALU: NAL Unit Header

In a NALU, the first byte is a header byte that indicates the type of data the unit contains, along with other information. The remaining bytes are the payload of the NAL unit. The header byte can be parsed into 3 parts as shown. Let’s take 0x67 on Line 1 of Figure 1 as an example.

Hex Binary
0x67 0110 0111

Figure 2: Parse the First Byte of a NAL unit

(Note: If you don’t understand the descriptor, read this article: Explanation of Descriptors in the ITU-T Publication on H.264 Coding Standard/Recommendation (with example) first)

So, for 0x67, we have:
forbidden_zero_bit = 0,
nal_ref_idc = 3,
nal_unit_type = 7

The 1st bit is forbidden_zero_bit, which is used to check whether an error occurred during transmission. 0 means the unit is normal, while 1 indicates a syntax violation. So, we should always find forbidden_zero_bit equal to 0.

The next 2 bits are nal_ref_idc, which indicates whether this NAL unit is a reference field / frame / picture. If it is, nal_ref_idc is not equal to 0. According to the Recommendation, a non-zero nal_ref_idc specifies that the content of the NAL unit contains a sequence parameter set (SPS), an SPS extension, a subset SPS, a picture parameter set (PPS), a slice of a reference picture, a slice data partition of a reference picture, or a prefix NAL unit preceding a slice of a reference picture. If it is a non-reference field / frame / picture, nal_ref_idc is equal to 0. Among non-zero values, the larger the value, the more important the NAL unit. In this case, it is equal to 3 (binary 11), so it is a reference field / frame / picture (in fact, it’s an SPS; I’ll tell you more later).

The last 5 bits are nal_unit_type, which specifies the type of RBSP data structure contained in the NAL unit, as listed in Table 7-1. VCL NAL units are those NAL units with nal_unit_type equal to 1 to 5, inclusive. All remaining NAL units are called non-VCL NAL units.
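The three fields can be pulled out of the header byte with shifts and masks. The sketch below is one way to do it in Python; `NalHeader`, `parse_nal_header` and `is_vcl` are names chosen for this example, and `is_vcl` only covers the 1 to 5 rule quoted above.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int  # 1 bit
    nal_ref_idc: int         # 2 bits
    nal_unit_type: int       # 5 bits


def parse_nal_header(first_byte: int) -> NalHeader:
    """Split the first byte of a NAL unit into its three fields."""
    return NalHeader(
        forbidden_zero_bit=(first_byte >> 7) & 0x01,
        nal_ref_idc=(first_byte >> 5) & 0x03,
        nal_unit_type=first_byte & 0x1F,
    )


def is_vcl(nal_unit_type: int) -> bool:
    """True for nal_unit_type 1 to 5, inclusive."""
    return 1 <= nal_unit_type <= 5


header = parse_nal_header(0x67)
print(header)  # NalHeader(forbidden_zero_bit=0, nal_ref_idc=3, nal_unit_type=7)
print(is_vcl(header.nal_unit_type))  # False
```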

More about NAL Unit Type

The following table lists all NAL unit types and their properties. (Normally, we follow Annex A.)

Table 7-1 – NAL unit type codes, syntax element categories, and NAL unit type classes

| nal_unit_type | Content of NAL unit & RBSP syntax structure | C | NAL unit type class [Annex A] | NAL unit type class [Annex G & H] | NAL unit type class [Annex I] |
| --- | --- | --- | --- | --- | --- |
| 0 | Unspecified | | non-VCL | non-VCL | non-VCL |
| 1 | Coded slice of a non-IDR picture, slice_layer_without_partitioning_rbsp( ) | 2, 3, 4 | VCL | VCL | VCL |
| 2 | Coded slice data partition A, slice_data_partition_a_layer_rbsp( ) | 2 | VCL | not applicable | not applicable |
| 3 | Coded slice data partition B, slice_data_partition_b_layer_rbsp( ) | 3 | VCL | not applicable | not applicable |
| 4 | Coded slice data partition C, slice_data_partition_c_layer_rbsp( ) | 4 | VCL | not applicable | not applicable |
| 5 | Coded slice of an IDR picture, slice_layer_without_partitioning_rbsp( ) | 2, 3 | VCL | VCL | VCL |
| 6 | Supplemental enhancement information (SEI), sei_rbsp( ) | 5 | non-VCL | non-VCL | non-VCL |
| 7 | Sequence parameter set, seq_parameter_set_rbsp( ) | 0 | non-VCL | non-VCL | non-VCL |
| 8 | Picture parameter set, pic_parameter_set_rbsp( ) | 1 | non-VCL | non-VCL | non-VCL |
| 9 | Access unit delimiter, access_unit_delimiter_rbsp( ) | 6 | non-VCL | non-VCL | non-VCL |
| 10 | End of sequence, end_of_seq_rbsp( ) | 7 | non-VCL | non-VCL | non-VCL |
| 11 | End of stream, end_of_stream_rbsp( ) | 8 | non-VCL | non-VCL | non-VCL |
| 12 | Filler data, filler_data_rbsp( ) | 9 | non-VCL | non-VCL | non-VCL |
| 13 | Sequence parameter set extension, seq_parameter_set_extension_rbsp( ) | 10 | non-VCL | non-VCL | non-VCL |
| 14 | Prefix NAL unit, prefix_nal_unit_rbsp( ) | 2 | non-VCL | suffix dependent | suffix dependent |
| 15 | Subset sequence parameter set, subset_seq_parameter_set_rbsp( ) | 0 | non-VCL | non-VCL | non-VCL |
| 16 – 18 | Reserved | | non-VCL | non-VCL | non-VCL |
| 19 | Coded slice of an auxiliary coded picture without partitioning, slice_layer_without_partitioning_rbsp( ) | 2, 3, 4 | non-VCL | non-VCL | non-VCL |
| 20 | Coded slice extension, slice_layer_extension_rbsp( ) | 2, 3, 4 | non-VCL | VCL | VCL |
| 21 | Coded slice extension for depth view components, slice_layer_extension_rbsp( ) (specified in Annex I) | 2, 3, 4 | non-VCL | non-VCL | VCL |
| 22 – 23 | Reserved | | non-VCL | non-VCL | VCL |
| 24 – 31 | Unspecified | | non-VCL | non-VCL | non-VCL |

The column marked “C” lists the categories of the syntax elements that may be present in the NAL unit. In addition, syntax elements with syntax category “All” may be present, as determined by the syntax and semantics of the RBSP data structure. The presence or absence of any syntax elements of a particular listed category is determined from the syntax and semantics of the associated RBSP data structure. nal_unit_type shall not be equal to 3 or 4 unless at least one syntax element is present in the RBSP data structure having a syntax element category value equal to the value of nal_unit_type and not categorized as “All”.

The entry “suffix dependent” for nal_unit_type equal to 14 is specified as follows:
–  If the NAL unit directly following the type 14 NAL unit in decoding order has type 1 or 5, the type 14 NAL unit is a VCL NAL unit.
–  Otherwise (the NAL unit directly following in decoding order has a type not equal to 1 or 5), the type 14 NAL unit is a non-VCL NAL unit. Decoders shall ignore (remove from the bitstream and discard) the NAL unit with type 14 and the NAL unit directly following (in decoding order) the NAL unit with type 14.
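When inspecting a stream, it helps to turn Table 7-1 into a small lookup. The following Python sketch does that for the names only; the dictionary and the function `describe_nal_unit_type` are illustrative helpers written for this example, not part of any library.

```python
NAL_UNIT_TYPE_NAMES = {
    1: "Coded slice of a non-IDR picture",
    2: "Coded slice data partition A",
    3: "Coded slice data partition B",
    4: "Coded slice data partition C",
    5: "Coded slice of an IDR picture",
    6: "Supplemental enhancement information (SEI)",
    7: "Sequence parameter set",
    8: "Picture parameter set",
    9: "Access unit delimiter",
    10: "End of sequence",
    11: "End of stream",
    12: "Filler data",
    13: "Sequence parameter set extension",
    14: "Prefix NAL unit",
    15: "Subset sequence parameter set",
    19: "Coded slice of an auxiliary coded picture without partitioning",
    20: "Coded slice extension",
    21: "Coded slice extension for depth view components",
}


def describe_nal_unit_type(nal_unit_type: int) -> str:
    """Return the Table 7-1 description for a 5-bit nal_unit_type."""
    if not 0 <= nal_unit_type <= 31:
        raise ValueError("nal_unit_type is a 5-bit value (0 to 31)")
    if nal_unit_type in NAL_UNIT_TYPE_NAMES:
        return NAL_UNIT_TYPE_NAMES[nal_unit_type]
    if nal_unit_type == 0 or nal_unit_type >= 24:
        return "Unspecified"
    return "Reserved"  # 16 to 18 and 22 to 23


print(describe_nal_unit_type(0x67 & 0x1F))  # Sequence parameter set
```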

To pack into byte-stream format:

byte_stream_nal_unit( NumBytesInNALunit ) {               // Descriptor 
	while(next_bits(24) != 0x000001 &&
		next_bits(32) != 0x00000001)
			leading_zero_8bits /* equal to 0x00 */        // f(8)

	if(next_bits(24) != 0x000001)
		zero_byte /* equal to 0x00 */                     // f(8)
		
	start_code_prefix_one_3bytes /* equal to 0x000001 */  // f(24)
	nal_unit( NumBytesInNALunit )   
	while(more_data_in_byte_stream() &&
		next_bits(24) != 0x000001 &&
		next_bits(32) != 0x00000001)
			trailing_zero_8bits /* equal to 0x00 */       // f(8)
}
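Going the other way, the syntax above can be followed to write NAL units out as a byte stream. The Python sketch below is a simplified take: `pack_access_unit` is a hypothetical helper that assumes its input is the list of NAL units of a single access unit, each already carrying its header byte and emulation prevention bytes. As far as I understand the byte-stream annex of the Recommendation, the extra zero_byte (giving a four-byte start code) is required for SPS and PPS NAL units and for the first NAL unit of an access unit, so the sketch adds it only in those cases. It writes no leading or trailing zero bytes.

```python
def pack_access_unit(nal_units: list[bytes]) -> bytes:
    """Prefix each NAL unit of one access unit with a start code."""
    out = bytearray()
    for index, nal in enumerate(nal_units):
        nal_unit_type = nal[0] & 0x1F
        # Assumption: the first unit in the list starts the access unit.
        if index == 0 or nal_unit_type in (7, 8):
            out += b"\x00"      # zero_byte
        out += b"\x00\x00\x01"  # start_code_prefix_one_3bytes
        out += nal              # nal_unit( NumBytesInNALunit )
    return bytes(out)


stream = pack_access_unit([b"\x67\x42", b"\x68\xce", b"\x65\x88"])
print(stream.hex())  # 000000016742000000016 8ce... see note below
```

The exact hex string is easiest to check by running the code; with the three sample units above, the SPS and PPS get four-byte start codes and the IDR slice gets a three-byte one.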

Relationship between nal_ref_idc and NAL Unit Type

| NAL unit type | Possible nal_ref_idc value |
| --- | --- |
| 1 – 4 | If nal_ref_idc is 0 for one NAL unit with type in the range of 1 – 4, inclusive, it is 0 for all NAL units with type 1 – 4 of the same picture |
| 5 (Coded slice of an IDR picture) | non-zero |
| 7 (Sequence parameter set) | non-zero |
| 8 (Picture parameter set) | non-zero |
| 13 (Sequence parameter set extension) | non-zero |
| 15 (Subset sequence parameter set) | non-zero |
| 6, 9, 10, 11 or 12 | 0 |
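These constraints are easy to check per header byte. The Python sketch below is a minimal example; the set names and `ref_idc_is_consistent` are invented for illustration, and the per-picture rule for types 1 to 4 is not checked because it needs knowledge of which NAL units belong to the same picture.

```python
NONZERO_REF_IDC_TYPES = {5, 7, 8, 13, 15}  # nal_ref_idc must not be 0
ZERO_REF_IDC_TYPES = {6, 9, 10, 11, 12}    # nal_ref_idc must be 0


def ref_idc_is_consistent(first_byte: int) -> bool:
    """Check nal_ref_idc against nal_unit_type for a single header byte."""
    nal_ref_idc = (first_byte >> 5) & 0x03
    nal_unit_type = first_byte & 0x1F
    if nal_unit_type in NONZERO_REF_IDC_TYPES:
        return nal_ref_idc != 0
    if nal_unit_type in ZERO_REF_IDC_TYPES:
        return nal_ref_idc == 0
    return True  # no single-unit constraint from the table above


print(ref_idc_is_consistent(0x67))  # True: SPS with nal_ref_idc = 3
print(ref_idc_is_consistent(0x07))  # False: SPS with nal_ref_idc = 0
```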

Comments

  1. gaojie

    In Table 7-1, How understand C= 6,7,8,9,10

    1. Yumi Chan (Post author)

      C is the categories of the syntax elements.

  2. gaojie

    0x67 should be 0110 0111

    1. Yumi Chan (Post author)

      Thank you and sorry for the typo!

  3. Me

    If you’re going to change to 6E, you have to update this:

    So, for 0x6E, we have:
    forbidden_zero_bit = 0,
    nal_ref_idc = 3,
    nal_unit_type = 7

    Thanks!

    1. Yumi Chan (Post author)

      Hi, sorry for the mistake again. I change it back to 0x67. Please let me know whether there is any other mistake. Hope you aren’t confused.

  4. Sambangi Satishkumar

    how to generate a NAL byte stream for decoding?
    I am working on h.265 video decoder to implement it with verilog HDL. The input to the cabac decoder is byte stream so how i can get the byte stream? please help me.

    1. Yumi Chan (Post author)

      Sorry for late reply!
      You may convert with some software like ffmpeg.
      Some media-supported chips also provide byte stream directly.

  5. Alvin

    In hexstream format, is there parameter which define RGB of the video frame? Is there represented after the entry_point_offset_minus1 parameter?

    Thank you

    1. Yumi Chan (Post author)

      I am sorry. I have no idea about hexstream format…

    2. Alexis Wilke

      Compression for video streams doesn’t use RGB. In most cases we use a form of YUV. There are many YUV formats, though, like 4:0:0 and 4:2:2, for example YIQ was used in the US https://en.wikipedia.org/wiki/YIQ

  6. Pingback: Breif Description of nal_ref_idc Value in H.246 NALU | Yumi Chan's Blog

  12. Xon

    Why 0x000002 is prevented? It cannot emulate a start code.

  13. Pingback: What’s new in Membrane Framework - Q1 2019 – Membrane Framework

  15. lc

    “a three-byte or four-byte start code, 0x000001 or 0x00000001”
    what are the difference between 0x000001 (3 bytes) and 0x00000001 (4 bytes)?

    I heard that 0x00000001 (4 bytes) means:
    a slice contains a full frame.
    0x000001 (3 bytes) means:
    a full frame consists of several slices.

    I can’t find the difference between them in the H.264 specification.

  16. Adriene

    Good post. I will be experiencing some of these issues as well..

  17. chandra

    You blog are awesome are more informative

    But I am confused when i analysis source code of node media server parsing rtmp video data payload
    frame_type = (payload[0]>>4)&0x0f;
    codec_id = payload[0]&0x0f;

    Here not using NALU concept why

    I please reply

    1. Yumi Chan (Post author)

      Hi, thanks for your appreciation but I have switched my focus to other areas in recent years. So.. I am afraid I cannot help.

  18. Massimo Perrone

    Hi all,
    is there a way to check if a given VCL nalu (type 5 for example, Coded slice of an IDR picture) references a full image rather than just a slice?
    Thanks
