Real-time Transport Protocol (RTP) is a network protocol that defines a standard packet format for delivering audio and video over the Internet.
An RTP packet consists of an RTP header followed by RTP data. The fixed RTP header is 12 bytes long and is present in every RTP packet. The header can be longer when an RTP mixer adds contributing source (CSRC) identifiers or when an RTP header extension is used.
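The header length rule above can be sketched in Python as follows. This is a minimal illustration, not a complete RTP implementation: the function name `rtp_header_length` is hypothetical, and the layout of the header extension (a 16-bit profile-defined identifier followed by a 16-bit length counted in 32-bit words) is taken from RFC 3550 rather than from this page.

```python
FIXED_HEADER_LEN = 12  # bytes, present in every RTP packet


def rtp_header_length(packet: bytes) -> int:
    """Return the full RTP header length in bytes (hypothetical helper).

    The result covers the 12-byte fixed header, the CSRC list, and the
    header extension if the X bit is set.
    """
    if len(packet) < FIXED_HEADER_LEN:
        raise ValueError("packet is shorter than the 12-byte fixed header")

    first_octet = packet[0]
    has_extension = bool(first_octet & 0x10)  # X bit
    csrc_count = first_octet & 0x0F           # CC field, 0 to 15

    # Each CSRC identifier is 32 bits (4 bytes).
    length = FIXED_HEADER_LEN + 4 * csrc_count

    if has_extension:
        if len(packet) < length + 4:
            raise ValueError("packet is too short for the extension header")
        # Assumption (RFC 3550): bytes 2..3 of the extension header hold its
        # length in 32-bit words, not counting the 4-byte extension header.
        ext_words = int.from_bytes(packet[length + 2:length + 4], "big")
        length += 4 + 4 * ext_words

    return length
```

For a plain packet with no CSRC identifiers and no extension, the function returns 12; each CSRC identifier adds 4 bytes.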
RTP packet time refers to the duration of audio, in milliseconds, encapsulated into one RTP packet. An RTP packet time of 20 milliseconds means that 20 milliseconds of audio is encapsulated into one RTP packet. Thus, one second of audio consists of 50 RTP packets (50 packets x 20 milliseconds/packet = 1000 milliseconds = 1 second).
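The packet time arithmetic can be checked with a short Python sketch. The function names are hypothetical, and the payload size calculation assumes one byte per sample, as is the case for 8-bit G.711 audio; other codecs would need a different figure.

```python
def packets_per_second(ptime_ms: int) -> float:
    """Number of RTP packets needed to carry one second of audio."""
    return 1000 / ptime_ms


def samples_per_packet(sample_rate_hz: int, ptime_ms: int) -> int:
    """Number of audio samples carried in one RTP packet."""
    return sample_rate_hz * ptime_ms // 1000


def payload_bytes(sample_rate_hz: int, ptime_ms: int, bytes_per_sample: int = 1) -> int:
    """Payload size per packet, assuming a fixed number of bytes per sample."""
    return samples_per_packet(sample_rate_hz, ptime_ms) * bytes_per_sample


print(packets_per_second(20))        # 50.0
print(samples_per_packet(8000, 20))  # 160
print(payload_bytes(8000, 20))       # 160
```

With a 20 ms packet time and 8 kHz sampling, each packet carries 160 samples, so a shorter packet time means more packets per second and proportionally more header overhead.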
The RTP header has the following format:
Header Name | Details |
---|---|
Version (V) (2 bits) | Indicates the RTP protocol version number. The version value is 2, as defined in RFC 3550. |
Padding (P) (1 bit) | Indicates that the RTP packet contains one or more padding octets after the RTP payload. The last octet of the RTP packet gives the number of padding octets, including itself. The RTP receiver must ignore the padding octets. Padding may be required by some encryption algorithms. |
Extension (X) (1 bit) | Indicates that a header extension is present immediately after the fixed RTP header and any CSRC identifiers. |
Contributing Source Count (CC) (4 bits) | Indicates the number of contributing source (CSRC) identifiers present in the RTP packet. The CSRC identifiers follow the initial 12-byte RTP header. There can be a maximum of 15 contributing sources. |
Marker (M) (1 bit) | Carries a marker whose interpretation is defined by the profile in use. It typically marks significant events such as frame boundaries in frame-based encodings, for example H.263 and H.264 video payloads. |
Payload Type (PT) (7 bits) | Indicates the format of the RTP payload contained in this packet. The payload types that can be sent over RTP are listed in RFC 3551. For μ-law encoded mono audio sampled at 8000 Hz, the payload type is 0. For A-law encoded mono audio sampled at 8000 Hz, the payload type is 8. |
Sequence Number (16 bits) | Indicates the sequence number, which is incremented by one for each RTP packet sent. The initial sequence number should be random to make known-plaintext attacks more difficult. |
Timestamp (32 bits) | Indicates the sampling instant of the first octet in the RTP data packet. The timestamp is incremented by the number of samples carried in each packet, which depends on the sampling rate and the packet time. If 8 kHz audio is read every 20 ms, the timestamp is incremented by 160 (8 samples/ms x 20 ms) for each RTP packet. If it is read every 18 ms, the increment is 144 (8 x 18), and if it is read every 30 ms, the increment is 240 (8 x 30). |
Synchronization Source (SSRC) (32 bits) | Identifies the synchronization source of the RTP packets. The identifier is generated randomly so that each participating entity in a session has a unique value. |
Contributing Source List (CSRC) (32 bits each) | Lists the identifiers of the contributing sources. It is present only if the CC field in the RTP header is nonzero. The list consists of 0 to 15 items. |
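The field layout in the table can be illustrated with a small Python parser. This is a sketch under stated assumptions rather than a complete RTP implementation: the names `RtpHeader` and `parse_rtp_header` are hypothetical, the bit positions follow the on-the-wire order of RFC 3550 (V, P, X, CC in the first octet, M and PT in the second), and header extensions and padding are not processed.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed 12-byte RTP header and the CSRC list (hypothetical helper)."""
    if len(packet) < 12:
        raise ValueError("packet is shorter than the 12-byte fixed header")

    # Network byte order: two octets of flags, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])

    csrc_count = b0 & 0x0F
    csrc_end = 12 + 4 * csrc_count
    if len(packet) < csrc_end:
        raise ValueError("packet is too short for the declared CSRC count")
    csrcs = struct.unpack("!%dI" % csrc_count, packet[12:csrc_end])

    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=csrc_count,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=timestamp,
        ssrc=ssrc,
        csrcs=csrcs,
    )


# Example: a version 2 packet with payload type 0 (u-law) and 160 bytes
# of payload, that is, 20 ms of 8 kHz audio. The field values are made up.
packet = struct.pack("!BBHII", 0x80, 0, 1000, 160, 0x12345678) + bytes(160)
header = parse_rtp_header(packet)
print(header.version, header.payload_type, header.sequence_number)  # 2 0 1000
```

The next packet in the same stream would carry sequence number 1001 and, at a 20 ms packet time with 8 kHz sampling, timestamp 320.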