How is UDP used in multimedia systems in spite of being connectionless?

Question

We know that UDP does not care whether segments (packets) arrive in order, or whether they arrive at the destination at all. Then how are we able to stream video on YouTube or make VoIP calls (Skype)?

If a packet is lost (doesn't reach the destination), the destination has no idea about it, so it can't ask the sender to retransmit it. And because UDP is connectionless, the segments do not necessarily reach the destination in order.

If a segment arrives out of order, shouldn't the video being streamed be distorted, since the whole segment order is now wrong because that one segment is not in its place?

This is more of an application-layer question, i.e. off-topic here. But in short: YouTube and VoIP are very different things. YouTube uses HTTP, VoIP uses RTP. Only the latter is relevant for your question. The answer lies in the protocol and the codecs used: the codecs can deal with lost packets, i.e. no retransmission is needed. And RTP has timestamps and sequence numbers so the receiver can detect duplicated, reordered, delayed, or lost packets. – Steffen Ullrich, Sep 09 '21 at 11:33
I believe this question falls under the category "design or theory of protocols used to operate a network (e.g. IP, TCP, routing protocols, STP, etc);" transmitting multimedia over IP networks is not that trivial and there are basic principles. – Effie, Sep 09 '21 at 11:42
@SteffenUllrich "VoIP uses RTP" - there are a variety of VoIP protocols around. – Zac67, Sep 09 '21 at 11:44
Real-time applications cannot use retransmission as the retransmitted data would be received after it is useful. Receiving data after its useful life is wasteful of the bandwidth, and it can cause delays or lost packets in the still-useful packets on the wire. – Ron Maupin, Sep 09 '21 at 12:58
Consider that just like UDP, IP itself also provides no way to know if packets are lost or reordered. If it's impossible to detect those situations when using UDP, it should also be impossible when using IP. And yet somehow, TCP can do it, even though it also uses IP. – ilkkachu, Sep 09 '21 at 19:40

Effie · Answer 1 · 2021-09-11T10:13:07.710

Short answer: Multimedia applications use their own protocols on top of UDP, and those protocols implement the required functionality. Actually, they have more functionality and are more complex than TCP.

Less short answer: see the answer of @Zac67.

Long answer: I hope that the text below will give you a better understanding of this issue.

Transmitting Multimedia over IP

First, we need to differentiate between different types of multimedia. It matters whether we have a pre-recorded video stream (e.g., YouTube) or a live interactive conversation (e.g., Skype). I call the latter real-time multimedia. There is also live broadcast (e.g., Twitch), but I am not familiar with it.

TCP is perfectly suitable for transmitting YouTube videos. YouTube does some things to make sure that you do not really see if there was a retransmission.

Real-Time Multimedia

The Problem

Basically, the problem is that we have to work around the following fact of life:

A multimedia system should ideally have a mouth-to-ear delay under 200 ms for humans to be comfortable. With a mouth-to-ear delay over 400 ms, the human brain no longer considers a conversation interactive.

This applies to every system which is used for people to talk - phones, cell phones, etc.

Note that mouth-to-ear means from the moment you speak until the moment your words are played on the speaker on the other side. This includes more than just the network delay.

So, basically, one of the goals of multimedia protocols is to work around this limitation.

Why not TCP

Let's see what happens when one packet is lost but subsequent packets still arrive. TCP has to deliver data to the application in order. Thus TCP buffers the subsequent packets internally while it signals the source that a packet is missing. Only once the missing packet has been retransmitted and received are it and all the buffered subsequent packets delivered. This is called head-of-line blocking.

The problem for real-time multimedia is that this retransmission can push the delay above 400 ms. In addition, the way we deal with missing packets (see below) needs those subsequent packets right away, not after the retransmission. Thus VoIP cannot use TCP.
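To make head-of-line blocking concrete, here is a minimal sketch, not a model of real TCP. It assumes the arrival time of every packet is known (the numbers are made up, and `in_order_delivery_times` is a hypothetical helper) and computes when an in-order transport could hand each packet to the application.

```python
from itertools import accumulate

def in_order_delivery_times(arrival_ms):
    """arrival_ms[i] is the time (ms) at which packet i reaches the receiver.

    An in-order transport can hand packet i to the application only once
    packets 0..i have all arrived, i.e. at the running maximum of the
    arrival times seen so far.
    """
    return list(accumulate(arrival_ms, max))

# Illustrative numbers: packet 1 is lost and its retransmission
# only arrives at t = 450 ms; packets 2..4 arrive on time.
arrivals = [20, 450, 60, 80, 100]
delivered = in_order_delivery_times(arrivals)
extra_delay = [d - a for d, a in zip(delivered, arrivals)]

print(delivered)    # [20, 450, 450, 450, 450]
print(extra_delay)  # [0, 0, 390, 370, 350]
```

Packets 2 to 4 were at the receiver in time, but the application only sees them after the retransmission of packet 1 has arrived.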

Why UDP

Well, we are not really using UDP by itself; we are designing protocols that work on top of UDP. UDP has ports. Other than ports, UDP provides essentially the same service as IP. I think we could have had a different transport protocol instead of UDP, rather than one on top of it. I do not know of specific reasons for or against that; in terms of packet-header overhead it does not really matter.

How to deal with missing packets

OK, retransmissions are not an option. But we have to somehow deal with missing packets.

Basically, we have two options. Option 1 is to include redundancy in the stream (send extra packets). Option 2 is to use the properties of voice/video streams to approximately reconstruct the data ("interpolation"), or more precisely, to present the human with something plausible in place of the missing packet. AFAIK multimedia streams do a combination of both.

Sending redundant data

The idea is that the sender sends extra packets and, if a packet is lost, these extra packets can be used to reconstruct it.

One example of such functionality is a class of methods called forward error correction (FEC). A very basic FEC scheme works as follows. A sender has to send p1, p2, p3, and p4. It then constructs and sends a FEC packet p5 = p1 xor p2 xor p3 xor p4. If the receiver receives any 4 of these 5 packets, it can reconstruct p1 to p4. There are more complicated schemes than this that can handle more than one lost packet.

Note that FEC only lets you deal with a limited number of missing packets. There is no guarantee that there will never be more missing packets than anticipated, so another method is still needed for when this happens.
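The XOR scheme described above can be sketched in a few lines. This is only an illustration under the assumption that all packets in a group have the same length; the function names `make_parity` and `recover` are made up for this example and do not come from any real FEC library.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(packets):
    """Build the FEC packet p5 = p1 xor p2 xor p3 xor p4."""
    parity = bytes(len(packets[0]))  # all-zero start value
    for p in packets:
        parity = xor_bytes(parity, p)
    return parity

def recover(received):
    """received = [p1, p2, p3, p4, p5] with exactly one entry set to None.

    XOR-ing the four packets that did arrive yields the missing one.
    Returns the four data packets p1..p4.
    """
    missing = received.index(None)
    present = [p for p in received if p is not None]
    rebuilt = bytes(len(present[0]))
    for p in present:
        rebuilt = xor_bytes(rebuilt, p)
    restored = list(received)
    restored[missing] = rebuilt
    return restored[:4]

data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
p5 = make_parity(data)

# p2 is lost in the network; the receiver rebuilds it from the rest.
print(recover([data[0], None, data[2], data[3], p5]) == data)  # True
```

If two of the five packets are lost, this scheme cannot recover either of them, which is the limitation mentioned in the text.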

Interpolation

Voice and video data usually have a lot of similarity between consecutive packets. Or, better said, the human eye and ear are not that picky. For example, if you hear hllo instead of hello, most people would still understand. (I think one packet is less than one sound.) If you are watching a video at 25 fps and one frame is missing, you probably won't notice it at all. So there are ways for receivers to deal with missing packets, for example just replaying the previous packet (video), or doing some interpolation between the previous and the next packet.
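As a toy illustration of the two concealment strategies just mentioned (repeat the previous packet, or interpolate between neighbours), here is a sketch that treats each packet as a single number. Real codecs do something far more sophisticated; `conceal` and the sample values are invented for this example.

```python
def conceal(samples):
    """samples: list of numbers, with None marking a lost packet.

    A lost value is replaced by the average of the previous value and the
    next received value when both exist, otherwise by whichever neighbour
    is available (i.e. "replay the previous packet"). If nothing was
    received at all, the gap is filled with 0.
    """
    out = list(samples)
    for i, s in enumerate(samples):
        if s is not None:
            continue
        prev = out[i - 1] if i > 0 else None  # may itself be a concealed value
        nxt = next((x for x in samples[i + 1:] if x is not None), None)
        if prev is not None and nxt is not None:
            out[i] = (prev + nxt) / 2   # interpolate between neighbours
        elif prev is not None:
            out[i] = prev               # repeat the previous packet
        elif nxt is not None:
            out[i] = nxt                # nothing before it: use the next one
        else:
            out[i] = 0                  # nothing received at all
    return out

print(conceal([10, None, 30, 40, None]))  # [10, 20.0, 30, 40, 40]
```

Note that interpolating needs the packet after the gap, so the receiver must be able to see later packets even though an earlier one is missing.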

Yes, this means that the stream on the receiver may not be the same as the stream on the sender. Multimedia streams can tolerate this.

Note that even if interpolation fails, the human can still request a retransmission, i.e. "could you repeat that?", "I didn't hear what you said" and so on. Of course, if this happens all the time the system will not be usable, but if it happens rarely it is OK.

What is different in non real time multimedia?

Or why can YouTube use TCP?

The difference is that the 200 ms/400 ms delay requirement is not there. There is no strict bound on the delay between sending a packet, receiving it, and playing the audio/video segment it contains. That is why we just use TCP for convenience and change the application to work around the delay.

YouTube videos are pre-recorded. You can transmit them at whatever speed the network allows. Ideally this is faster than playback speed.

Also, the video can be (and, I think, is) pre-buffered. You can wait until you have, for example, 5 seconds of video in the buffer before playing the first frame. So if there is a packet loss, the player still has 5 seconds of video to show, and by the time that runs out the packet will have been retransmitted. If 5 seconds is too little, the buffer can be increased.
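The effect of pre-buffering can be sketched with a small calculation. This is a simplified model with made-up numbers: the video is split into chunks of one second each, `arrival_s[i]` is when chunk `i` has fully arrived (retransmissions included), and playback is assumed to start as soon as the first `prebuffer_chunks` chunks are there. The helper `late_chunks` is hypothetical and only reports which chunks miss their original playback deadline.

```python
def late_chunks(arrival_s, prebuffer_chunks, chunk_s=1.0):
    """Return the indices of chunks that arrive after they should be played.

    Playback starts when the first `prebuffer_chunks` chunks have arrived;
    chunk i is then due at start + i * chunk_s.
    """
    start = arrival_s[prebuffer_chunks - 1]
    return [i for i, t in enumerate(arrival_s) if t > start + i * chunk_s]

# Chunks arrive every 0.9 s, then a loss plus retransmission
# holds up chunk 5 for about three extra seconds.
arrivals = [0.9, 1.8, 2.7, 3.6, 4.5, 8.4, 9.3]

print(late_chunks(arrivals, prebuffer_chunks=1))  # [5, 6]
print(late_chunks(arrivals, prebuffer_chunks=5))  # []
```

With one chunk pre-buffered the player runs dry during the retransmission; with five chunks pre-buffered the viewer never notices it, at the price of a later start.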

Note: real-time multimedia also uses similar buffers because of so-called jitter (variation in the packet inter-arrival interval), which is not covered here. But because of the 200 ms delay bound, this buffer can only hold a very small amount of data (milliseconds' worth, not seconds) before it goes to the speaker/display.

Summary

Real-time multimedia (interactive conversations!) cannot tolerate retransmissions.
Because of this, we need different ways to deal with missing packets.
It happens that these ways cannot tolerate head-of-line blocking: if a packet is missing, the packets received after it still need to be delivered and processed.
Thus we need special protocols that are suited to real-time multimedia.

Issues not covered

A multimedia transport protocol does much more than just deal with missing packets.
Jitter (the interval between the arrival of consecutive packets varies) is an issue too.
TCP congestion control, i.e., the specific way TCP does it, is also not suitable for multimedia. But this is an advanced topic and would require another wall of text like this one.

You are missing some other options in your "how to deal with missing packets" list. One is to design the network in such a way that you never have lost or reordered packets. This doesn't work over the Internet, of course, but it works in a network that is under your control. Another possibility is redundancy, for example using SMPTE ST 2022-7. (Which is different from FEC, e.g. SMPTE ST 2022-5.) – Jörg W Mittag, Sep 10 '21 at 07:18
I will update the "FEC part", thank you(!). As for the second comment, the title "transmitting multimedia **over IP**" was intended to imply that this only applies to the Internet (well, everything except the 200 ms delay part). If it does not, how can I better reflect that? – Effie, Sep 10 '21 at 10:16

Zac67 · Accepted Answer · 2021-09-09T15:04:22.100

Real-time applications don't care so much about in-order reception as about in-time delivery, i.e. low latency. Roughly speaking, there is only a defined, small window in time where data is useful. Late data is simply useless.

If data is lost you might get a small glitch in the video (or audio), but the clip continues to run: the application is designed to cope with missing data and make the best of it. Retransmitting lost data isn't even desirable, as you would need to pause the video/audio until the retransmitted data has arrived. When playing pre-recorded video clips this might even be an option, but it's a definite no-go for video or audio conferencing.

Using in-order - but potentially delayed - delivery with TCP has the exact same problem.

Of course, the application needs to take care of marking its datagrams, so that it knows where in the stream the data it has just received belongs. How that exactly works is off-topic here, unfortunately.
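Purely as an illustration of what "marking datagrams" could look like, here is a sketch with a made-up 6-byte header (16-bit sequence number, 32-bit timestamp). It is loosely inspired by the sequence numbers and timestamps that RTP carries, but it is not the RTP wire format, and the names `pack`, `unpack` and `classify` are hypothetical. Sequence number wrap-around is ignored to keep it short.

```python
import struct

# Hypothetical header: 16-bit sequence number, 32-bit timestamp,
# both in network byte order (6 bytes in total).
HEADER = struct.Struct("!HI")

def pack(seq, timestamp, payload):
    """Prefix the payload with the header before sending it over UDP."""
    return HEADER.pack(seq & 0xFFFF, timestamp & 0xFFFFFFFF) + payload

def unpack(datagram):
    """Split a received datagram into (seq, timestamp, payload)."""
    seq, timestamp = HEADER.unpack_from(datagram)
    return seq, timestamp, datagram[HEADER.size:]

def classify(seqs):
    """Label each received sequence number relative to what came before."""
    highest = None
    seen = set()
    labels = []
    for s in seqs:
        if s in seen:
            labels.append("duplicate")
        elif highest is None or s == highest + 1:
            labels.append("in order")
        elif s > highest + 1:
            labels.append(f"gap: {s - highest - 1} missing")
        else:
            labels.append("late")
        seen.add(s)
        highest = s if highest is None else max(highest, s)
    return labels

print(unpack(pack(7, 160, b"voice")))   # (7, 160, b'voice')
print(classify([0, 1, 3, 2, 3, 4]))
# ['in order', 'in order', 'gap: 1 missing', 'late', 'duplicate', 'in order']
```

With this information the receiver can decide for itself what to do: play a late packet if it is still in time, drop it if not, and conceal gaps.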

In any case, the question of connection-oriented or connection-less L4 protocol is secondary. If the transport-layer protocol you're using (often UDP) doesn't already provide a connection concept, you might need to create such a concept on the application layer (likely required for most applications) - but this is off-topic here as well.

edited Sep 09 '21 at 15:04

answered Sep 09 '21 at 11:35

Zac67

So, do real-time video and audio use TCP? – S. M. Sep 09 '21 at 14:56
@AlokMaity Not likely, TCP really isn't suitable for realtime requirements. – Zac67 Sep 09 '21 at 15:03
What is the difference between in-order delivery and in-time delivery? – S. M. Sep 09 '21 at 16:53
Protocols guaranteeing in-order or reliable delivery may cause additional delays due to buffering and waiting for missing/late data. – Zac67 Sep 09 '21 at 17:30
Is in-time delivery possible when packets arrive out of order? – S. M. Sep 09 '21 at 17:38
Sure. It's the in-order that makes in-time problematic. – Zac67 Sep 09 '21 at 17:42
Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/129440/discussion-between-alok-maity-and-zac67). – S. M. Sep 09 '21 at 17:43