In this article, we’re going to examine the details of how WebRTC architecture actually works so that a layperson can understand it.
WebRTC is an open-source project that connects devices for peer-to-peer, interactive web applications. If you’ve had an in-browser video call or played a real-time game through a web browser, WebRTC was probably the technology behind it.
At the core of every video conferencing solution sits the architecture for sending and receiving the participants’ video/audio streams. For example, if there are N participants in a video conference, each of them needs to see/hear the video/audio of all other N-1 participants.
This can be implemented in different ways, but three main architectures are used in practice: peer-to-peer (P2P), SFU (Selective Forwarding Unit) and MCU (Multipoint Control Unit).
A hybrid approach is also possible - to use different kinds of architecture depending on the number of participants in the conference. That is more of an optimization and will be covered at the end of the article.
Peer-to-peer (P2P), occasionally referred to as mesh architecture, is the simplest of these architectures and is straightforward to conceptualise. In a conference, each participant is a peer that sends their video and audio to every other peer over a direct peer connection.
Below is a peer-to-peer architecture diagram illustrating P2P with four participants:
Because there are no intermediate media servers, media travels end-to-end encrypted between the peers, so privacy is inherent. However, P2P has a significant limitation: it does not utilise upload bandwidth efficiently.
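For example, if there are N participants in the call, each participant needs to establish N-1 peer connections and send their video/audio N-1 times. Across the whole call that is N*(N-1) outgoing streams, carried over N*(N-1)/2 peer connections when each bidirectional connection is counted once.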
Many homes still have asymmetrical internet connections - e.g. ADSL (Asymmetric Digital Subscriber Line), where the upload speed is severely limited compared to the download speed. And even with a good upload speed, there will still be an issue in an office setting where many people share the same internet connection.
In reality, P2P (peer-to-peer) architecture makes sense mostly for 1-1 calls, where 2 people participate in the conference. In that scenario, P2P is still optimal because each of the 2 participants sends their video/audio only once.
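The arithmetic behind this can be sketched in a few lines of Python. This is only an illustrative model: the function name `mesh_load` and the 1500 kbps per-stream bitrate are assumptions chosen for the example, not figures from a real deployment. Counting each direction separately gives N*(N-1) outgoing streams; counting each bidirectional connection once gives half that number.

```python
def mesh_load(participants: int, stream_kbps: int) -> dict:
    """Per-peer and call-wide load for a full-mesh (P2P) call."""
    if participants < 1:
        raise ValueError("participants must be at least 1")
    others = participants - 1
    return {
        # Each peer connects to every other peer.
        "connections_per_peer": others,
        # Every peer sends its stream once per other peer.
        "outgoing_streams_total": participants * others,
        # Each bidirectional connection counted once.
        "peer_connections_total": participants * others // 2,
        # Upload needed by one peer (assumed bitrate per stream).
        "upload_kbps_per_peer": others * stream_kbps,
    }


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, mesh_load(n, stream_kbps=1500))
```

With these assumed numbers, a 1-1 call needs one upload of 1500 kbps per peer, while an 8-person mesh needs seven, which is why the upload side becomes the bottleneck so quickly.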
Advantages:
Disadvantages:
CPU (Central Processing Unit) usage will be significantly higher on the client side because the browser needs to encode the video N-1 times to send it to N-1 other participants. Unless you have a really powerful machine, performance will suffer quickly.
The above disadvantages mean that P2P architecture is reliable mostly for 1-1 calls and does not scale. In practice, you won’t often see a video conferencing provider using P2P architecture if there are more than 3 participants in the conference.
This architecture has become the preferred option in contemporary video conferencing solutions. Central SFU (Selective Forwarding Unit) media servers act as intermediaries, receiving the incoming streams and then distributing them unaltered to the other participants.
Although this approach introduces additional complexity to the server side, it is a significant enhancement over P2P architecture. It addresses the issue of limited upload bandwidth and improves scalability, which are notable challenges with P2P.
The technique of simulcast is frequently employed in SFU video conferencing. Each participant transmits multiple streams at varying qualities to the SFU. The SFU then selects the appropriate stream quality to forward; for instance, it may send streams of lower quality to participants with weaker internet connections. Conversely, it can route the high-definition version of a stream to those who are displaying it prominently on their local system.
That way, a large amount of downlink bandwidth can be saved and many participants can be displayed in the same grid, even if participants have only an average internet connection.
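The selection step described above can be sketched as follows. This is a simplified model, not how any particular SFU implements it: the layer names, the bitrates in `LAYERS` and the `pick_layer` function are all hypothetical, and a real SFU would also react to packet loss and changing bandwidth estimates over time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    kbps: int


# Hypothetical simulcast ladder, ordered from highest to lowest quality.
LAYERS = (Layer("high", 1500), Layer("medium", 500), Layer("low", 150))


def pick_layer(budget_kbps: int, prominent: bool) -> Layer:
    """Pick the best layer that fits the receiver's bandwidth budget.

    Streams shown as small thumbnails (prominent=False) never get the
    'high' layer, because the extra resolution would not be visible.
    """
    candidates = LAYERS if prominent else LAYERS[1:]
    for layer in candidates:
        if layer.kbps <= budget_kbps:
            return layer
    # Nothing fits the budget: fall back to the lowest layer.
    return LAYERS[-1]


if __name__ == "__main__":
    print(pick_layer(2000, prominent=True).name)
    print(pick_layer(2000, prominent=False).name)
    print(pick_layer(300, prominent=True).name)
```

The sender pays the cost of encoding several qualities once, and the server can then make a cheap per-receiver choice without transcoding anything.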
In SFU video conferencing, as illustrated in the above diagram, each participant sends their stream to the SFU media server a single time and, in turn, receives the streams of all other participants.
Advantages:
Disadvantages:
SFU (Selective Forwarding Unit) is the most popular architecture deployed today in video conferencing.
SFU uses upload bandwidth much more efficiently than P2P and scales better.
Also, while users still need to download and decode each of the other participants’ streams, the simulcast technique can be applied to allow a display of up to around 50 participants in a grid on an average connection and machine.
In the MCU (Multipoint Control Unit) architecture, every participant publishes their stream only once, to a central server. But unlike an SFU, the MCU central server has the role of a mixer - combining all received streams into one stream.
Then all participants consume this one mixed stream instead of subscribing individually to the stream of every other participant.
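To compare the three architectures side by side, the number of media streams each participant uploads and downloads can be modelled as below. This is a minimal sketch that counts streams only and assumes at least 2 participants; the function name `streams_per_participant` is made up for the example, and simulcast layers are ignored.

```python
from typing import Tuple


def streams_per_participant(architecture: str, participants: int) -> Tuple[int, int]:
    """Return (uploaded, downloaded) media streams for one participant."""
    if participants < 2:
        raise ValueError("a call needs at least 2 participants")
    others = participants - 1
    if architecture == "p2p":
        # One copy sent to, and one stream received from, every other peer.
        return others, others
    if architecture == "sfu":
        # Publish once; the server forwards every other stream separately.
        return 1, others
    if architecture == "mcu":
        # Publish once; receive a single stream mixed by the server.
        return 1, 1
    raise ValueError(f"unknown architecture: {architecture}")


if __name__ == "__main__":
    for arch in ("p2p", "sfu", "mcu"):
        print(arch, streams_per_participant(arch, 5))
```

The model makes the trade-off visible: MCU minimises client load in both directions, but only because the server takes over the decoding, mixing and re-encoding work.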
Disadvantages:
Decoding, mixing and re-encoding are much more taxing on the server than simply routing/relaying streams as an SFU does. And since companies generally cannot afford to spend at least 10 times more money on the server side, SFU is the reasonable compromise which wins in most cases.
In the hybrid approach, different architectures are used depending on the number of participants in the call. Very often P2P is used for 1-1 calls and the application switches to SFU after a third person enters the call.
That way, some server bandwidth/resources are spared during 1-1 calls. This can be a non-negligible saving, since 1-1 calls are very common - according to our statistics and research, around 50% of calls are 1-1.
Of course, that percentage can vary depending on the product focus - obviously, products targeted towards large webinars won’t have as many 1-1 calls.
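The switching rule can be expressed as a small decision function. This is a sketch under assumptions: the names `choose_architecture` and `transitions` and the `p2p_max` threshold are hypothetical, and the example only decides which architecture to use. It does not perform the actual renegotiation of media connections in a running call.

```python
from typing import Iterable, Iterator, Tuple


def choose_architecture(participants: int, p2p_max: int = 2) -> str:
    """Use P2P for small calls and SFU once the call grows beyond p2p_max."""
    return "p2p" if participants <= p2p_max else "sfu"


def transitions(counts: Iterable[int]) -> Iterator[Tuple[int, str]]:
    """Yield (participant_count, architecture) whenever the choice changes."""
    current = None
    for count in counts:
        chosen = choose_architecture(count)
        if chosen != current:
            current = chosen
            yield count, chosen


if __name__ == "__main__":
    # Participant count over the lifetime of one call.
    print(list(transitions([2, 3, 4, 2])))
```

In this model the call drops back to P2P when it shrinks to 2 people again. Whether that is desirable is a design choice, since every switch means re-establishing media connections mid-call.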
Advantages: Combining several architectures leads to benefiting from the advantages of all architectures depending on the situation. Using P2P for 1-1 calls is saving server resources since there are no intermediate servers.
Disadvantages: Combining several architectures in the same application increases code complexity and maintenance costs. Transitioning smoothly between P2P and another architecture (SFU/MCU) in the middle of a running call is not trivial.
In this article, we have explored the different architecture options that drive WebRTC technology and enable seamless video conferencing experiences. Now, let's take a closer look at how Digital Samba leverages WebRTC on the back end to provide a cutting-edge live video conferencing solution.
Digital Samba is a leading provider of GDPR-compliant video conferencing API and SDK, offering a comprehensive platform for embedding video conferencing capabilities into software products or websites. Our solution is powered by WebRTC, an open-source project that facilitates peer-to-peer interactive web applications.
By integrating Digital Samba's video conferencing API and SDK into your platform, you can unlock the power of WebRTC and provide your users with high-quality, real-time video communication. Our solution is designed to be GDPR-compliant, ensuring the privacy and security of user data. With our EU-hosted infrastructure and end-to-end encryption, you can trust that sensitive information shared during video conferences is protected.
Whether you're building a remote collaboration tool, an online tutoring platform, or a virtual classroom, Digital Samba's video conferencing solution enables seamless communication and collaboration among participants. The WebRTC architecture allows for direct peer-to-peer connections, reducing latency and ensuring a smooth video conferencing experience.
With Digital Samba, you can leverage the advantages of both P2P and SFU architectures. For 1-1 calls, our solution utilises P2P, saving server resources and maximising efficiency. As the number of participants increases, the architecture seamlessly transitions to SFU, leveraging the scalability and bandwidth efficiency it offers. This hybrid approach ensures optimal performance and cost-effectiveness for your video conferencing solution.
Digital Samba's WebRTC-powered live video conferencing also supports advanced features such as screen sharing, file sharing, interactive whiteboarding, and more. These features enhance collaboration and enable interactive learning experiences for virtual classrooms, remote training sessions, and online meetings.
Experience the power of Digital Samba's WebRTC-powered live video conferencing solution. Contact our sales team today to learn more and get started on enhancing your video conferencing platform.