What is WebRTC Signalling?
WebRTC (Web Real-Time Communication) is an open-source project, originally released and still largely maintained by Google, on which many of the internet’s peer-to-peer applications are built.
Although the WebRTC project itself is open source and can be audited by the public, that does not make every application built on WebRTC open source. Connection management, including signalling, is left to the application developer, so any given WebRTC signalling server may well be closed source.
A peer-to-peer application is one where users’ computers talk directly to each other rather than through a central server. This can save a great deal of server infrastructure and is an efficient way to bring features like real-time video and voice into your favourite internet applications.
What tends to confuse non-developers and junior developers is that central application servers still play a role. The connections WebRTC makes are peer-to-peer, but setting them up is not, so WebRTC as a whole is not completely peer-to-peer.
To put it plainly, signalling is how computers discover other computers to connect to using WebRTC. WebRTC relies on signalling servers in order to establish connections between peers.
WebRTC leaves it to the developer to decide which peers connect to each other, so the question of who connects to whom is settled on the signalling server, as part of the WebRTC application.
The signalling server determines which computers (clients) get connected to each other. WebRTC does not specify how signalling messages are transported, but the content the clients exchange, such as their session descriptions, follows a standard format.
However, the clients cannot connect to each other on their own: they first need a channel, in practice usually a central signalling server, through which to exchange the details of how to reach one another.
Think of a WebRTC signalling server as a sort of matchmaker.
It’s important to note that WebRTC signalling servers only handle metadata or data about clients.
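The signalling server itself does not handle the media or data streams, as those flow directly between the peers. Instead, it handles information such as session control messages for opening and closing connections, the media formats and codecs each client supports, and network details such as IP addresses and ports.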
These are the kinds of things two peers need to know before they connect. Following our matchmaker analogy, this is a little like setting up a blind date.
Signalling in WebRTC is conceptually fairly simple, but quite a bit goes on under the hood.
Let’s say that a client wants to use a web app that uses WebRTC; it starts out by sending an offer to that web app's WebRTC signalling server.
It doesn’t matter whether the web app’s signalling server is implemented in Node.js or in another technology such as PHP; the offer is essentially a description of that client and of how to connect to it.
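The offer is written in the Session Description Protocol (SDP) format. SDP is not the exchange itself but the text format that describes a session: which media (audio, video, data) the client wants to send and receive, which codecs it supports, and the parameters needed to connect to it.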
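To give a feel for what an offer contains, below is a heavily abbreviated, illustrative SDP offer and a minimal TypeScript sketch that lists its media sections and codec names. Real browser offers are much longer (they also carry ICE credentials, DTLS fingerprints and more), and `parseSdpSummary` is a hypothetical helper written for this article, not part of the WebRTC API.

```typescript
// Hypothetical helper: summarises the media sections of an SDP blob.
// Only "m=" and "a=rtpmap:" lines are read; everything else is ignored.
interface MediaSummary {
  kind: string;     // e.g. "audio" or "video", taken from the m= line
  codecs: string[]; // codec names from a=rtpmap lines, e.g. "opus"
}

function parseSdpSummary(sdp: string): MediaSummary[] {
  const sections: MediaSummary[] = [];
  // SDP lines end in CRLF, but accept bare LF as well.
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith("m=")) {
      // m=<media> <port> <proto> <format list>
      const kind = line.slice(2).split(" ")[0] ?? "";
      sections.push({ kind, codecs: [] });
    } else if (line.startsWith("a=rtpmap:")) {
      // a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
      const current = sections[sections.length - 1];
      const encoding = line.split(" ")[1] ?? "";
      if (current) current.codecs.push(encoding.split("/")[0] ?? "");
    }
  }
  return sections;
}

// Abbreviated, illustrative offer; a real one is far longer.
const exampleOffer = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "c=IN IP4 0.0.0.0",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96 98",
  "c=IN IP4 0.0.0.0",
  "a=rtpmap:96 VP8/90000",
  "a=rtpmap:98 VP9/90000",
].join("\r\n");

console.log(parseSdpSummary(exampleOffer));
```

Run against the sample offer, it reports an audio section offering Opus and a video section offering VP8 and VP9. A signalling server does not need to parse SDP at all; this only shows that an offer is plain, line-based text.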
So the signalling server now has the information of at least one client and how to connect to said client.
All the signalling server needs to do now is pair two or more clients together by relaying their SDP messages. Once the clients have exchanged these descriptions and accepted them, they will have all the information they need to exchange data directly, peer-to-peer.
To do this, the server forwards the first client’s SDP offer to the intended peer. The peer stores that offer locally, generates an SDP answer describing itself, and sends the answer back to the signalling server, which relays it to the initial client. When several clients want to connect to one another, this offer/answer exchange takes place once for each pair of peers.
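The matchmaking and relaying described above can be sketched as an in-memory TypeScript class. This is a minimal sketch under stated assumptions: the `SignalMessage` shape, the `Room` class and the `Deliver` callback are all hypothetical, and a real signalling server would receive and send these messages over a transport such as WebSockets. Note that the server never inspects the SDP; it only forwards it to the named peer.

```typescript
// Hypothetical message format relayed by the signalling server.
type SignalMessage =
  | { type: "offer"; from: string; to: string; sdp: string }
  | { type: "answer"; from: string; to: string; sdp: string }
  | { type: "candidate"; from: string; to: string; candidate: string };

// Callback that the transport layer (e.g. a WebSocket connection) would provide.
type Deliver = (msg: SignalMessage) => void;

class Room {
  private readonly clients = new Map<string, Deliver>();

  // Registers a client and returns the IDs of clients already present,
  // so the newcomer knows whom to send offers to.
  join(id: string, deliver: Deliver): string[] {
    const existing = Array.from(this.clients.keys());
    this.clients.set(id, deliver);
    return existing;
  }

  leave(id: string): void {
    this.clients.delete(id);
  }

  // Forwards the message unchanged to its target; the payload is never parsed.
  relay(msg: SignalMessage): boolean {
    const target = this.clients.get(msg.to);
    if (target === undefined || !this.clients.has(msg.from)) return false;
    target(msg);
    return true;
  }
}

// Usage: Bob joins after Alice, sends her an offer, and she answers.
const room = new Room();
const inboxAlice: SignalMessage[] = [];
const inboxBob: SignalMessage[] = [];
room.join("alice", (msg) => inboxAlice.push(msg));
const peers = room.join("bob", (msg) => inboxBob.push(msg));
for (const peer of peers) {
  room.relay({ type: "offer", from: "bob", to: peer, sdp: "v=0 ..." });
}
room.relay({ type: "answer", from: "alice", to: "bob", sdp: "v=0 ..." });
```

The newcomer learns who is already in the room and sends each of them an offer; ICE candidates, covered below, would travel through the same `relay` path.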
Once each pair of clients has exchanged its SDP offer and answer, they can begin to connect peer-to-peer, without the use of any servers!
At this point, the clients know which other clients to connect to, but they still need a plan for how they will reach them.
Not so fast! In the real world, many computers (clients) sit behind firewalls or NAT (Network Address Translation) devices, so they have no publicly reachable IP address of their own. This means that the broader internet cannot open a connection to them directly.
This is a serious problem for peer-to-peer applications! How does WebRTC handle these kinds of cases?
In a nutshell
ICE stands for Interactive Connectivity Establishment. Think of it as answering the question “What’s the best way for these clients to connect to each other?”, with STUN and TURN servers helping to make those connections happen. STUN (Session Traversal Utilities for NAT) is a simple service that replies to a client with the public IP address and port from which it saw the client’s request arrive.
Clients need to use STUN because they are often behind NAT and they have no idea what their public address is. Mutual communication over the internet cannot be established without knowing each other’s public IP addresses.
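A STUN exchange really is small. The TypeScript sketch below builds a Binding request and decodes the XOR-MAPPED-ADDRESS attribute of a Binding success response, following the message layout in RFC 8489 (which obsoletes RFC 5389). It is an illustration, not a complete client: it handles IPv4 only, skips the optional MESSAGE-INTEGRITY and FINGERPRINT attributes, does not check that the response’s transaction ID matches the request, and leaves out the UDP socket. `buildBindingRequest` and `parseXorMappedAddress` are hypothetical names.

```typescript
const MAGIC_COOKIE = 0x2112a442;
const BINDING_REQUEST = 0x0001;
const BINDING_SUCCESS_RESPONSE = 0x0101;
const XOR_MAPPED_ADDRESS = 0x0020;
const FAMILY_IPV4 = 0x01;

// Builds a 20-byte STUN Binding request with no attributes.
function buildBindingRequest(transactionId: Uint8Array): Uint8Array {
  if (transactionId.length !== 12) throw new Error("transaction ID must be 12 bytes");
  const msg = new Uint8Array(20);
  const view = new DataView(msg.buffer);
  view.setUint16(0, BINDING_REQUEST); // message type (DataView is big-endian by default)
  view.setUint16(2, 0);               // message length, excluding the 20-byte header
  view.setUint32(4, MAGIC_COOKIE);    // fixed value that marks the packet as STUN
  msg.set(transactionId, 8);          // 96-bit ID that pairs a response with its request
  return msg;
}

// Returns the public IPv4 address and port from a Binding success response, or null.
function parseXorMappedAddress(msg: Uint8Array): { ip: string; port: number } | null {
  if (msg.length < 20) return null;
  const view = new DataView(msg.buffer, msg.byteOffset, msg.byteLength);
  if (view.getUint16(0) !== BINDING_SUCCESS_RESPONSE || view.getUint32(4) !== MAGIC_COOKIE) {
    return null;
  }
  const end = Math.min(20 + view.getUint16(2), msg.length);
  let offset = 20;
  while (offset + 4 <= end) {
    const type = view.getUint16(offset);
    const length = view.getUint16(offset + 2);
    const value = offset + 4;
    if (value + length > end) return null; // truncated attribute
    if (type === XOR_MAPPED_ADDRESS && length >= 8 && view.getUint8(value + 1) === FAMILY_IPV4) {
      // The port is XORed with the top 16 bits of the cookie, the address with all 32.
      const port = view.getUint16(value + 2) ^ (MAGIC_COOKIE >>> 16);
      const address = (view.getUint32(value + 4) ^ MAGIC_COOKIE) >>> 0;
      const ip = [24, 16, 8, 0].map((shift) => (address >>> shift) & 0xff).join(".");
      return { ip, port };
    }
    offset = value + Math.ceil(length / 4) * 4; // attribute values are padded to 4 bytes
  }
  return null;
}
```

A real client would fill the transaction ID with random bytes (for example from `crypto.getRandomValues`), send the request over UDP to the STUN server, and hand the decoded address to ICE as a server-reflexive candidate. The address is XORed with the magic cookie so that middleboxes which rewrite IP addresses inside packets do not accidentally alter it.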
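Unfortunately, in some cases even knowing each other’s public IP addresses is not enough. Symmetric NAT is one such case: it assigns a different public port for every destination, so the address a STUN server reports is not the one another peer would have to reach. For these cases, WebRTC applications usually have TURN servers as a fallback, relaying traffic around the restrictive firewall or NAT instead of connecting the peers directly.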
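A toy model shows why symmetric NAT defeats STUN. The sketch below is a deliberate simplification, not a real NAT: the `ToyNat` class, its port numbering and the addresses are invented for illustration. With endpoint-independent mapping, the NAT reuses one public port for every destination, so the address the STUN server sees is the same one a peer will see. With symmetric mapping, the NAT allocates a new public port per destination, so the port STUN reported is not the one the peer’s traffic would need to reach.

```typescript
type Mapping = "endpoint-independent" | "symmetric";

// Toy NAT (illustration only): maps an internal "ip:port" to a port on its public IP.
class ToyNat {
  private readonly table = new Map<string, number>();
  private nextPort = 40000;
  private readonly publicIp: string;
  private readonly mode: Mapping;

  constructor(publicIp: string, mode: Mapping) {
    this.publicIp = publicIp;
    this.mode = mode;
  }

  // Returns the source address that `destination` sees for traffic from `internal`.
  send(internal: string, destination: string): string {
    // A symmetric NAT keys its mapping on the destination as well as the source.
    const key = this.mode === "symmetric" ? `${internal}->${destination}` : internal;
    let port = this.table.get(key);
    if (port === undefined) {
      port = this.nextPort++;
      this.table.set(key, port);
    }
    return `${this.publicIp}:${port}`;
  }
}

const modes: Mapping[] = ["endpoint-independent", "symmetric"];
for (const mode of modes) {
  const nat = new ToyNat("198.51.100.7", mode);
  const seenByStun = nat.send("192.168.1.20:5000", "stun.example.org:3478");
  const seenByPeer = nat.send("192.168.1.20:5000", "203.0.113.9:6000");
  console.log(`${mode}: STUN sees ${seenByStun}, peer sees ${seenByPeer}`);
}
```

In the loop, the endpoint-independent NAT presents the same public address to the STUN server and to the peer, while the symmetric NAT presents two different ports, which is why TURN relays are often needed behind symmetric NATs.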
When it comes specifically to how clients address each other over the network (as opposed to media and configuration data such as codecs or access to the microphone and camera), two other kinds of server come into play: STUN and TURN.
STUN servers are lightweight and low-latency. A client sends a STUN server a small request, and the server replies with the public IP address and port the request came from; the client can then share that address with its peers through the signalling server. Because this task is so light, STUN servers have very modest hardware requirements, and because the media then flows directly between the peers, a STUN-assisted connection is the preferred outcome.
In some cases, STUN servers are separate, dedicated machines that handle only this part of WebRTC, but because their resource requirements are low, they may be hosted by the same service provider, or even on the same hardware, as the signalling server itself.
The limitation of STUN is that it only helps when every peer can actually be reached at the public address STUN reports. Not every peer can be: some sit behind firewalls or NAT configurations that make them impossible to address directly from outside. In these cases, the restrictive NAT or firewall has to be routed around.
TURN (Traversal Using Relays around NAT) servers are the fallback when STUN cannot establish a direct connection between all the peers. A TURN server sits on the public internet, and because each peer opens an outbound connection to it (sometimes over TCP or TLS on port 443), most NATs and firewalls allow the traffic. The catch is that the TURN server has to actually forward the media streams between the peers.
Relaying this raw data through a TURN server adds bandwidth and cost to the server infrastructure, and it also introduces extra latency.
TURN is therefore best avoided where possible, but it is usually better than no connection at all. There is also no guarantee that a given firewall or NAT configuration will allow traffic to every TURN server, so it is up to ICE to look at all the possible connection routes and choose the most efficient one that works.
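In a browser, an application tells ICE which STUN and TURN servers it may use through the `iceServers` option passed to the `RTCPeerConnection` constructor. The sketch below is a configuration example only: the hostnames, username and credential are placeholders, not real servers. Setting `iceTransportPolicy` to `"relay"` instead of `"all"` would force all traffic through TURN, which can be useful for testing the fallback path.

```typescript
// Placeholder STUN/TURN servers: replace with your own deployment.
const rtcConfig = {
  iceServers: [
    { urls: "stun:stun.example.org:3478" },
    {
      urls: [
        "turn:turn.example.org:3478?transport=udp",
        "turns:turn.example.org:443?transport=tcp",
      ],
      username: "example-user",     // hypothetical credentials
      credential: "example-secret",
    },
  ],
  // "all" lets ICE use host, server-reflexive (STUN) and relay (TURN) candidates.
  iceTransportPolicy: "all" as const,
};

// In a browser, this object is passed straight to the constructor:
// const pc = new RTCPeerConnection(rtcConfig);
```

Listing a STUN entry before the TURN entry does not by itself decide which route is used; ICE gathers candidates from all of them and ranks the results.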
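Each possible route a peer can offer, such as its local address, the public address discovered via STUN, or a relay address on a TURN server, is called an ICE candidate. Peers exchange their ICE candidates through the signalling server, alongside their SDP offer and answer.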
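ICE decides which candidates to try first by giving each one a numeric priority. The TypeScript sketch below uses the formula recommended in RFC 8445, section 5.1.2, together with its suggested type preferences (126 for host, 110 for peer-reflexive, 100 for server-reflexive and 0 for relayed candidates), which is why direct routes are preferred over TURN relays. The fixed local preference of 65535 is an assumption suited to a host with a single network interface, and the candidate addresses are placeholders; browsers compute their own values.

```typescript
// Type preferences recommended by RFC 8445 (section 5.1.2).
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 } as const;
type CandidateType = keyof typeof TYPE_PREFERENCE;

interface Candidate {
  address: string;   // placeholder "ip:port"
  type: CandidateType;
  component: number; // 1 = RTP; WebRTC normally multiplexes everything on component 1
}

// RFC 8445 recommended formula:
// priority = 2^24 * typePreference + 2^8 * localPreference + (256 - componentId)
function candidatePriority(c: Candidate, localPreference = 65535): number {
  return 2 ** 24 * TYPE_PREFERENCE[c.type] + 2 ** 8 * localPreference + (256 - c.component);
}

const gathered: Candidate[] = [
  { address: "203.0.113.50:3478", type: "relay", component: 1 },  // from TURN
  { address: "198.51.100.7:40000", type: "srflx", component: 1 }, // from STUN
  { address: "192.168.1.20:5000", type: "host", component: 1 },   // local interface
];

// Highest priority first, so ICE checks direct routes before the relay.
const ranked = gathered
  .slice()
  .sort((a, b) => candidatePriority(b) - candidatePriority(a));
for (const c of ranked) {
  console.log(c.type, c.address, candidatePriority(c));
}
```

With these inputs the host candidate scores 2130706431, the server-reflexive one 1694498815 and the relay 16777215, so the relay is tried last. Priority only orders the checks; ICE still has to test which candidate pairs actually work.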
WebRTC is complicated for developers to work with largely because it is a low-level set of building blocks for letting web browsers talk to each other: signalling, NAT traversal infrastructure and scaling are all left for the application developer to solve.
The general trend is for WebRTC to become more of a back-end technology that API services use to provide Communication Platforms as a Service (CPaaS) to developers.
Using these third-party APIs removes a huge amount of complexity, security risk and infrastructure cost for web application developers. It also typically gives their customers a more polished set of features, already debugged and supported by software engineers who specialise in areas such as live video and audio, real-time peer-to-peer features like collaborative drawing applications, and much more.
Best of all, these APIs can bring the same polished, supported features to many kinds of endpoint, such as web pages, mobile applications, or even third-party software through integrations.
Digital Samba's suite of APIs and SDKs offers developers a powerful, streamlined way to embed video conferencing into software products or websites. As a GDPR-compliant, EU-hosted, and end-to-end encrypted service, Digital Samba provides a trustworthy and secure platform that complies with stringent data protection regulations.
Our WebRTC video chat API and video call embed SDK offer a comprehensive set of tools that are designed to meet various business and individual needs. Whether it's for integrating live video, voice, or real-time features, Digital Samba enables developers to bypass the complex and cumbersome aspects of peer-to-peer applications and focus on delivering value to their users. By leveraging our rich API ecosystem, developers can offer robust video conferencing solutions that are not just technically superior but also align with global data privacy standards. Experience the seamless combination of innovation and compliance with Digital Samba. Feel free to reach out to our team for further insights or a demonstration tailored to your specific requirements.