Simon Bisson
Contributing Writer

WebRTC without the pain

analysis
Apr 23, 2015 | 5 mins

Although still evolving, WebRTC is one of the hidden gems of the HTML5 platform. Here's the easy way to use it to stream voice, video, and data.

One of the successes of the growing API economy is Twilio, a company that looks to the outside world like nothing more than a set of APIs that can be accessed from any app to give it telecom features.

If you are building a field service automation tool and want to give customers SMS notifications of when an engineer is due to arrive, you only need to write a few lines of code and a RESTful connection to Twilio. As with much of the cloud, Twilio’s service is pay as you go, simplifying billing relationships and accommodating everything from departmental apps to line-of-business solutions.
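As a rough sketch of what those few lines might look like, the TypeScript below builds the form-encoded POST that Twilio's REST API expects for an outbound SMS. The endpoint path and the To, From, and Body parameters follow Twilio's Messages resource; the account SID, phone numbers, and the helper name buildSmsRequest are placeholders invented for illustration. The request is only constructed here, not sent.

```typescript
// Shape of the request handed to whatever HTTP client the app already uses.
interface SmsRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds (but does not send) a Twilio SMS request.
// The caller still has to add HTTP Basic auth (account SID and auth token).
function buildSmsRequest(accountSid: string, from: string, to: string, text: string): SmsRequest {
  const fields: Array<[string, string]> = [
    ["To", to],
    ["From", from],
    ["Body", text],
  ];
  // Percent-encode each value so "+" and spaces survive the form body.
  const body = fields
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&");
  return {
    url: `https://api.twilio.com/2010-04-01/Accounts/${accountSid}/Messages.json`,
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body,
  };
}

// Placeholder SID and numbers, for illustration only.
const sms = buildSmsRequest("ACxxxxxxxx", "+15005550006", "+15551234567", "Your engineer arrives at 2 pm");
```

Keeping the request construction separate from the sending step makes it easy to unit-test the notification logic without touching the network.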

Recently, Twilio started focusing on WebRTC (Web Real-Time Communication), a key part of the wider HTML5 platform. Designed to support two-way voice and video connections from your browser without plug-ins, WebRTC is a very powerful technology.

What WebRTC gives you

WebRTC is a combination of several standards maintained by two different standards bodies. The W3C defines the JavaScript API that browsers expose for WebRTC connections, while the IETF specifies the underlying protocols.

Much of the excitement about WebRTC has been around using it for delivering traditional person-to-person VoIP and video call services. However, it’s better thought of as a way of adding voice and video channels to your apps.

If you’re building a custom CRM front end or a customer support service, WebRTC is a sensible way of adding new connections to your sales staff or to your support team. Quickly opening up a video connection can make it a lot easier to diagnose a problem, for example — and avoids reliance on a second application, which would drop a user out of the interaction context.

Building WebRTC apps isn’t easy. For one, the specifications aren’t finalized. Although functionality is baked into browser releases, an update to Chrome or Firefox could result in incompatibilities that would stop a call from being made. Also, much complexity lurks in setting up and maintaining the underlying channels. While the protocols are designed to support peer-to-peer connections, you’ll need some form of directory service, as well as a relay for users stuck behind firewalls. By the time you’ve finished, you might as well have built Skype from scratch.
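To give a feel for just the relay piece, here is a minimal sketch of the configuration a do-it-yourself WebRTC app would typically hand to the browser's RTCPeerConnection constructor. The iceServers shape follows the standard RTCIceServer dictionary; the host name, credentials, and the helper buildPeerConfig are made up for illustration, and the directory (signaling) service is not covered at all.

```typescript
// Mirrors the shape of the standard RTCIceServer dictionary.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConfig {
  iceServers: IceServer[];
}

// Hypothetical helper: STUN for public-address discovery,
// TURN as the relay fallback for users stuck behind firewalls.
function buildPeerConfig(turnHost: string, username: string, credential: string): PeerConfig {
  return {
    iceServers: [
      { urls: `stun:${turnHost}:3478` },
      {
        urls: [
          `turn:${turnHost}:3478?transport=udp`,
          `turn:${turnHost}:3478?transport=tcp`,
        ],
        username,
        credential,
      },
    ],
  };
}

// Placeholder host and credentials.
const peerConfig = buildPeerConfig("turn.example.com", "demo-user", "demo-secret");
// In a browser: const pc = new RTCPeerConnection(peerConfig);
```

Even with this in place you would still have to run and scale the STUN and TURN servers yourself, which is where much of the operational burden sits.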

Dialing down the complexity

How can you simplify the process? Twilio’s first voice-only WebRTC product handled all the complexities of making a connection and managing a call. It drew on the company’s experience with traditional telecom services and with delivering APIs that can be dropped into an app (with SDKs for mobile OSes and for common languages).

Now Twilio has announced a video WebRTC service that builds on the tooling developed for its voice WebRTC APIs. With JavaScript APIs for the Web and with iOS and Android SDKs, it’s easy to add video calling to an app. The JavaScript APIs are designed to use WebSockets for direct connections to Twilio’s service from your code. Those APIs are also event-driven, simplifying integration between different functions (if you want access to the low-level WebRTC JavaScript, it’s there, too). There will also be C++ support for other platforms.

One service Twilio offers is support for TURN and other cross-firewall protocols. Firewall traversal has always been a problem for VoIP and video tools. (A security consultant friend of mine once described an early version of Skype as “the best hacking tool you’ve ever seen” — he wasn’t joking, going on to describe how it sniffed its way out of even the most secure networks.) Building the tools for getting through firewalls into a WebRTC service makes a lot of sense. Twilio has media relays in 28 data centers across the world, aiming to keep latency to a minimum.
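If you ever need to check whether a connection ended up going through a relay, the ICE candidate lines that WebRTC exchanges carry a "typ" field whose value is host, srflx, prflx, or relay. The sketch below parses that field; the function names and the sample candidate strings (using documentation-range IP addresses) are illustrative assumptions, not part of any Twilio API.

```typescript
type CandidateType = "host" | "srflx" | "prflx" | "relay";

const KNOWN_TYPES: CandidateType[] = ["host", "srflx", "prflx", "relay"];

// Returns the token that follows "typ" in an ICE candidate line,
// or null when the line has no recognizable type.
function candidateType(candidate: string): CandidateType | null {
  const tokens = candidate.trim().split(/\s+/);
  const typIndex = tokens.indexOf("typ");
  if (typIndex === -1 || typIndex + 1 >= tokens.length) {
    return null;
  }
  const value = tokens[typIndex + 1];
  for (const known of KNOWN_TYPES) {
    if (known === value) {
      return known;
    }
  }
  return null;
}

// A "relay" candidate means media flows through a TURN server.
function usesRelay(candidate: string): boolean {
  return candidateType(candidate) === "relay";
}

// Made-up sample candidates for illustration.
const direct = "candidate:1 1 udp 2122260223 192.168.1.10 54321 typ host";
const relayed = "candidate:2 1 udp 41885439 203.0.113.7 61000 typ relay raddr 198.51.100.4 rport 54321";
```

A host or srflx candidate generally indicates a direct path, while relay tells you the call is paying the extra hop through a media relay.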

You’ll be able to build four-way calling systems for more complex applications, mixing video and Twilio’s existing WebRTC voice tools. Calls are initiated from registered endpoints that can be tied to IP addresses and usernames. Use the registered username of an endpoint to begin the call, tying it to a capture device (usually a webcam or a phone’s front camera), then to a stream.
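Underneath any SDK, picking the capture device comes down to the constraints passed to the browser's standard navigator.mediaDevices.getUserMedia() call. The following sketch builds such a constraints object; the helper name captureConstraints and the 640x480 target are assumptions chosen for illustration, and the actual browser call appears only as a comment.

```typescript
type Camera = "front" | "rear";

// Subset of the standard MediaStreamConstraints shape used here.
interface CaptureConstraints {
  audio: boolean;
  video: {
    facingMode: "user" | "environment";
    width: { ideal: number };
    height: { ideal: number };
  };
}

// Hypothetical helper: "user" selects the front camera on phones,
// "environment" the rear one; desktops typically just use the webcam.
function captureConstraints(camera: Camera): CaptureConstraints {
  return {
    audio: true,
    video: {
      facingMode: camera === "front" ? "user" : "environment",
      width: { ideal: 640 },
      height: { ideal: 480 },
    },
  };
}

const constraints = captureConstraints("front");
// In a browser:
// const stream = await navigator.mediaDevices.getUserMedia(constraints);
```

Using "ideal" rather than exact values lets the browser fall back to whatever resolution the device can actually deliver.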

Streams are then routed peer-to-peer or through a media relay. You don’t need to do anything to handle that aspect of the connection; it’s all managed by Twilio’s APIs. That’s a big benefit, because it’s here that you normally experience most of the pain. Using WebSockets further simplifies matters because you’re connecting directly to the Twilio service. The API uses JavaScript promises to handle events for you, letting you construct an event-driven workflow that runs in a single page, and because you’re using WebSockets, there’s no need to mess with complex callback patterns.
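To show the general shape of a promise-driven call flow, here is a small self-contained sketch. FakeEndpoint, Conversation, invite, and startCall are hypothetical stand-ins written for this example; they are not Twilio's actual classes or method names, and the "call" is simulated in memory.

```typescript
// Hypothetical stand-in for a conversation object.
interface Conversation {
  peer: string;
  log: string[];
}

// Hypothetical stand-in for a registered endpoint; NOT a real SDK class.
class FakeEndpoint {
  private registered: string[];

  constructor(registered: string[]) {
    this.registered = registered;
  }

  // Resolves with a conversation if the user is registered, rejects otherwise.
  invite(username: string): Promise<Conversation> {
    return new Promise<Conversation>((resolve, reject) => {
      if (this.registered.indexOf(username) !== -1) {
        resolve({ peer: username, log: ["connected"] });
      } else {
        reject(new Error(`unknown endpoint: ${username}`));
      }
    });
  }
}

// Each step is a link in the chain; one catch handles any failure.
function startCall(endpoint: FakeEndpoint, username: string): Promise<string> {
  return endpoint
    .invite(username)
    .then((conversation) => {
      conversation.log.push("media attached");
      return `in call with ${conversation.peer}`;
    })
    .catch((error: Error) => `call failed: ${error.message}`);
}

const endpoint = new FakeEndpoint(["alice", "bob"]);
startCall(endpoint, "alice").then((status) => console.log(status));
```

The point of the pattern is that success and failure paths read top to bottom in one chain, instead of being spread across nested callbacks.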

Streaming data

Along with voice and video, you’ll find support for WebRTC’s data channel. WebRTC isn’t merely a tool for one-to-one communications. It can also be used to build tools for delivering webinars or training, or for delivering content alongside voice and video.

You can push data over the data channel to all your users’ browsers, delivering presentation slides in parallel to a voiceover or uploading diagnostics from a PC to a help desk. The data channel is a surprisingly flexible tool that lets you mix traditional Web media with video.
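Large payloads such as slide images are usually split into smaller messages before being sent over a data channel. The sketch below shows one way to do that; the 16 KiB chunk size is a commonly used conservative figure and an assumption on my part, and chunkPayload is a hypothetical helper rather than part of WebRTC or Twilio's SDK.

```typescript
// Conservative per-message size; an assumption, not a figure from any spec quoted here.
const CHUNK_SIZE = 16 * 1024;

// Splits a binary payload into consecutive chunks of at most chunkSize bytes.
function chunkPayload(payload: Uint8Array, chunkSize: number = CHUNK_SIZE): Uint8Array[] {
  if (chunkSize <= 0) {
    throw new RangeError("chunkSize must be positive");
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < payload.length; offset += chunkSize) {
    // subarray returns a view over the same buffer, so no bytes are copied.
    chunks.push(payload.subarray(offset, offset + chunkSize));
  }
  return chunks;
}

// A 40,000-byte stand-in for a presentation slide.
const slide = new Uint8Array(40000);
const pieces = chunkPayload(slide);
// In a browser, each piece could be passed to RTCDataChannel.send().
```

The receiving side would need to reassemble the chunks in order, so a real protocol would also send a small header with the total size or a chunk count.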

Twilio has gone a long way toward reducing the pain of WebRTC. Moreover, with a third party like Twilio supporting an evolving standard, you can be sure you’ll be able to talk to someone at the other end of the connection. It also means you’ll be able to get started with adding video and voice to your apps before the standard is complete — whether you’re using Firefox or Chrome.

Simon Bisson

Author of InfoWorld's Enterprise Microsoft blog, Simon Bisson prefers to think of “career” as a verb rather than a noun, having worked in academic and telecoms research, as well as having been the CTO of a startup, running the technical side of UK Online (the first national ISP with content as well as connections), before moving into consultancy and technology strategy. He’s built plenty of large-scale web applications, designed architectures for multi-terabyte online image stores, implemented B2B information hubs, and come up with next generation mobile network architectures and knowledge management solutions. In between doing all that, he’s been a freelance journalist since the early days of the web and writes about everything from enterprise architecture down to gadgets. He is the author of Azure AI Services at Scale for Cloud, Mobile, and Edge: Building Intelligent Apps with Azure Cognitive Services and Machine Learning.