by Jon Udell

The voice of opportunity

analysis
Nov 29, 2002

2003 may not yet be the Year of CTI, but it's ringing in some impressive innovations. Get ready for SIP, deeper voice and data integration, and clear-as-a-bell phones for PDAs.

SPEECH IS THE defining human trait. Shortly after a speech-enabling gene called FOXP2 became fixed in the human genome 120,000 or so years ago, we became anatomically and culturally modern.

But we postmoderns are still figuring out how to collaborate effectively using e-mail, IM, and calendars. The CTI (computer/telephone integration) vision was, and still is, that the immediacy and emotional bandwidth of speech can be woven into data network applications. Clearly that hasn't happened yet. Just consider the humble conference call, which invariably began with the caveat "in case I lose you." Users abandoned that PBX feature and signed up for conference bridges instead.

Software control of basic telephony does not require that voice and data travel across the same wires. For years it’s been possible to bridge the PBX to the LAN and leverage the strengths of both. That IT managers mostly didn’t bother to do so says plainly that the benefits weren’t, by themselves, compelling. Call control and screen pops may be critical infrastructure features for the call center, but for the enterprise these nice-to-have features won’t drive PBX/LAN integration, never mind wholesale adoption of VoIP. When the 2002 InfoWorld Telephony Survey asked 400 IT leaders to rate telephony technologies, only 17 percent reported that they consider first-party call control important, and even fewer — 7 percent — mentioned third-party call control.

What will drive VoIP into the enterprise, both at the edges and in the core, is feature parity with the PSTN (public switched telephone network) plus a clear cost advantage. Neither is a slam dunk, but the rules are changing. The vaunted high quality and low latency of the PSTN, for example, do not extend to the vast number of business calls conducted on cell phones. Even as the quality of the average PSTN call heads south, VoIP calls can challenge the best the PSTN can offer. For this article, we evaluated TeleSym's SymPhone, a software phone for wireless PDAs, against a conventional voice call. The all-IP conversation through an 802.11-equipped iPaq plugged into our left ear more than kept pace with its PSTN counterpart plugged into our right ear: the iPaq plus SymPhone actually sounded better.

For David Isenberg, an AT&T alumnus and independent telephony analyst, this result is not surprising. For years he has argued that the phone system’s architecture — a smart network with stupid devices — will inevitably yield to the Internet’s inverse model of a stupid network with smart devices. As further evidence of the power of intelligence at the edge, he points to Global IP Sound, a Stockholm, Sweden-based developer of enhanced VoIP codecs. “Their algorithm is tuned for the packet-switched environment,” Isenberg says, “and it compensates for packet loss and jitter.”
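Isenberg doesn't describe how Global IP Sound's codec works, and we won't guess. As a rough illustration of the problem any such codec must cope with, here is a minimal Python sketch of the standard interarrival jitter estimate that RTP receivers compute per RFC 3550: the change in one-way transit time between successive packets, smoothed with a gain of 1/16. It assumes send and arrival times are already in seconds (real RTP code works in timestamp clock units), and the function name is our own. A receiver can use a figure like this to size its playout buffer.

```python
def interarrival_jitter(send_times, arrival_times):
    """Running interarrival jitter estimate in the style of RFC 3550.

    send_times: send timestamps of successive packets, in seconds
    arrival_times: local arrival times of the same packets, in seconds
    Returns the smoothed jitter after the last packet, in seconds.
    """
    jitter = 0.0
    for i in range(1, len(send_times)):
        # Transit time of the previous and current packet
        transit_prev = arrival_times[i - 1] - send_times[i - 1]
        transit_curr = arrival_times[i] - send_times[i]
        # D(i-1, i): how much the transit time changed between them
        d = abs(transit_curr - transit_prev)
        # Exponential smoothing with gain 1/16, as RFC 3550 specifies
        jitter += (d - jitter) / 16.0
    return jitter


# Packets sent every 20 ms, arriving with slightly uneven delay
sent = [0.00, 0.02, 0.04, 0.06]
arrived = [0.100, 0.125, 0.138, 0.165]
print(interarrival_jitter(sent, arrived))  # about 0.0011 s, roughly 1.1 ms
```

Packets that arrive with perfectly constant delay produce a jitter of zero, however long that delay is; it is the variation, not the latency, that a playout buffer has to absorb.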

Innovation at the edge can deliver cost reduction, too. VoIP may be a cost-effective way to bridge far-flung central offices, but that strategy doesn't address the SOHO (small office/home office) scenarios typical of the virtual enterprise. Emerging SOHO-grade VoIP solutions could not only cut long-distance charges drastically but conceivably eliminate POTS (plain old telephone service) altogether.

We have been testing Vonage’s DigitalVoice on a DSL circuit in one of InfoWorld’s remote offices. This $40-per-month service uses a conventional phone handset, a Cisco ATA 186 adapter, and a Netgear router. It can make regular phone calls and, unlike Net2Phone, also receive them. Quality varies with the Internet weather.

Frankly, we won't ditch our POTS line until we see more evidence for Isenberg's thesis, but the trend is encouraging. Almost a quarter of the survey respondents plan to use VoIP at the edge: 14 percent cite voice over DSL and 10 percent cite voice over cable.

We are certain that the Internet model — abundant, general-purpose bandwidth managed by intelligent endpoints — will prevail. It’s unlikely that 2003 will be the Year of CTI that some expected in 1995, but we’re seeing notable innovations. Using the Vonage product, we can visit a Web site (which should, and easily could, offer a set of SOAP-callable Web services) to forward calls to another phone, check voice mail, and review call logs. And during a PDA-to-PDA call with SymPhone’s product manager, we yanked out the wireless card, stuck it back in, and were able to continue with the call. With either gadget, your phone number and services travel with you to any Internet location. These immediate benefits only scratch the surface of what voice/data integration could mean.
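The idea that the Vonage site should expose SOAP-callable Web services is easy to picture. The Python sketch below builds the kind of SOAP 1.1 request a script might send to forward calls. Everything service-specific is hypothetical: the article reports no such interface, and the `ForwardCalls` operation, its parameters, and the `urn:example:call-management` namespace are invented for illustration. Only the envelope namespace is standard SOAP 1.1. The sketch builds the message but does not send it.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical namespace for an imagined call-management service
SVC_NS = "urn:example:call-management"

# Use readable prefixes when serializing
ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("svc", SVC_NS)


def build_forward_request(line_number, forward_to):
    """Build a SOAP 1.1 envelope for a hypothetical ForwardCalls operation."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{SVC_NS}}}ForwardCalls")
    # Hypothetical parameters: which line to forward, and where
    ET.SubElement(op, f"{{{SVC_NS}}}lineNumber").text = line_number
    ET.SubElement(op, f"{{{SVC_NS}}}forwardTo").text = forward_to
    return ET.tostring(envelope, encoding="unicode")


print(build_forward_request("+1-555-0100", "+1-555-0199"))
```

A real service would publish this contract in WSDL so that SOAP toolkits could generate the client side automatically.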

Two general kinds of deeper integration are possible. In the realm of signaling and call control, SIP (session initiation protocol) can be used to weave IM-style presence into voice conversations, or conversely to inject telephone presence into data connections (see "SIP is sneaking into the enterprise").
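To make the presence idea concrete, here is a minimal Python sketch that builds a SIP SUBSCRIBE request asking to be told when another user's presence changes. It follows the header rules of RFC 3261 and the event framework of RFC 3265, plus the `presence` event package and PIDF format that the IETF's SIMPLE work later standardized (RFC 3856). The addresses, host, port, and function name are hypothetical. A real user agent would also send the request over the network, handle the 200 OK, and parse the NOTIFY messages that carry the presence document.

```python
import uuid


def build_presence_subscribe(watcher, presentity, watcher_host, expires=3600):
    """Build a minimal SIP SUBSCRIBE request for the 'presence' event package.

    watcher, presentity: SIP addresses such as "alice@example.com"
    watcher_host: "host:port" where the watcher listens (hypothetical)
    """
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # RFC 3261 magic-cookie prefix
    tag = uuid.uuid4().hex[:8]                  # identifies our side of the dialog
    call_id = f"{uuid.uuid4().hex}@{watcher_host.split(':')[0]}"
    user = watcher.split("@")[0]
    lines = [
        f"SUBSCRIBE sip:{presentity} SIP/2.0",
        f"Via: SIP/2.0/UDP {watcher_host};branch={branch}",
        "Max-Forwards: 70",
        f"To: <sip:{presentity}>",
        f"From: <sip:{watcher}>;tag={tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 SUBSCRIBE",
        f"Contact: <sip:{user}@{watcher_host}>",
        "Event: presence",               # the event package being requested
        "Accept: application/pidf+xml",  # presence documents in PIDF format
        f"Expires: {expires}",           # subscription lifetime in seconds
        "Content-Length: 0",
    ]
    # SIP uses CRLF line endings and a blank line before the (empty) body
    return "\r\n".join(lines) + "\r\n\r\n"


print(build_presence_subscribe("alice@example.com", "bob@example.com",
                               "192.0.2.10:5060"))
```

The same SUBSCRIBE/NOTIFY machinery works whether the presentity is an IM client or a desk phone, which is what lets presence flow in either direction between voice and data.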

Digitized voice presents a different, and potentially vast, opportunity. E-mail dominates business communication because it’s cheap and the text data can be randomly accessed, indexed, and searched. Voice data, linear and opaque, is far less useful. Speech-to-text translation systems are improving and can produce results that are searchable even when not usefully readable — but not in real time.

Moore's Law will eventually get us there, but the brute-force approach won't yield orders-of-magnitude improvement overnight. For that, you need a different algorithm altogether, and Fast-Talk Communications has one. Its technology for indexing and searching voice data works directly with phonemes. In a speaker-independent but language-dependent manner, Fast-Talk's engine rips through conversations in real time, indexing not the byte offsets of words and phrases but the time codes associated with sounds. To search, text-to-phoneme translation turns a query into a string of phonemes; the engine finds occurrences of that string and returns their time codes. If a transcript exists, it can be fed into the engine a line at a time, tying each line to a time code so that users of conventional search engines can randomly access the voice data.
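The article describes Fast-Talk's engine only at a high level, so the following Python sketch shows the general idea of phonetic search rather than Fast-Talk's actual method. It assumes the audio has already been recognized into a list of (phoneme, time code) pairs, uses a tiny hypothetical pronunciation dictionary with ARPAbet-style symbols to turn query text into phonemes, and returns the time code of each exact match. Running transcript lines through the same search, one at a time, yields a line-to-time-code alignment. Real engines score approximate matches, because recognition errors make exact matching brittle.

```python
# Hypothetical pronunciation dictionary; a real engine would use a full
# lexicon plus letter-to-sound rules. Unknown words raise KeyError here.
PRONUNCIATIONS = {
    "check": ["CH", "EH", "K"],
    "voice": ["V", "OY", "S"],
    "mail": ["M", "EY", "L"],
}


def text_to_phonemes(text):
    """Turn a query phrase into one flat phoneme sequence."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(PRONUNCIATIONS[word])
    return phonemes


def find_time_codes(index, query):
    """Return the start time of every exact occurrence of the query's phonemes.

    index: list of (phoneme, start_seconds) pairs recognized from the audio
    """
    target = text_to_phonemes(query)
    sounds = [phoneme for phoneme, _ in index]
    n = len(target)
    return [index[i][1] for i in range(len(sounds) - n + 1)
            if sounds[i:i + n] == target]


def align_transcript(index, transcript_lines):
    """Map each transcript line to the time code where it is first heard."""
    alignment = {}
    for line in transcript_lines:
        hits = find_time_codes(index, line)
        alignment[line] = hits[0] if hits else None
    return alignment


# Hypothetical phoneme index for a short stretch of recorded audio
audio_index = [("CH", 0.0), ("EH", 0.1), ("K", 0.2), ("V", 0.4), ("OY", 0.5),
               ("S", 0.7), ("M", 0.8), ("EY", 0.9), ("L", 1.1),
               ("V", 3.0), ("OY", 3.1), ("S", 3.3)]
print(find_time_codes(audio_index, "voice"))                 # [0.4, 3.0]
print(align_transcript(audio_index, ["check voice mail"]))   # {'check voice mail': 0.0}
```

Because the index stores time codes rather than byte offsets, a hit points straight into the recording, and a text search engine that indexes the aligned transcript can hand the listener the same time code.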

According to the Telephony Survey, the top business drivers of telephony applications are CRM (customer relationship management), cited by 36 percent of respondents, and KM (knowledge management), cited by 33 percent. Although bottom-line savings are the obvious rationale for VoIP, IT decision makers clearly think voice/data convergence can help push top-line growth, too.