Visual or audio alerts can trigger different responses

I’ve just returned from a small conference devoted to the subject of social software. The invitees were an eclectic bunch: developers of community Web sites and multiplayer games, interaction and user-interface designers, analysts, and writers. It was a broadening experience.

When I think of alternative ways to deliver Web services, I tend to think in terms of Flash, or another kind of rich client, or maybe WAP — in other words, a variation on the theme of the GUI. There are, of course, lots of other ways in which people can interact with information systems.

Consider sound. Several folks made the point that we process sonic information using a different part of the brain than we use to process visual information. One participant described a system that translated server logs into birdsong. When the servers were healthy, there was a pleasant ambience of happy birds — a baseline pattern that supplied information without requiring conscious attention. As the servers became stressed, the birds became more agitated. Of course we’ve all experienced something like this, when the chatter of a hard disk or the blinking of LEDs on a router has alerted us that something unusual is going on.

I mentioned last week how Mindreef’s SOAPscope helps developers log and view SOAP traffic. When you already know (or suspect) there’s a problem, it’s great to have a tool that can help you diagnose it. But what does healthy traffic look like? Or, perhaps, sound like? A monitoring system might, in fact, want to use both modes. When ambient sound varies enough to trigger conscious awareness, attention might then turn to a large-screen wall-hung display that can help people visualize traffic flow. Everyone’s wired differently. Some will be better able to hear the faulty component, others to see it. It’s helpful to offer alternate ways to gather, as well as present, information.
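The birdsong idea above is easy to sketch in code. Here is a minimal, hypothetical Python version of the mapping step — the names, thresholds, and error pattern are illustrative assumptions, and an actual system would feed the resulting level into a sound synthesizer rather than print a label:

```python
import re
from collections import deque

# Hypothetical sketch of log sonification: map the error rate in the
# most recent log lines to an "agitation" level that a synthesizer
# could use to choose calmer or more frantic bird calls.

WINDOW = 50  # how many recent log lines to consider (illustrative)

# Treat ERROR/CRITICAL markers and 5xx status codes as signs of stress.
ERROR_PATTERN = re.compile(r"\b(ERROR|CRITICAL|5\d\d)\b")

def agitation(lines, window=WINDOW):
    """Return a 0.0-1.0 agitation level from recent log lines."""
    recent = deque(lines, maxlen=window)  # keep only the last `window` lines
    if not recent:
        return 0.0
    errors = sum(1 for line in recent if ERROR_PATTERN.search(line))
    return errors / len(recent)

def birdsong_mode(level):
    """Translate an agitation level into a named soundscape."""
    if level < 0.1:
        return "calm"      # contented background chirping
    if level < 0.4:
        return "restless"  # faster, more varied calls
    return "alarmed"       # shrill alarm calls

log = ["GET /index 200", "GET /api 200", "POST /order 500 ERROR"]
print(birdsong_mode(agitation(log)))  # one error line in three -> "restless"
```

The point of the design is the one made at the conference: the continuous, low-attention channel (the soundscape) carries the baseline, and only a change in it should pull attention toward a detailed visual display.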
We’re used to the idea that Web-based and IVR (interactive voice response) systems can work hand in hand. Depending on our circumstances, we’ll surf to an airline’s Web site to check on a flight or use a cell phone to retrieve the same piece of data. Microsoft’s new .NET Speech SDK aims to increase the synergy between these two modes. It offers a common framework that drives both a Web GUI and an IVR interface.

The systemwide voice-command architecture that’s built into Apple’s OS X suggests other possibilities. It does a remarkably good job of recognizing task-specific vocabularies. Consider, for example, a routine input chore like specifying a U.S. state. Classically, we ask the user to type a two-letter abbreviation or choose from a fifty-state pick-list. In the latter case, it’s really quite cumbersome to scroll down to “New Hampshire” or “Texas.” Speaking the choice (when it’s not socially awkward to do so) can be the fastest and most natural method.

Handwritten input has interesting properties too. Another conference-goer remarked that the act of scratching out a to-do item on a yellow legal pad, instead of typing it into a personal information manager, puts the item into a different part of the brain, where it’s more likely to be internalized and acted upon. For some people, this effect doesn’t carry over to the Tablet PC; only a real legal pad will work. For others, though, digital ink will produce the desired mental imprint.

XML Web services, and XML representations of application data, can ease some of the friction that prevents us from supporting multiple modes of interaction. So can application frameworks like those Microsoft and Apple are developing. But it’s good to be reminded why this matters. Accessibility isn’t only about including those who don’t see (or hear) well. Services that connect to people will succeed best when they can adapt to the cognitive differences that make all of us unique.

Software Development