NTT DoCoMo develops speech recognition without speech


Mar 3, 2004

System can be used in noisy environments where other systems have problems

YOKOSUKA, JAPAN — NTT DoCoMo Inc. lifted the lid Tuesday on its five-year-old research and development (R&D) center in Japan and demonstrated a couple of the technologies the operator is working on, including a speech recognition system that doesn’t require speech.

The company, Japan’s largest cellular operator, last year spent around ¥150 billion ($1.4 billion) on R&D. It employs more than 1,100 people to look into new technologies and communications methods. Many of those people are based here in Yokosuka, west of Tokyo. The carrier, which also has laboratories in the U.S. and Germany, is in the process of establishing a center in China.

Research covers areas of importance to the company’s business. For example, HSDPA (high-speed downlink packet access) data communications technology, which allows downlink speeds of up to 14Mbps (megabits per second), is being developed in Yokosuka, as are multimedia and human-machine interaction technologies, such as the speech recognition system.

The system, which is still a prototype, uses a technique called electromyography (EMG) to measure the electrical activity in the muscles a person uses when speaking. This means the user still has to mouth the words as if speaking them, but audible speech itself isn’t necessary.

Three electrodes need to touch various areas of the face to measure the electrical activity. In the demonstration, a user had sensors mounted on his thumb and first two fingers. The thumb was placed under his chin, the forefinger was held vertically against his cheekbone and the second finger was held just above his top lip. While the positioning may look strange, it is not particularly difficult and doesn’t greatly impede speech or the mouthing of words.

At present the system, which is the result of three years of work by a small number of researchers, has been programmed to recognize the five vowels of Japanese.

In the demonstration, the system worked well, accurately recognizing the vowels mouthed by a researcher.
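NTT DoCoMo did not describe how the prototype turns electrode readings into vowels. Purely as an illustration, the sketch below assumes one simple approach: take the signal strength (root mean square) on each of the three electrodes and pick the vowel whose stored average pattern is closest. The function names and every number in `SYNTHETIC_TRAINING` are invented for this example and are not measurements from the demonstration.

```python
import math

# Hypothetical, hand-made feature vectors (one RMS value per electrode:
# chin, cheekbone, upper lip). These are NOT real EMG measurements.
SYNTHETIC_TRAINING = {
    "a": [(0.9, 0.2, 0.3), (1.1, 0.2, 0.3)],
    "i": [(0.2, 0.9, 0.3), (0.2, 1.1, 0.3)],
    "u": [(0.2, 0.3, 0.9), (0.2, 0.3, 1.1)],
    "e": [(0.6, 0.7, 0.2), (0.8, 0.7, 0.2)],
    "o": [(0.5, 0.2, 0.8), (0.7, 0.2, 0.8)],
}


def rms(samples):
    """Root mean square of one electrode's samples in a time window."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def features(window):
    """Reduce a window of three electrode signals to three RMS values."""
    return tuple(rms(channel) for channel in window)


def train_centroids(labelled):
    """Average the feature vectors recorded for each vowel."""
    centroids = {}
    for vowel, vectors in labelled.items():
        count = len(vectors)
        centroids[vowel] = tuple(
            sum(vec[i] for vec in vectors) / count for i in range(3)
        )
    return centroids


def classify(feature, centroids):
    """Return the vowel whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda v: math.dist(feature, centroids[v]))


centroids = train_centroids(SYNTHETIC_TRAINING)
print(classify((0.95, 0.25, 0.3), centroids))  # nearest to the "a" pattern
```

A nearest-centroid rule is only one of many possible classifiers. It is used here because it fits in a few lines, not because the article says the prototype works this way.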

The next stage in development is recognition of consonants, and developers are also working on other languages, said Tomoyuki Ohya, director of NTT DoCoMo’s Multimedia Signal Processing Laboratory. Because EMG doesn’t require sound to recognize what is being said, the system can be used in noisy environments where existing speech recognition systems have problems, according to NTT DoCoMo.

The company also demonstrated a three-dimensional audio communication system for cell phones. Similar to surround sound, the system adds depth to the basic left and right distinction offered by stereo and then adapts the sound being heard depending on a listener’s position in relation to that of the person speaking.

As an example, NTT DoCoMo described two people trying to meet each other in a crowded place. Usually, a cellular telephone conversation between the two would involve describing nearby landmarks, for example: “I’m next to the coffee shop near the clock.” But the new technology would make the other party’s voice appear to come from that person’s actual location. So, in this case, if the coffee shop were 100 meters behind and to the right of the listener, the voice would appear to come from that direction. As the listener turns around, the voice would continue to appear to come from the same physical location, which after turning would be ahead and to the left.

In its current configuration, the system requires a pair of headphones because it needs to deliver sound to both ears to give the illusion of depth. It also requires some type of positioning technology that is able to accurately measure the location of users and also the direction they are facing.
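The geometry behind the coffee shop example can be sketched in a few lines. The code below is an illustration only and assumes a flat map with x pointing east and y pointing north, positions in meters, and a compass heading in degrees for the direction the listener faces. The function names are hypothetical and the article does not say how NTT DoCoMo's system performs this calculation.

```python
import math


def relative_bearing(listener_xy, heading_deg, speaker_xy):
    """Angle of the speaker relative to where the listener is facing.

    Returns degrees in the range [-180, 180): positive means to the
    listener's right, negative means to the left, 0 is straight ahead.
    """
    dx = speaker_xy[0] - listener_xy[0]
    dy = speaker_xy[1] - listener_xy[1]
    # Compass bearing of the speaker: 0 = north, increasing clockwise.
    absolute = math.degrees(math.atan2(dx, dy))
    # Subtract the listener's heading and wrap into [-180, 180).
    return (absolute - heading_deg + 180.0) % 360.0 - 180.0


def describe(rel_deg):
    """Turn a relative bearing into a rough spoken direction."""
    front_back = "ahead" if abs(rel_deg) < 90.0 else "behind"
    side = "right" if rel_deg > 0.0 else "left"
    return f"{front_back} and to the {side}"


listener = (0.0, 0.0)
speaker = (70.0, -70.0)  # roughly 100 meters away, south-east of the listener

facing_north = relative_bearing(listener, 0.0, speaker)
facing_south = relative_bearing(listener, 180.0, speaker)  # after turning around

print(describe(facing_north))  # behind and to the right
print(describe(facing_south))  # ahead and to the left
```

The speaker's position never changes in this sketch. Only the listener's heading does, which is why the system needs to know which way the user is facing as well as where the user is.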

Research on EMG technology is ongoing. Nobuhiko Naka, a research engineer at the Multimedia Signal Processing Laboratory, wasn’t willing, however, to estimate when it could be complete.

The two prototypes aren’t the first shown by NTT DoCoMo to challenge current cellular telephone technology. In 2000, the operator demonstrated at a Japanese trade show a telephone that sends sound to the ear by creating vibrations that travel along bones.

Although the company hasn’t yet sold a product based on the technology, rival carrier Tu-Ka Cellular Tokyo Inc. earlier this year launched a handset produced by Sanyo Electric Co. Ltd. that uses a similar system. The Sanyo TS-41 handset looks and functions as a conventional clamshell handset but can be pressed against the user’s face and the vibration function turned on when in a noisy environment.


by Martyn Williams

Senior Correspondent

Martyn Williams produces technology news and product reviews in text and video for PC World, Macworld, and TechHive from his home outside Washington, D.C. He previously worked for IDG News Service as a correspondent in San Francisco and Tokyo and has reported on technology news from across Asia and Europe.
