The emergence of the smartphone has delivered a whole new world of convenience and context into our daily lives. From deciding where to grab lunch to getting directions to a nearby hotel or gas station, mobile search is inherently local. And as smartphones and tablets become integrated into people's lives, local doesn't just mean their current location. It could include where they'll be tonight, this weekend, or even next week. AT&T Labs has developed a way to let your navigational searches fit both where you are and where you'll be.
How Did the Idea Hatch?
Multimodal capabilities were the driving force behind the idea for this technology. AT&T Labs researchers pioneered work on multimodal interfaces with the MATCH system (Multimodal Access To City Help) over a decade ago. That initial system was built as a test bed to explore multimodal interaction. The recent proliferation of advanced mobile devices, high-speed mobile networks, and cloud-based services has made the deployment of true multimodal interfaces to customers a reality.
About the Project
Speak4it puts the world at the tip of your tongue. It is the first truly multimodal voice-driven application capable of understanding both spoken words and physical gestures. Speak4it helps consumers browse restaurants within a specific area, obtain directions to the nearest gas station, call their local pharmacy, and access information about a variety of other local businesses — all through the power of speech and gesture. It also lets users encircle any region on the map and ask what is there. Once users find the place they are searching for, they can call the business, get more information, or ask for directions.
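The idea of combining the two input channels can be illustrated with a short sketch: the speech channel supplies *what* the user wants (e.g., "restaurants"), and the circling gesture supplies *where* (a polygon drawn on the map). Everything below — the data, function names, and the simple point-in-polygon fusion — is a hypothetical illustration, not Speak4it's actual implementation.

```python
# Hypothetical sketch of speech + gesture fusion, in the spirit of Speak4it:
# the spoken query gives a category, the gesture gives a region of interest,
# and the two are fused into a single local-search query.
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude)

def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: is the point inside the gesture polygon?"""
    x, y = pt
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Toggle 'inside' each time a ray from the point crosses an edge.
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

def multimodal_search(spoken_query: str, gesture: List[Point],
                      businesses: List[dict]) -> List[dict]:
    """Fuse the speech channel (category keyword) with the gesture
    channel (circled region) into one filtered result list."""
    keyword = spoken_query.lower()
    return [b for b in businesses
            if keyword in b["category"].lower()
            and point_in_polygon(b["location"], gesture)]

# Illustrative data: a circled map region and a few nearby businesses.
gesture = [(40.0, -74.0), (40.0, -73.9), (40.1, -73.9), (40.1, -74.0)]
businesses = [
    {"name": "Luigi's", "category": "restaurant", "location": (40.05, -73.95)},
    {"name": "GasNGo", "category": "gas station", "location": (40.05, -73.95)},
    {"name": "Trattoria", "category": "restaurant", "location": (40.5, -74.5)},
]
print([b["name"] for b in multimodal_search("restaurant", gesture, businesses)])
# → ["Luigi's"]  (right category AND inside the circled region)
```

Note how neither channel alone is sufficient: the speech query matches two restaurants, and the gesture region contains two businesses, but only their intersection answers "what's here?".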
The fundamental technologies supporting the Speak4it application may someday be used to improve the functionality of other mobile applications and services, such as in-car applications and apps for planning travel. Other domains that could benefit from the technology behind Speak4it include accessing and annotating electronic medical records, and enterprise applications for interacting with complex visualizations such as network or circuit diagrams.
About the Researchers
Mazin E. Gilbert, Ph.D., MBA, is an Assistant Vice President of Technical Research at AT&T Labs and a leading expert in spoken language technology. He holds a Ph.D. in Electrical and Electronic Engineering and an MBA for Executives degree from the Wharton School. Dr. Gilbert has 25 years of experience in speech and language technologies and applications. His responsibilities include managing research and development in automatic speech recognition, natural language processing, web and speech mining, and multimodal voice search. His business areas of focus include product strategy and development, entrepreneurship, and corporate finance. He is the recipient of the AT&T Science and Technology Medal Award (2006).
Michael Johnston, Ph.D., is a Principal Member of Technical Staff at AT&T Labs. He has over 21 years of experience in speech and language technology and has worked at the forefront of multimodal interface research for 15 years. He is currently responsible for AT&T's research program in advanced multimodal interfaces, holds 14 U.S. patents, has published over 50 technical papers, and serves as editor and chair of the W3C EMMA multimodal standard.