AT&T Labs researcher Taniya Mishra, a mother of two who speaks five languages, is a clear example of what happens when your personal life harmonizes with your work. Professionally, Taniya's work revolves around speech technology and building synthetic voices that can embody different characters and emotions. Personally, Taniya brings a love of languages that she discovered at a young age.
Taniya explains, "I grew up in India and even before I knew what it was to learn languages, I was speaking five languages fluently." Those languages included the national language of Hindi, the local language in her city, English while at home and French at school. Taniya built on this talent and passion in college, when she worked with a professor to develop applications for visually impaired people who needed to use text-to-speech technology.
Today, Taniya has found a way to merge her experience with her own small children and her knowledge of languages. "Something that I am working on right now is a storybook application, and it's geared towards kids, and as we know, kids are pretty tough customers." We recently sat down with Taniya to learn more about this project and the general scope of her work at AT&T Labs.
Tell us about how you started here at AT&T.
I work primarily on text-to-speech synthesis, which is the technology of taking text and converting it into speech. I have been at AT&T for a little over three years, and the reason I found AT&T to be the right place for me is that it has this long history of innovation and cutting-edge technology in almost all fields, but especially in the area of speech and language.
What does innovation mean to you?
To me, innovation is a really active word. When I think of innovation, the phrase that comes to mind is "out-of-the-box" thinking. I also believe innovation is not innovating in a vacuum, but anticipating what our customers will need in five or 10 years, or looking at the problems they are facing right now.
Can you tell us about some of the projects you're working on?
There are two projects that I am working on right now that I find personally very interesting and have a lot of applications. One is expressive speech synthesis — or text-to-speech technology — and it is the process of taking text and converting it into speech.
I am sure that many people have used this technology to some extent, and one of the top complaints when people use this technology is that it sounds robotic, it sounds unnatural, it sounds boring. My area of research is expressive speech synthesis, so I look into how realistically we can mimic a human's expressiveness and naturalness, and how to get the computer to do the same.
One of the applications that I am working on that uses expressive speech synthesis is called StorEbook — a storyteller application geared toward kids. What it's meant to do is take children's stories that are in text form and convert them into speech. As we know, children are very hard customers. You can lose them very easily if you are boring or unnatural, so this kind of sets the bar pretty high for me.
The prototype that I have for this application reads children's stories with many different characters, who have their own voices and distinct ways of expressing things. This helps get to the meaning of the sentence rather than just using exactly the same intonation, irrespective of what the character is saying.
I think it's a really fun project. I have two kids, one and three, and my three-year-old loves to play it and calls it the "story game." I think I have a built-in user base.
What sparked your interest in developing this particular application?
Even though the app on the surface looks really cute, the underlying technology is extremely sophisticated and the techniques translate really well to general-purpose text-to-speech technology. Any advances that we make within the StorEbook app can easily be applied to reading newspapers to adults or reading any kind of information that somebody who is visually impaired needs to get off the Internet.
What inspired you to pursue a career in speech technology?
In college, I double-majored in mathematics and computer science. As a senior-year project, I ended up working with a professor who was building applications for people who were visually impaired and, as a result, needed to use text-to-speech technology. Later in the year, as I was thinking about going to grad school, I started looking at schools that focused on text-to-speech technology, and I ended up going to Oregon Health and Science University.
Why is your work important to consumers?
All of us end up using text-to-speech in different contexts — whether it is in an eyes-free environment and you want to get some information or whether you are visually impaired in some capacity. This is really a need and not just a want.