Sunday, January 27, 2008

The Times Reports & Is SciFi Really Wrong?

The New York Times today published an interesting, if brief, article about speech recognition in the mobile/telco space - cited as a "$1.6 billion market in 2007". The article gives a quick overview of a range of voice-based applications and mashups, such as vlingo.com and SimulScribe, as well as some directory assistance services (while omitting others such as SpinVox and GOOG411).
The article opens:

"Innovation usually needs time to steep. Time to turn the idea into something tangible, time to get it to market, time for people to decide they accept it. Speech recognition technology has steeped for a long time"
And concludes:
"Even a digital expert [...] cautions that some people may never be satisfied with the quality of speech recognition technology — thanks to a steady diet of fictional books, movies and television shows featuring machines that understand everything a person says, no matter how sharp the diction or how loud the ambient noise."

But isn't this a bit hackneyed? Perhaps by today's standards a twenty-year steeping period seems long, but it would hardly have seemed so at any other time in history. And after re-watching 1982's Blade Runner recently, I actually felt rather optimistic that we are today close to what the movie expected of speech recognition and speaker verification in 2019. Elsewhere, a similar picture emerges.
The Star Trek ship computer's speech recognition engine (the year is 2151), while accurate, still requires the push of a button to kick in, rather than listening for the hot word "computer", a capability that is available today, if not quite ripe for deployment.
Of course, there are the HALs (2001), Marvins (no date), and C-3POs (a long, long time ago...), whose capacities far exceed anything we dare dream our mobile phones will one day manage. But here the problem seems to be less about the quality of speech technology - the quality of HAL's speech synthesis is available today, and Marvin's characteristic monotone baritone should be easy to do - than about the old hard-soft divide in Artificial Intelligence. As long as we use a hard-AI problem, which speech arguably is, to solve soft-AI problems ("find closest pizza service"), we cannot fail to be disappointed.

1 comment:

Jonas said...

Good SciFi speech recog/synth roundup :)

I think it's time for a SciFi movie night! I can bring the Blade Runner Director's Cut on DVD ...