Monday, May 19, 2008

OnMobile buys Telisma

OnMobile Global Ltd today acquired France-based Telisma, a producer of speech recognition software for network/telephony environments.
The acquisition comes at a time after OnMobile recently partnered with Nuance, a Telisma competitor for speech recognition markets, to deploy voice search applications for its home market, India. India's multilingual market has made it a tough one to crack for speech technology companies, though a lucrative one as India has recently surpassed the U.S. as the second largest mobile market in the world, according to Om Malik at GigaOm.
I suspect issues specific to speech technology and India's multilingualism have something to do with this deal. As I recently pointed out, internationalization of speech and language technologies comes at a steep entry cost, due to the high demands on expertise and data required for building language-specific models. In addition, speech recognition companies like Nuance have long kept their language models under wraps. In other words, if your language isn't catered to, reaching that language's customer base becomes a very pricey affair.
While open-source aspirations to build freely availably language models for speech recognition exist, Telisma has opted on middle-ground in this matter by allowing partners/customers to build their own models, but selling the tools to do so at a price. In a market like India, the ability to cater to a multi-lingual customer base without purchase of expensive proprietary software (or paying someone else to develop proprietary software for you to purchase) may have made a big difference in this deal.

On a different note, this acquisition is the latest in a series of acquisitions consolidating the speech technology market. While five years ago telephony speech technology was a highly redundant market of small companies building similar products, today they have largely been acquired by or merged with bigger players. In the meantime, companies like Microsoft, IBM, Siemens and Google are making their own moves to enter the market.

Update:
Telismas acoustic modelling toolkit is indeed not for sale, but for free, as one reader has pointed out. Thanks!

Monday, May 5, 2008

Internationalization and Speech Technologies

The not-so-subtle truth is, of course, that we all speak English. Yet localization and internationalization are at once prerequisite and stumbling stone for many web-based endeavors.

In my own backyard, two examples illustrate the effect and need for of internationalization, respectively. German professional social network XING has internationally outperformed competitors like LinkedIn through early and aggressive internationalization. StudiVZ - the "German Facebook" has gained much of the student social network market before Facebook decided to release a German version of its web app, making this a tough-to-crack market.

Ironically, as these two examples underline, the need for localization remains in cases where the demands on usability are low (join group/contact person/send message) and the target audience can largely be expected to speak sufficient English (read this for an interesting take on the same issues and solutions in online gaming.) Moreover, localization is an effort far greater than providing an interface in the local language.

As one expects, localization and internationalization and speech technology are inextricably linked - in a sense developing speech technologies is internationalization. And using such technology in professional service projects is akin to building a internationalized web application. Here are some of the oddities I've observed while working with speech technologies in an international environment:


Translation is not enough. When you write software that speaks or wants to be spoken to, there is more at stake than providing interface text. Can you expect all your users to spell input when your system doesn't understand the raw speech input? Can you be sure that all your translated content will generate well-formed speech-synthesis output? Language and culture are sensitive issues, so a well-localized speech application must do more than provide translated user interface. Employing local staff is usually a minimum to building a speech application for a new market.


The cost shifts. Re-usability of resources from previous speech projects is usually low. So unlike localizing a web application, porting a speech application requires grunt work that you thought you had done the first time around. Moreover, speech applications in new languages almost always come with additional licensing burdens and questions about the appropriate technology partner. Expect to pay for things you didn't expect.


There is no long tail. The buy-in costs for developing a new language in almost any speech or language technology (recognition, synthesis, translation) remain constant. This makes every newly developed language a strategic decision and translates into a two-tier localization effort: one developing basic technologies, one employing such technology in professional service projects.
As an example, the world's most successful dictation software packages: Dragon Naturally Speaking ships in five flavors of English and six European languages. Philip's Speech Magic ships in 23 dialects of 11 languages. Both a far cry from world-coverage.
The enormous cost of development has a decided effect on developing speech technology for lesser-spoken languages. And it has posed a significant hurdle as well for open-source initiatives of speech technologies to provide such resources for free.