Tuesday, December 4, 2007

News Redux & Building VoiceGlue

I stumbled across some "traditional" news bits this week for speech and language technologies, representing most of the major and a few interesting minor market players. Yahoo is offering some kind of NLP-driven structured search for e-commerce solutions starting next year. A new bundled automatic translation software with automatic learning capabilities was announced by across Systems GmbH and Language Weaver. Loquendo is sponsoring a speech-for-in-car-navigation industry event. Persay, maker of voice authentication software, is shipping solutions securing Planet Payment's voice-enabled payment processing. Lastly, Nuance, continuing its acquisition spree, is buying Viecore, a contact-center integration consulting company, indicating a clear focus on strengthening its traditional speech and telephony market position.

Recently I stumbled across and blogged about VoiceGlue, an integration of various GPL-licensed pieces of software that provides full IVR capabilities (including rudimentary speech synthesis but not recognition). Well, last night, together with Christoph, I finally took a stab at it myself.
Our test setup involved running Fedora 9 virtualized in Mac OS X. Our Fedora installation was missing a few pieces of software beyond the indicated prerequisites, but after about an hour everything was under way.
The trickiest bit proved to be building the various modules required for the XML parser (needed later, I presume, for VoiceGlue's customized DTMF grammar parser). For some reason CPAN's console kept conking out on us (claiming inexplicably missing/unbuildable prereqs), so after wrestling with that for some time, we decided to build all the modules manually ourselves (hoorah, makefiles).
This worked like a charm, though we hit a snag with the Module::Build Perl module, which required C_Support, which in turn required another Perl module (ExtUtils::CBuilder) not mentioned in any documentation (scant across the board, though that's half the fun, isn't it).
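For anyone retracing the manual route, the order in which things have to be built falls out of the prerequisite chain. Here is a rough sketch in Python (3.9 or later, for the standard graphlib module) that computes a build order. The graph holds only the relationships described above, the edge from the XML parser modules to Module::Build is my assumption about how they were tied together, and the real prerequisite tree is certainly larger. ExtUtils::CBuilder is the CPAN spelling of the module we were missing.

```python
from graphlib import TopologicalSorter

# Each key depends on the modules in its set.
# Assumption: only the links mentioned in the post are modelled here.
DEPENDENCIES = {
    "XML parser modules": {"Module::Build"},
    "Module::Build": {"ExtUtils::CBuilder"},
}


def build_order(graph):
    """Return the modules in an order where prerequisites come first."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step, module in enumerate(build_order(DEPENDENCIES), start=1):
        print(step, module)
```

Each module in the resulting list would then be built by hand in turn, which is essentially what we ended up doing.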
After that, the VoiceGlue installation completed swiftly and all services started running after a minimal bit of configuration.
Next week we'll be back with some test calls and our first impressions. In the meantime we'll keep our eyes peeled for ASR integration (LumenVox/Sphinx), which will make this a truly valuable stab at open sourcing some of the most expensive carrier-grade technology out there.

Wednesday, November 21, 2007

Assistive and Accessibility Technology

Diligent readers may have noticed that the dominant news bits concerning speech and language technologies seem to focus on their cost- or time-saving aspects. This is understandable, as the big players (Google, Microsoft, Nuance, IBM) have made it their mandate to capture lucrative markets (call center automation, directory assistance). Application of natural language technologies elsewhere, e.g. where it's fun (in games) or necessary (providing accessibility for visually impaired users), seems to lag.
Not so this week. This week seems to shine under the assistive/accessibility technology star. Note the SourceForge project "Speak as Daisy", a Microsoft Word plugin that enables creation of XML files with markup for speech synthesis or electronic braille generation. The plugin is said to be available in 2008.
Mac users who need better document read-back in British English will rejoice over the improved Infovox iVox voices.
Philips and Elsevier are developing a speech-enabled diagnostic system for radiologists.
Behold Nattiq's USB Hal Pen, which allows blind users to use the company's accessibility features on any computer with a USB port without installation.
Of course there's some overlap with time- and cost-saving technologies as well. The FBI has announced widespread use of Nuance Dragon NaturallySpeaking dictation for report and interview transcription.
Lastly, here's an a propos rant against call center automation on behalf of frustrated end-users, a target group that speech and language technologies all too often neglect. Perhaps there's a lesson about usability to be learned by the "money savers" employing speech technology, taken from those who rely on speech recognition and synthesis for their daily needs. I don't know, but F-word spotting as a means of prioritizing frustrated callers seems like an acknowledgement of defeat.
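To make the idea concrete, here is roughly what keyword spotting for caller prioritization could look like once the recognizer has produced a transcript. This is a toy sketch in Python, not any vendor's implementation: the word list, the scoring and the CallQueue class are all made up for illustration, and a real system would spot keywords in the audio stream rather than in clean text.

```python
import heapq
import itertools

# Placeholder list of words taken as a sign of frustration (assumption).
FRUSTRATION_WORDS = {"damn", "ridiculous", "useless"}


def frustration_score(transcript):
    """Count how many words in the transcript are on the frustration list."""
    words = (w.strip(".,!?") for w in transcript.lower().split())
    return sum(1 for w in words if w in FRUSTRATION_WORDS)


class CallQueue:
    """Hands out callers to agents, most frustrated first, then by arrival."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()

    def add(self, caller_id, transcript):
        score = frustration_score(transcript)
        # heapq is a min-heap, so the score is negated to pop high scores first.
        heapq.heappush(self._heap, (-score, next(self._arrival), caller_id))

    def next_caller(self):
        return heapq.heappop(self._heap)[2]


if __name__ == "__main__":
    queue = CallQueue()
    queue.add("caller-a", "I would like my account balance please")
    queue.add("caller-b", "This damn menu is useless!")
    print(queue.next_caller())
```

Note that nothing here makes the dialogue itself any better. It only decides who gets rescued from it first, which is rather the point of the complaint.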

Tuesday, November 13, 2007

Back in the saddle with MSFT, GOOG and VoiceGlue

Back after an extensive break. Been working hard on some of my own multi-modal ideas. Keep your eyes peeled.
Looks like it's been a quiet fall, speech and language technology-wise. After GOOG-411, Microsoft has also added speech to their search engine endeavors (if in a different domain) by speech-enabling Live Search for mobile users. Nuance continues to consolidate the speech tech market.
Exciting news on the IVR front. Finally a serious attempt to integrate various open-source technologies to provide free carrier-grade speech/telephone services is under way. VoiceGlue has managed to combine OpenVXI (VXML browser) and Flite (speech synthesis) on Asterisk and is planning to integrate Sphinx2 for speech recognition. All components would then be available under some form of the GPL. Could this herald a change in the availability of speech telephone platforms for developers unwilling to dish out horrendous per-port costs? Something to follow, anyway.
Lastly, here's an article describing the growing role of speech in warehouse management.

Wednesday, July 25, 2007

Google on the Move, News Redux

Very quiet recently. No big acquisitions, no speech-tech revolution.

Most interesting: Google announced that Mike Cohen (formerly of Nuance) will appear as keynote speaker at SpeechTEK in August to reveal Google's speech technology strategy. Google has already moved into the speech application market with GOOG411, an automatic directory assistance application leveraging business search and Google Maps.
UBC researchers announced a speech learning system that doesn't use a traditional data-driven model to learn the sounds of a language. Instead it is said to represent more experience-driven learning, much like that of infants. So far, the system has acquired English and Japanese vowels.
Some product reviews/announcements: a quick history of desktop dictation, uses of TextAloud for the iPhone, and Nuance's new South African voice "Tessa".
Also on the web: NIST evaluates DARPA automatic translation software in military contexts, and What Semantic Search is Not.

I may post less frequently in coming weeks. Stay tuned.

Wednesday, July 11, 2007

This week: Bunnies, Trojans and the Jetsons

There was no shortage of novel uses for speech technology this week. Avaya and Jersey City's Liberty Science Center announced speech-enabled exhibits, allowing visitors to access information and services in the museum using their voice (and, of course, mobile devices).
Gizmo freaks should love (and everyone else should hate) this bunny, displaying speech recognition and synthesis, while also providing some unified communication capacities.
Also novel, though on a sadder note: speech is finally on the malware radar for good, as TTS trojans popped up using Microsoft's built-in text-to-speech engine to annoy users by commenting on their own malicious behavior. Call it the salt-in-wound virus. This news comes about half a year after an MS Vista speech recognition security flaw was revealed, whereby the recognizer enables remote execution of content on a computer running speech recognition.

Traditional speech applications made some headlines this week as well: Nuance signs a deal with Damovo to roll out speech apps in Ireland, forecasting €1.5m in profits over the next year. TuVox announces hosted on-demand speech apps for VoIP access.

Lastly, here is an interesting article about the Jetsons and why speech technology hasn't caught on as much as we have all hoped.

Tuesday, July 3, 2007

Slow week in terms of language technology news.
On the gaming front: Nintendo announced they were playing the middleware game for Wii development by opening up the platform to third-party technologies. Among the first to sign on was Fonix, allowing game developers to integrate VoiceIn Game edition, their video game console speech recognition and "karaoke" SDK. The karaoke feature seems rather gimmicky, geared only at the karaoke gaming genre, which is rather niche. Fonix has displayed a strong focus on gaming in the past, integrating as Sony PS3 middleware.
Unfortunately, speech in games has never made a big splash, but it represents a refreshing move away from customer service applications. Perhaps the middleware approach of many platform vendors will change things.
Speaking of customer service: Genesys and Merced Systems are teaming up to develop improved reporting tools. Measuring and reporting on customer service interactions has made headway recently. The focus on the interaction effectiveness of natural language and speech applications is meant to help correct some of the poor image that self-service applications live with. Relatedly, this article describes the shortcomings of such applications in the past and proposes a less-is-more, faster interaction paradigm for interactive voice response applications. While not all problems with IVR applications boil down to complicated menu structures and long response times, this is certainly a pointer in the right direction, placing emphasis on dialogue design rather than engineering.
Lastly, showing that not all speech communication is simply about customer service, Voxeo snags Gartner's "Cool Vendors in Enterprise Communications, 2007" title, awarded to companies for being among the "interesting, new and innovative".

Tuesday, June 26, 2007

Nuance, Tegic and the woes and comeback of mobile speech

So the big news this week is Nuance's acquisition of the month: Tegic. Tegic supplies T9 predictive text input to several mobile phone manufacturers. The acquisition represents Nuance's recent focus on acquiring mobile technology market companies. It serves Nuance with a strategic customer base, including obvious candidates for Nuance's speech technologies. Aside from the strategic benefits, the technical result of mixing predictive text input with speech is interesting and something to be followed.
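For readers who have never looked under the hood of predictive text input: at its core the idea is a dictionary lookup keyed by the digit sequence a word produces on a phone keypad. Below is a minimal sketch in Python of that idea only. It is not Tegic's implementation, and the tiny word list and function names are invented for illustration. A real product also ranks the candidates, for example by word frequency.

```python
from collections import defaultdict

# Standard letter layout of a phone keypad.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: digit for digit, letters in KEYPAD.items() for ch in letters}


def word_to_digits(word):
    """Translate a word into the key sequence that types it."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


def build_index(words):
    """Group dictionary words by their key sequence."""
    index = defaultdict(list)
    for word in words:
        index[word_to_digits(word)].append(word)
    return index


def candidates(index, digits):
    """Return every known word that matches the typed key sequence."""
    return index.get(digits, [])


if __name__ == "__main__":
    # Hypothetical mini dictionary.
    index = build_index(["home", "good", "gone", "hood", "speech"])
    print(candidates(index, "4663"))
```

The ambiguity is the interesting part: several words share one key sequence, and that is exactly where a second input channel such as speech could help pick the right candidate.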
Coincidentally, the woes and comeback of using speech for I/O on mobile devices are described in these articles this week.
Lastly, here is an interesting interview with Lin Chase, director of Accenture R&D in Bangalore, India, who has held several prominent positions in the speech tech industry in the past. Topics include speech, women in the industry and why Americans should travel.

Wednesday, June 20, 2007

Healthcare, Security and the Army...

...these are the three overarching themes of the speech technology news that I came across this week. There are some obvious and less obvious points of contact here:

Tuesday, June 12, 2007

Speech Meets Sales, Video Gaming and the Economist reports...

Many of those working in speech recognition, especially those deploying customer-service telephone applications, have grown tired of the limited scope that most projects entail. I recently wrote about speech-enabled knowledge bases as a novel type of speech app. In what may be another (at least I haven't heard of this before), MTI and FasTrak Retail are combining efforts to launch a 'virtual sales associates' platform. And of course there are the recurring dreams of voice-enabled video gaming.

Speech synthesis is naturally more diverse than its recognition sibling (perhaps not every 'I' in I/O can be channelled through voice, but pretty much every 'O' can be synthesized). In today's news, TTS is employed in emergency response systems to broadcast text messages as audio.

Lastly, speech got some rep in the Economist June 7th issue.

Friday, June 8, 2007

Germany-based and search-engines-driven language technology

There has been lots of German-based language technology news over the past couple of weeks:

Also some attention on language-technology-related search engine news:

Tuesday, May 22, 2007

Weekly News Redux...

Today, I came across some novel(ish) uses for text-to-speech:

On the mainstream speech recognition front:
And some Web3.0 language tech news:

Sunday, May 20, 2007

News is back...

Ok, I'm back from vacation and finally sorted through some of the recent developments in the speech world. Going forward I will probably post longer but less frequent tidbits here.

The biggest recent speech news is Nuance's acquisition of VoiceSignal, broadening Nuance's mobile end-user market as well as adding some nifty voice features for short messaging and mobile phone usability.
On related news, here is a short article describing the role of speech in unified messaging.
Lastly, here is a description of progress on open-source telephony and speech recognition.

Tuesday, April 24, 2007

Speech Enabled Knowledge Bases

Two articles and a product showcase recently demonstrated speech-enabled knowledge base solutions. In essence, products such as these are expert systems of varying complexity, ranging from speaking manuals to complex diagnosis systems. Users can describe a problem and ultimately receive an answer, whether through complex one-shot natural language processing/understanding or a plain old multi-step directed dialogue.
Alongside traditional call-center automation applications (e.g. customer service, process automation, pre-qualification, directory assistance), these systems represent a minor market segment. However, they are relatively novel, so much can still happen. Especially in medical/health care domains, the market appears untapped and the list of potential applications broad.

Friday, April 20, 2007

Daily News Redux...

Today on the WWW:

Thursday, April 19, 2007

Daily News Redux...

On the WWW today:

Wednesday, April 18, 2007

Daily News Redux...

On the WWW today:

Tuesday, April 17, 2007

Daily News Redux...

On the WWW today:

Monday, April 16, 2007

Daily News Redux...

Today on the WWW:

  • Software Ali Baba parses medical abstracts, generates visual network of terminology using natural language processing.
  • A redux of latent semantic indexing (LSI) for use in search engines.

Wednesday, April 11, 2007

Daily News Redux...

Today on the WWW:

  • Nuance announces voice search framework, based on directory assistance solutions portfolio.
  • Epson releases speech synthesis chip, powered by Fonix engine, allows mixed output of synthesis and pre-recorded speech.
  • Loquendo text-to-speech gives speech to Activa Multimedia iVAC avatars.

Tuesday, April 10, 2007

Daily News Redux...

Today on the WWW:

Monday, April 9, 2007

Web 3.0 and Natural Language Processing

Web 3.0 is getting some buzz in the blogosphere. Like Web 2.0, it begs the question that PCMag.com recently ran by its readers: what is it? However, this time around things seem a bit easier.

Web 2.0 seems to be happy with being vaguely defined (delimited may be a better term) and with being equally a social and a technological movement. Web 3.0 clearly hovers over the idea of the "Semantic Web", a term coined by Tim Berners-Lee, in which richly marked-up hypertext and data allow for novel, more meaningful human-machine and machine-machine communication. Radar Networks (currently in stealth mode) claim to be driving some interesting developments in this direction and are followed closely by those interested.

This has already raised some questions: Will content require expensive hand labor or be machine boot-strappable? What new privacy policies will we have to live with? How does one separate style and content? What are the alternatives to RDF?

Sadly, there's very little inspiring out there about potential applications.

My question (though not uniquely mine) to add to this: What role will natural language processing play in this (i.e. how "semantic" is this talk of Semantics)? Semantic content in RDF appears to be little more than a means for one machine to tell another who authored a particular book or what are the postal codes in the greater Boston area. Semantics to me is as much about intentions ("Why is web-service A dispensing such information?") and interpreting such information for the purposes of action ("What can web-service B - or my browser or I - do with it?").
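To illustrate how thin this kind of "semantics" is, here is a toy version in Python of what RDF-style content boils down to: subject-predicate-object triples plus pattern matching. This is a sketch under loud assumptions. The "ex:" names are invented, the data is a handful of illustrative statements, and real RDF uses URIs, proper vocabularies and a query language such as SPARQL rather than a list comprehension.

```python
# Illustrative triples: (subject, predicate, object). The "ex:" prefix and
# all predicate names are hypothetical, not taken from a real vocabulary.
TRIPLES = [
    ("ex:WeavingTheWeb", "ex:type", "ex:Book"),
    ("ex:WeavingTheWeb", "ex:author", "Tim Berners-Lee"),
    ("ex:Boston", "ex:postalCode", "02108"),
    ("ex:Boston", "ex:postalCode", "02109"),
]


def match(triples, s=None, p=None, o=None):
    """Return all triples that agree with the given pattern.

    A value of None acts as a wildcard for that position.
    """
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


if __name__ == "__main__":
    # "Who authored this book?"
    print(match(TRIPLES, s="ex:WeavingTheWeb", p="ex:author"))
    # "What postal codes are recorded for Boston?"
    print(match(TRIPLES, s="ex:Boston", p="ex:postalCode"))
```

Nothing in such a store says why a service publishes these facts or what a consumer should do with them, which is the gap I am pointing at.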

Perhaps this misses the mark and semantics really isn't about natural language. But there is a weaker, more real form of this "language and technology" concern: Insofar as semantics is just information, can it be bootstrapped by a machine (perhaps even in a linguistically informed rather than a statistical way)?


Thursday, April 5, 2007

Daily News Redux...

On the WWW today:

Tuesday, April 3, 2007

Daily News Redux...

Daily News Redux:

Questions of the day:
  • Web X.0 IEEE workshop. What role will NLP play?
  • Are GPS navigation systems driving the TTS market (links randomly chosen from recent navigation system releases)?

Daily News Redux...

On the WWW today:

  • CallMiner announces Eureka product for call center speech analytics and QA.
  • Envox CT Connect 7 VXML/CTI platform now Avaya telephony compliant.
  • Some blogging about the role of symbolic vs brute-force statistics in artificial intelligence, NLP, Google's machine translation vision.

Monday, April 2, 2007

Daily News Redux...

On the WWW today:

Sunday, April 1, 2007

Daily News Redux...

On the WWW today:

Saturday, March 31, 2007

Daily News Redux...

On the WWW today:

Have a good weekend!

Friday, March 30, 2007

Daily News Redux...

On the WWW today:

Thursday, March 29, 2007

Daily News Redux...

On the WWW today:

  • Article about Google statistical machine translation algorithms, mentions success in Arabic (cf. NIST benchmarks finding Google's Arabic/Chinese->English translation most accurate).
  • Teragram MyGAD.com search engine launch, employing NLP for improved information retrieval. In related news, a list of top-100 search engines, including more NLP and some audio searches.
  • Article about predictive software application for the tourism industry, calls for NLP and other AI techniques such as neural networks.
  • Nuance unveils voice music search application for mobile ASR applications. In related news, Nuance ships improved mobile TTS.

Wednesday, March 28, 2007

Three Observations about Recent Language Technology News

To start us off, recent experience has shown three things:

  1. Speech (i.e. voice) related news is dominated by TTS, less so by ASR.
  2. The company featured most frequently in the news is Nuance.
  3. The talk of semantic search engines seems to dominate the NLP news.
The success of TTS is largely due to requirements set by mobile and in-car technologies, especially GPS and communications. The future of ASR, on the other hand, seems to depend on the dictation market (especially in the healthcare sector) and the growing relevance of network ASR (driven by advancing VoIP and the impact of multi-modal applications).

Nuance's continued position will depend on the role of "super players" IBM and Microsoft and to a lesser degree the role of open-source initiatives, especially on the network/telephony side.

Semantic search engines recently got some media hype with "Google-Killer" Powerset, a PARC offspring. While it is in its infancy, some believe this development toward the semantic web will usher in a Web3.0 revolution. Of course, some others believe this has already begun, while yet more just wanna see what happens with all this.

Let's see how these trends develop. Especially multi-modality and semantic searches will be issues to follow closely.


Welcome. Here I will follow what the news and other blogs have to say about what may broadly be called human language technology. This includes, but isn't limited to, automatic speech recognition (ASR), text-to-speech (TTS), speaker recognition/verification (SV), machine translation (MT) and natural language processing (NLP).

Oh and of course: this blog is intended to be informative and, unless otherwise specified, makes no claim about the truthfulness of any referenced material. I will do my best to ensure that any of my own opinions can easily be discerned as such. Comments and debate are always welcome.