Tuesday, December 29, 2009

Bye Bye 2009, new URL

Good-bye, 2009. With you, I will close the doors on this blog, at least at its current URL.

After precious few posts in the past months, in part due to parenthood and in part due to facing new challenges in academia (both are challenging in their own ways, of course), I have decided to move everything to a new location. I will continue to talk about "news" in speech and language tech. However, since such news is hard to come by (at least the really good stuff), I want to add a bit of a personal note. More howtos, gotchas, what's-Okko-up-to. Less whooo-look-who-is-talking-about-speech. I'm sure you know what I mean.

With that, have a happy start to 2010 and please continue to follow at www.okkoblog.com.


Saturday, July 11, 2009

Speech and Dialog Conferences / Speech for iPhone and Android

Conference time: I will be spending a couple of days in London and Brighton from September 5th, attending Interspeech and SIGDIAL as well as a researcher round-table. If you are interested in meeting up, feel free to get in touch.

Also, here is some more or less recent, interesting news for Android (at about 6:20; thanks, Schamai) and iPhone speech developers.

Thursday, June 18, 2009

Incrementality in Verbal Interaction

Since I joined a research program at Potsdam University at the end of last year (as a researcher and PhD student), I've decided to use this blog for some additional, more personal updates. This is the first :-).

Our research is concerned with human-machine spoken dialog systems from an incremental (i.e. real-time processing) perspective. In that capacity, members of our team, including me, were recently invited to a workshop on "Incrementality in Verbal Interaction." The workshop brought together an interesting mix of perspectives on incrementality from psycholinguistics as well as theoretical and computational linguistics. Slides from our project presentation are available here.

Thursday, April 2, 2009

Tim O'Reilly: Google Voice Search Key Technology

ReadWriteWeb reports that Tim O'Reilly addressed attendees at the Web 2.0 Expo in San Francisco this week, talking about key technologies for the Web beyond 2.0. Voice search (Google's iPhone app), he claimed, was a tipping point in terms of "sensor based interfaces".

While Google is not the only vendor to provide voice search (e.g. Yahoo oneSearch, powered by Vlingo), it certainly seems ahead of the game in what appears to be the gradual unfolding of a broad voice strategy, including Voice Search and the recent rebranding of a feature-enhanced GrandCentral as Google Voice. On the voice front, we can expect future work to include promotion of Google's own speech recognition capabilities through Android, Google Gears bringing speech capabilities to all browsers, tighter integration of Gaudi (audio indexing) with other services and, perhaps one day, opening up voice services through APIs.

As I've previously pointed out, to Google, voice is just another form of data. What is slowly beginning to emerge, however, is a central role for speech and voice technologies in upcoming developments of the web and in how we search and interface with it.

Wednesday, April 1, 2009

Language Technology April Fools

Just posting some gems from today concerning speech and language technology, such as natural language generation, speech recognition and natural language processing.

Have you found any others?

Thursday, February 26, 2009

Kindle Speech Synthesis

News about speech and language technology tends to be an in-industry affair, of interest largely to those who need and use it on a daily basis or to those who produce (develop or market) it. Every so often, however, mainstream news surfaces that raises issues of broad interest. Google's efforts with speech recognition are an example of this. Last month, Amazon's Kindle 2 e-book reader created a buzz with its text-to-speech "audio book" functionality.

The underlying issue is that Amazon is selling e-books that can be listened to using speech synthesis, without owning the rights to produce audio book versions of them. The Authors Guild argues that this undermines the lucrative audio book market. While it is debatable whether a synthesized voice is comparable to the experience of listening to a well-produced audio book, Amazon decided not to fight this one out.

What do you think? Can synthesized audio books provide an experience comparable to real voice productions?

Monday, February 16, 2009

Microsoft Recite Preview - Note Dictation and Voice Search

Ars Technica reports today on the release of the Microsoft Recite "Technology Preview" for Windows Mobile. The application lets users record short notes as audio snippets, which can later be searched for content by speaking keywords. Apparently it relies on simpler pattern matching rather than speech recognition, meaning notes cannot be searched in text form, but matching may work more robustly and avoids the effort of training for speaker independence.

While not a full product yet, this sounds like a nifty little application for cognitive off-loading.

Have you tried Microsoft Recite?

Sunday, February 8, 2009

More speech on the iPhone

The iPhone has proved a game-changer in many regards, and speech is no exception. Both Google and Yahoo (with Vlingo) have deployed mobile speech applications for the iPhone.

Today I came across another sighting of iPhone speech recognition: Vocalia by Creaceed, which uses the open-source ASR engine Julius as its back end. There is no "push to talk" button, but there is a "shake to retry" feature, which may prove useful when recognition goes awry. The app supports French, English and German for now and costs €2.99. Dictation is not available at this point, though Julius is certainly capable of it from an architectural point of view.

Other speech- and language-related iPhone apps:

Has anyone used these extensively? What is your experience with speech on the iPhone?

Monday, February 2, 2009

Zumba Lumba - iPhone killer or simply a hoax?

A no-frills phone with the unlikely name of Zumba Lumba has recently received some attention from the BBC. The phone is said to be top-secret and developed by a defense-aviation company. It does without frills like a camera or an applications platform, but touts some interesting security and computational features, related to (but not limited to) speech technology:

  • Cloud computing - the phone uses no local storage for contacts or data.
  • Network speech recognition - user input is recognized over the internet. This should avoid hardware-intensive local computing for voice input, but requires internet access.
  • Voice identification - enhanced security, because the phone will only respond to a single user's voice.

Some seem to think this is a potential iPhone killer, at least in terms of making use of innovative input modalities (though Google has already released a speech recognition app for the iPhone). Others simply think it's a hoax.

Either way, the idea of combining mobile with cloud computing is interesting. Using voice identification for security has its appeal as well, even if it's unclear whether keeping data in the cloud and sending voice data over the internet is any more secure than simply keeping data locally on your phone.

Monday, January 26, 2009

SVOX purchases Siemens AG speech-related IP

Following Nuance's acquisition of IBM speech technology intellectual property two weeks ago, Zurich-based SVOX today announced the purchase of the Siemens AG speech recognition technology group. The deal aims at creating "obvious synergies of developing TTS, ASR and speech dialog solutions" and extends SVOX's portfolio, which to date included only highly specialized speech synthesis solutions, to speech recognition.

Like the Nuance-IBM deal (and unlike Microsoft's acquisition of Tellme), this merger breaks with the obvious big-fish-eats-small-fish paradigm. Here, a larger company's R&D division (IBM, Siemens) was sold to a smaller, more specialized company (Nuance, SVOX).

Both transactions come with an intent to pursue development of novel interactive voice applications. However, while Nuance announced the potential development of applications across platforms and environments with IBM expertise and IP, SVOX appears to stay on course with its successful line of automotive solutions to build "a commanding market share in speech solutions for premium cars".

This deal adds SVOX to the list of companies offering network and embedded speech recognition technologies, which also includes Nuance, Telisma, Loquendo and Microsoft. Financial terms of the deal were not announced.

Friday, January 16, 2009

Nuance acquires IBM speech patents

Nuance yesterday announced the acquisition of speech-related patents from IBM. The deal encompasses a "licensing and technical services agreement", with IBM continuing to support existing customers. According to the press release, integrated solutions combining the two companies' technologies are expected in two years' time.

This deal represents a further step in market consolidation, which Nuance has pursued via a number of mergers and acquisitions over the past years. Friends in the industry tell me IBM has been trying to market its suite of IVR voice application server software more aggressively; however, speech research activity, once part of the company's "pervasive computing" vision, has declined lately.

Perhaps the IBM vision will bear fruit at Nuance, as the announcement comes with a commitment "to proliferate advanced speech capabilities across a broad range of devices and environments". One thing is certain: much like Nuance's recent acquisition of Philips voice products, years after taking over Philips' IVR products and solutions, this deal represents another closure, as Nuance has been marketing and supporting IBM's ViaVoice product line for years. The de facto number of competitors in the speech and voice technology market is shrinking as applications become more mainstream.