Thursday, June 18, 2009

Incrementality in Verbal Interaction

Since I joined a research program at Potsdam University at the end of last year (as a researcher and PhD student), I've decided to use this blog for some additional, more personal updates. This is the first :-).

Our research is concerned with human-machine spoken dialog systems from an incremental, i.e. real-time processing, perspective. As such, members of our team, including me, were recently invited to a workshop on "Incrementality in Verbal Interaction." The workshop brought together an interesting mix of perspectives on incrementality from Psycholinguistics as well as Theoretical and Computational Linguistics. Slides from our project presentation are available here.






Thursday, April 2, 2009

Tim O'Reilly: Google Voice Search Key Technology

ReadWriteWeb reports that Tim O'Reilly addressed attendees at the San Francisco Web 2.0 Expo this week, talking about key technologies for the Web >2.0. Voice search (the Google iPhone app), he claimed, was a tipping point in terms of "sensor-based interfaces".

While not the only vendor to provide voice search (e.g. Yahoo oneSearch, powered by Vlingo), Google certainly seems ahead in the game. What appears to be unfolding gradually is a broad voice strategy, including Voice Search and the recent rebranding of a feature-enhanced GrandCentral as Google Voice. Future work we can expect on the voice front includes promotion of its own speech recognition capabilities through Android, Google Gears bringing speech capabilities to all browsers, tighter integration of Gaudi (audio indexing) with other services, and perhaps one day opening up voice services through APIs.

As I've previously pointed out, to Google, voice is just another form of data. What's slowly beginning to emerge, though, is a central role for speech and voice technologies in coming developments for the web and in how we search and interface with it.


Wednesday, April 1, 2009

Language Technology April Fools

Just posting some gems from today concerning speech and language technology, such as natural language generation, speech recognition and natural language processing.

Have you found any others?

Thursday, February 26, 2009

Kindle Speech Synthesis

News about speech and language technology tends to be an in-industry affair, interesting largely to those who need and use it on a daily basis or those who produce (develop or market) it. Every so often, however, mainstream news surfaces that raises issues of broad interest. Google's efforts with speech recognition are an example of this. Last month, Amazon's Kindle 2 e-book reader created a buzz with its text-to-speech "audio book" functionality.

The underlying issue is that Amazon is selling e-books, which can be listened to using speech synthesis, without owning the rights to produce audio book versions. The Authors Guild argues that this undermines the lucrative audio book market. While it is debatable whether a synthesized voice is comparable to the experience of listening to a well-produced audio book, Amazon decided not to fight this one out.

What do you think? Can synthesized audio books provide an experience comparable to real voice productions?

Monday, February 16, 2009

Microsoft Recite Preview - Note Dictation and Voice Search

Ars Technica reports today on the release of the Microsoft Recite "Technology Preview" for Windows Mobile. The application lets users record short notes as audio snippets, which can later be searched by speaking key words. Apparently it relies not on speech recognition but on simpler pattern matching, meaning the notes cannot be searched in text form. On the other hand, it may work more robustly and eliminates the effort of training for speaker independence.
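To illustrate what audio pattern matching without speech recognition could look like, here is a minimal sketch. It is not Microsoft's method, which the report does not describe. I am assuming dynamic time warping (DTW), a classic technique for comparing spoken patterns from the same speaker, and I am assuming that each note and each query has already been converted into a sequence of feature vectors. The names `subsequence_dtw` and `search_notes` and the toy one-dimensional features are made up for illustration.

```python
import math


def subsequence_dtw(query, note):
    """Cost of the best match of `query` anywhere inside `note`.

    Both arguments are sequences of equal-length feature vectors.
    Assumption: features (e.g. per-frame spectral values) were
    extracted beforehand; this sketch does not touch raw audio.
    """
    n, m = len(query), len(note)
    inf = float("inf")
    # cost[i][j]: cheapest alignment of query[:i] ending at note[j - 1].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    # Row 0 is all zeros so the match may start at any point in the note.
    cost[0] = [0.0] * (m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(query[i - 1], note[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # query frame stretched over the note
                cost[i][j - 1],      # note frame stretched over the query
                cost[i - 1][j - 1],  # both sequences advance
            )
    # The match may also end at any point in the note.
    return min(cost[n][1:])


def search_notes(query, notes):
    """Return note names ordered from best to worst match."""
    return sorted(notes, key=lambda name: subsequence_dtw(query, notes[name]))


# Toy data: one-dimensional "features" standing in for real audio frames.
notes = {
    "meeting": [(5.0,), (5.0,), (5.0,)],
    "groceries": [(0.0,), (1.0,), (2.0,), (0.0,)],
}
query = [(1.0,), (2.0,)]
ranking = search_notes(query, notes)
```

Because the query is compared directly against the user's own recordings, no acoustic model has to be trained, which fits the robustness argument above. The price is that nothing is ever turned into text.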

While not a full product yet, this sounds like a nifty little application for cognitive off-loading.

Have you tried Microsoft Recite?




Sunday, February 8, 2009

More speech on the iPhone

The iPhone has proved a game-changer in many regards and speech is no exception. Both Google and Yahoo (with vlingo) have deployed mobile speech applications for the iPhone.
Today I came across another sighting of iPhone speech recognition: Vocalia by Creaceed, which uses the open-source ASR engine Julius as its back end. There is no "push to talk" button but a "shake to retry" feature, which may prove useful when recognition goes awry. The app supports French, English and German for now and costs €2.99. Dictation is not available at this point, though Julius is certainly capable of it from an architectural point of view.

Other speech- and language-related iPhone apps:


Has anyone used these extensively? What is your experience with speech on the iPhone?

Monday, February 2, 2009

Zumba Lumba - iPhone killer or simply a hoax?

A no-frills phone with the unlikely name of Zumba Lumba has recently received some attention from the BBC. The phone is said to be top-secret, developed by a defense-aviation company. It does without extras like a camera or an applications platform, but touts some interesting security and computational features, (not only) related to speech technology:

  • Cloud computing - the phone uses no local storage for contacts or other data.
  • Network speech recognition - user input is recognized over the internet. This should avoid hardware-intensive local computing for voice input, but requires internet access.
  • Voice identification - enhanced security, because the phone will only respond to a single user's voice.
Some seem to think this is a potential iPhone killer, at least in terms of making use of innovative input modalities (though Google has already released a speech recognition app for the iPhone). Others simply think it's a hoax.

Either way, the idea of joining mobile with cloud computing is interesting. Using voice identification for security has its appeal as well, even if it's unclear whether keeping data in the cloud and sending voice data over the internet is any more secure than simply keeping the data locally on your phone.
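For the curious, here is a rough sketch of how a voice identification check of the kind described above might be structured. Nothing is known about how the Zumba Lumba would actually do it. I am assuming that some front end (not shown) turns a voice sample into a fixed-length "voiceprint" vector, and that the phone compares it to the owner's enrolled voiceprint using cosine similarity. The function names and the threshold of 0.8 are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def is_owner(enrolled, sample, threshold=0.8):
    """Accept the speaker only if the sample is close to the enrolled voiceprint.

    Assumptions: both vectors come from the same (hypothetical) voiceprint
    extractor, and the threshold value is purely illustrative.
    """
    return cosine_similarity(enrolled, sample) >= threshold


# Toy voiceprints; real ones would have many more dimensions.
owner = [1.0, 0.0, 1.0]
same_speaker = [0.9, 0.1, 1.1]
other_speaker = [0.0, 1.0, 0.0]

accepted = is_owner(owner, same_speaker)
rejected = is_owner(owner, other_speaker)
```

The threshold is where the security trade-off lives: set it too low and other voices get in, set it too high and the owner is locked out on a bad day.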