Thursday, June 18, 2009
Incrementality in Verbal Interaction
Since I joined a research program at Potsdam University at the end of last year (as a researcher and PhD student), I've decided to use this blog for some additional, more personal updates. This is the first :-).
Our research is concerned with human-machine spoken dialog systems from an incremental, i.e. real-time processing, perspective. Accordingly, members of our team, including me, were recently invited to a workshop on "Incrementality in Verbal Interaction." The workshop brought together an interesting mix of perspectives on incrementality from psycholinguistics as well as theoretical and computational linguistics. Slides from our project presentation are available here.
Thursday, April 2, 2009
Tim O'Reilly: Google Voice Search Key Technology
ReadWriteWeb reports that Tim O'Reilly addressed attendees at the San Francisco Web 2.0 Expo this week, talking about key technologies for the Web >2.0. Voice search (the Google iPhone app), he claimed, was a tipping point in terms of "sensor based interfaces".
While not the only vendor to provide voice search (e.g. Yahoo oneSearch, powered by Vlingo), Google certainly seems ahead of the game in what appears to be the gradual unfolding of a broad voice strategy, which so far includes Voice Search and the recent rebranding of a feature-enhanced GrandCentral as Google Voice. Future work we can expect on the voice front includes promotion of its own speech recognition capabilities through Android, Google Gears bringing speech capabilities to all browsers, tighter integration of Gaudi (audio indexing) with other services and perhaps one day opening up voice services through APIs.
As I've previously pointed out, to Google, voice is just another form of data. What is slowly beginning to emerge, though, is a central role for speech and voice technologies in coming developments for the web and in how we search and interface with it.
Wednesday, April 1, 2009
Language Technology April Fools
Just posting some gems from today concerning speech and language technology, such as natural language generation, speech recognition and natural language processing.
Have you found any others?
Thursday, February 26, 2009
Kindle Speech Synthesis
News about speech and language technology tends to be an in-industry affair, interesting largely to those who need and use it on a daily basis or those who produce (develop or market) it. Every so often, however, mainstream news surfaces that raises issues of broad interest. Google's efforts with speech recognition are an example of this. Last month, Amazon's Kindle 2 e-book reader created a buzz with its text-to-speech "audio book" functionality.
The underlying issue is that Amazon is selling e-books, which can be listened to using speech synthesis, without owning the rights to produce audio book versions. The Authors Guild argues that this undermines the lucrative audio book market. While it is debatable whether a synthesized voice is comparable to the experience of listening to a well-produced audio book, Amazon decided not to fight this one out.
What do you think? Can synthesized audio books provide an experience comparable to real voice productions?
Monday, February 16, 2009
Microsoft Recite Preview - Note Dictation and Voice Search
Ars Technica reports today on the release of the Microsoft Recite "Technology Preview" for Windows Mobile. The application lets users record short notes as audio snippets, which can later be searched for content by speaking key words. Apparently it does not involve speech recognition but rather simpler pattern matching. This means notes cannot be searched in text form, but the approach may work more robustly and avoids the effort of training for speaker independence.
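To illustrate what matching audio without recognizing it could look like, here is a minimal sketch using dynamic time warping (DTW), a classic template-matching technique. The report does not say which algorithm Recite uses, so the choice of DTW, the toy one-dimensional feature sequences and the names `dtw_distance` and `best_match` are all assumptions made purely for illustration. The sketch also compares whole snippets, whereas a real system would have to spot a key word inside a longer note.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of numbers."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed alignment steps
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def best_match(query, notes):
    """Return the name of the stored note whose features are closest to the query."""
    return min(notes, key=lambda name: dtw_distance(query, notes[name]))


# Hypothetical stored notes; a real system would use per-frame acoustic
# feature vectors rather than single numbers.
notes = {
    "groceries": [1.0, 3.0, 4.0, 3.0, 1.0],
    "dentist": [5.0, 5.0, 2.0, 1.0, 1.0],
}
# Same contour as the "groceries" note, but spoken more slowly
query = [1.0, 1.0, 3.0, 3.0, 4.0, 4.0, 3.0, 1.0]
print(best_match(query, notes))
```

Because the comparison is made directly against the user's own recordings, nothing here needs a language model or speaker-independent training, which is the trade-off described above: no text output, but also no recognizer to train.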
While not a full product yet, this sounds like a nifty little application for cognitive off-loading.
Have you tried Microsoft Recite?
Sunday, February 8, 2009
More speech on the iPhone
The iPhone has proved a game-changer in many regards and speech is no exception. Both Google and Yahoo (with Vlingo) have deployed mobile speech applications for the iPhone.
Today I came across another sighting of iPhone speech recognition: Vocalia by Creaceed, which employs the open-source ASR engine Julius as its back-end technology. There is no "push to talk" button, but there is a "shake to retry" feature, which may prove useful when recognition goes awry. The app supports French, English and German for now and costs €2.99. Dictation is not available at this point, though Julius is certainly capable of it from an architecture point of view.
Other speech and language related iPhone apps:
- Google Mobile - voice search app
- Vlingo - speech-enables your phone
- Pocket - language learning app
- Voice Dial - speech-enabled dialer
- VoiceThis - speech-enabled dialer
- iSpeak - multi-language translator with synthesized output
- A stuttering aid (not yet available)
Has anyone used these extensively? What is your experience with speech on the iPhone?
Monday, February 2, 2009
Zumba Lumba - iPhone killer or simply a hoax?
A no-frills phone with the unlikely name of Zumba Lumba has recently received some attention from the BBC. The phone is said to be top-secret, developed by a defense-aviation company. It does without frills like a camera or an applications platform, but touts some interesting security and computational features (not only) related to speech technology:
- Cloud computing - the phone uses no local storage for contacts or other data.
- Network speech recognition - user input is recognized over the internet. This should avoid hardware-intensive local computing for voice input, but requires internet access.
- Voice identification - enhanced security, because the phone will only respond to a single user's voice.
Either way, the idea of joining mobile with cloud computing is interesting. Using voice identification for security has its appeal as well, even if it's unclear whether keeping data in the cloud and sending voice data over the internet is any more secure than simply keeping data locally on your phone.
