Get 10% off SpeechTEK registration
Are you a speech geek? Do you dream about grammar development? Does your license plate say “MRCP”?
Will you be attending SpeechTEK 2010, the annual gathering and expo of the industry’s leading vendors and service providers? It happens August 2-4 in NYC.
You can receive 10% off when you use discount code ELIU during SpeechTEK registration.
Nu Echo releases NuGram Server Free Developer Edition and NuGram IDE 2.2
It was a busy day for Nu Echo, a company based in Montreal, Canada, that makes a leading platform for building speech recognition grammars. The company announced the availability of its NuGram Server Free Developer Edition:
One of the revolutionary features of the NuGram Platform is the ability to develop dynamic grammars just as easily as static grammars, using the same powerful environment and set of tools, and to deploy them as simply as JSP pages. This means that there is no longer any need for the traditionally complex, error-prone, and difficult-to-test approaches for developing dynamic grammars. Until now, however, developers could not easily experiment with the dynamic grammar features of the NuGram Platform since, in order to do so, they were required to purchase a license of NuGram Server. With the introduction of a Free Developer Edition, this is no longer the case.
And version 2.2 of NuGram IDE:
The most important feature introduced in this release is the support for Java to populate dynamic grammars. When using NuGram IDE, you use the exact same code to test and tune your grammar that will run in production, but without the long deployment cycle associated with stopping, deploying and restarting a Java web application. And deploying your grammars in NuGram Server is as simple as deploying JSP pages.
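The announcements don't show what a dynamic grammar looks like, so here is an illustration of the general idea only, not NuGram's syntax or API: a grammar whose vocabulary is filled in at request time, much as a JSP page fills in HTML. This minimal Python sketch assumes the output is a W3C SRGS XML grammar and that the data (here, a caller's contact names) arrives as a plain list; the function `build_contact_grammar` is hypothetical.

```python
from xml.sax.saxutils import escape

SRGS_NS = "http://www.w3.org/2001/06/grammar"


def build_contact_grammar(names, root_rule="contact"):
    """Return an SRGS 1.0 XML grammar that accepts any one of `names`.

    Hypothetical helper: it illustrates a data-driven (dynamic) grammar,
    not the NuGram template syntax or API.
    """
    if not names:
        # SRGS requires <one-of> to contain at least one <item>.
        raise ValueError("a grammar needs at least one phrase")
    # escape() turns &, < and > into entities so names stay well-formed XML.
    items = "\n".join(f"      <item>{escape(name)}</item>" for name in names)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<grammar xmlns="{SRGS_NS}" version="1.0" xml:lang="en-US"\n'
        f'         mode="voice" root="{root_rule}">\n'
        f'  <rule id="{root_rule}" scope="public">\n'
        "    <one-of>\n"
        f"{items}\n"
        "    </one-of>\n"
        "  </rule>\n"
        "</grammar>\n"
    )


if __name__ == "__main__":
    # In production the list would come from a database or user profile.
    print(build_contact_grammar(["Alice Tremblay", "Bob & Sons"]))
```

Because the grammar is produced by ordinary code, it can be tested like any other function before it is deployed, which is roughly the benefit the announcement describes.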
Speech geeks, go forth and frolic in the merry ways of grammar development!
Speech recognition in… a movie?!
The whole point of speech recognition is to enhance interaction. People prefer to speak rather than press buttons repeatedly. The goal is to achieve a better customer experience.
Now think about another popular experience: going to the movie theater. How would you rate that experience? First of all, it is quite passive: the audience watches the screen with no control over what's being shown. Second, the experience lacks interaction: the audience sits through the entire film, just staring at what's in front of them.
But one creative horror filmmaker has taken the genre to another level by bringing speech recognition software into the movie theater experience. The product is Last Call, the first interactive horror film:
Last Call is the first interactive horror movie in the world where the audience is able to communicate with the protagonist. A film controlled by a member of the audience, thus blurring the boundaries between game and film. Language recognition software transforms the participant's answers via mobile phone into specific instructions. Specially developed software then processes these commands and launches an appropriate follow-up scene. The dialogue between the movie's main actress and an audience member leads to a different film – and outcome – every time: sometimes with a happy end, sometimes with a more gruesome one. To participate in the adventure, audience members submit their mobile phone numbers to a speed dial code when they buy their ticket. The moment the female protagonist takes out her phone to call someone who might be able to help her, the film's controlling software contacts one of the submitted mobile phone numbers. Once the viewer picks up, he hears the actress's voice – who tells him she would be lost without him. He has to help her escape by choosing a path through the old, rundown sanatorium. Furthermore, he also decides whether she should help other victims to flee the scene, and every single choice shapes her fate: it's a matter of life and death.
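Stripped of the theatrics, the description above is a branching story: each recognized answer selects the next scene. The film's actual control software isn't described, so this is only a minimal Python sketch under stated assumptions: the recognizer has already turned the caller's speech into text, matching is done by simple keyword spotting, and every scene name and keyword below is invented for illustration.

```python
import re

# Hypothetical scene graph: each scene maps a spoken keyword to the next
# scene. Scenes with no outgoing choices are endings.
SCENES = {
    "corridor": {"left": "stairwell", "right": "ward"},
    "ward": {"help": "rescue_ending", "run": "escape_ending"},
    "stairwell": {"up": "roof_ending", "down": "basement_ending"},
    "rescue_ending": {},
    "escape_ending": {},
    "roof_ending": {},
    "basement_ending": {},
}


def is_ending(scene):
    """An ending is a scene with no further choices."""
    return not SCENES[scene]


def next_scene(current, recognized_text):
    """Map the recognizer's text output to the follow-up scene.

    Keywords are checked in dictionary order, so the first listed keyword
    wins if the caller says several. With no match, the current scene is
    returned so the actress can re-prompt the caller.
    """
    words = set(re.findall(r"[a-z']+", recognized_text.lower()))
    for keyword, target in SCENES[current].items():
        if keyword in words:
            return target
    return current


def play(answers, start="corridor"):
    """Feed recognized answers through the story; return the scenes visited."""
    scene = start
    path = [scene]
    for answer in answers:
        if is_ending(scene):
            break  # the story is over; ignore any further answers
        scene = next_scene(scene, answer)
        path.append(scene)
    return path


if __name__ == "__main__":
    print(play(["Go left!", "um...", "Down, go down!"]))
```

Falling back to the current scene on a non-match mirrors a common IVR pattern: a no-match triggers a re-prompt rather than a guess.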
Look at the reaction from the audience. Does your speech-enabled IVR elicit such a strong positive response from callers? Probably not, because most of the time speech recognition is thought of as just a replacement for touch-tone. That type of thinking greatly limits the potential of a speech IVR. Instead, take off your contact center manager hat and find your inner movie maker.
Categories: Implementation Tags: speech
Loquendo’s mobile TTS and ASR offering now complete
With mobile devices becoming more powerful every day, they are bound to get some serious speech applications. Just search for iPhone speech applications. Turin, Italy-based Loquendo couldn't have released its mobile TTS and ASR platform at a better time.
And the company means business in the mobile market, too. Support for iPhone? Check. Support for Android (Google open source, as we all know)? Check. Support for Maemo (open sourced from Nokia)? Whatever that is, check. Support for Moblin (Intel-backed mobile OS)? Yep, check! With the exception of the iPhone, the rest are all open source Linux-based operating systems, so it's understandable that Loquendo could easily bring its product to all of them. (The iPhone OS is based on Mac OS X, and although it is not open source, it still has some Unix lineage.)
I look forward to the day when my mobile phone can serve as an IVR…
Official press release:
Loquendo, a leading speech technology provider worldwide, announces that Loquendo Embedded Technologies – ASR and TTS – are now available for OEMs and developers of multimedia applications on the Android, Maemo and Moblin software platforms.
Android is the first truly open and comprehensive platform for mobile devices. Maemo is a software platform mostly based on open source code. Moblin is an open source operating system optimized for the next generation of mobile devices.
Android is available under a developer-friendly open-source license, which gives mobile operators and device manufacturers the freedom and flexibility to design innovative and exciting products. Recent arrivals to the market include Motorola’s Droid, HTC’s Nexus One, and the soon to be released Sony Ericsson Xperia X10.
According to IDC, shipments of handsets with the Android OS will reach 68m units by 2013, second only to Symbian. Gartner also forecasts that Android, by 2012, will rank second behind the Symbian OS.
Loquendo TTS and ASR seamlessly integrate with the Android platform, offering Java-level interfaces to developers.
Moreover, Loquendo TTS has been integrated into the Text-To-Speech Extended framework: this interface, once installed, makes Loquendo synthetic speech available to any Android app, allowing Android phone users to immediately upgrade to high quality TTS.
On Android, the TTS interface is very simple at the API level, and all functionalities are controlled through Speech Synthesis Markup Language (SSML) tags. By offering a fully-fledged SSML implementation, Loquendo gives application developers full control over its TTS features.
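To make the SSML point concrete, here is a minimal Python sketch that wraps plain sentences in an SSML 1.0 `<speak>` document, using `<prosody>` for speaking rate and `<break>` for pauses. The helper `make_ssml` and its parameters are hypothetical, and the release doesn't say which SSML elements Loquendo's Android engine honors, so treat the element choice as an assumption; the resulting string is what an app would hand to the platform's TTS call.

```python
from xml.sax.saxutils import escape

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def make_ssml(sentences, lang="en-US", rate="medium", pause_ms=300):
    """Wrap plain sentences in an SSML 1.0 <speak> document.

    Each sentence becomes an <s> element; all of them share one <prosody>
    rate, and a <break> of `pause_ms` milliseconds separates sentences.
    """
    parts = []
    for i, sentence in enumerate(sentences):
        if i > 0:
            parts.append(f'<break time="{pause_ms}ms"/>')
        # Escape &, < and > so arbitrary text cannot break the markup.
        parts.append(f"<s>{escape(sentence)}</s>")
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="{lang}">'
        f'<prosody rate="{rate}">{"".join(parts)}</prosody>'
        "</speak>"
    )


if __name__ == "__main__":
    # A navigation-style prompt, spoken slowly with a pause between steps.
    print(make_ssml(["Turn left in 200 meters.", "Then keep right & exit."],
                    rate="slow"))
```

Keeping the markup generation in one helper means the rate and pause settings can be tuned in a single place rather than scattered through prompt strings.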
The Maemo platform is built largely from open source components, and was developed by Nokia in collaboration with many open source projects such as the Linux kernel, Debian, GNOME, and many more. The Maemo SDK provides an open development environment for applications on top of the Maemo platform. Maemo is based on the Linux operating system kernel, which can support a wide range of devices from wrist watches to large server systems, making it ideally placed for the MID (Mobile Internet Device) and netbook markets as well as the smartphone market.
With the availability of Loquendo technologies, Maemo developers will be able to unleash the potential of speech in developing voice-enabled apps.
The Moblin platform, short for ‘mobile Linux’, is built around the Intel Atom processor and is an open source operating system for MIDs, netbooks, nettops and embedded devices. The concept behind the Moblin project is to create an operating system specifically designed for netbooks and MID devices by minimizing both boot times and power consumption. The central piece of the Moblin architecture is a hardware and usage-model independent layer providing a single, uniform way of developing such devices. Moblin is based on the Linux kernel.
Early this month, Intel and Nokia announced the merging of Moblin and Maemo into the MeeGo mobile software platform, for which Loquendo will also offer full support.
Loquendo Embedded TTS and ASR are the ideal choice for speech-enabling mobile apps and services, including voice-enabled phones, navigation applications, MIDs, ebook readers, assistive devices, etc.
Loquendo TTS is natural, fluent and highly expressive synthetic speech, while Loquendo ASR is fast, accurate speech recognition even on large-vocabulary, natural-language speech. Both are high-performing, high quality technologies, however compact the device.
Whether on device side or server side, Loquendo offers the same extensive choice of languages and voices, regardless of the architectural solution, enabling service providers to guarantee a seamless service even in mixed environments – where voice content generation is shared across device and network.
Loquendo Embedded Technologies leverage Loquendo TTS mixed language capability, support the TeleAtlas® and Navteq™ SAMPA phonetic alphabets, and are available for all major embedded operating systems: Android, Maemo, Moblin, Linux, iPhone, Symbian OS™ S60, Windows Mobile 5 & 6 (all editions), CE 5 & 6, Windows XP Embedded and Tablet PC ed., VxWorks and QNX.
For more information, or for help and support with your application ideas, please contact Loquendo at: embedded@loquendo.com.
About Loquendo – Vocal Technology and Services
Awarded Speech Industry ‘Market Leader’ for the past three consecutive years, Loquendo provides a complete range of speech technologies for server, embedded and desktop solutions – in 28 languages with 68 voices, and constantly growing – helping businesses deliver a next-generation client experience while saving them millions each year. Loquendo Embedded Technologies are innovative, easy-to-integrate solutions deployed in more than 10 million mobile and on-board navigation systems globally, as well as powering PDAs, assistive devices, virtual Web-assistants and other embedded solutions around the world.
Loquendo TTS, Loquendo ASR, and Loquendo Speaker Identification and Verification are high-quality, high-performance technologies, also available on the Loquendo MRCP Server and VoiceXML and CCXML platform.
Loquendo is a Telecom Italia company headquartered in Turin, Italy, with offices in the US, Spain, Germany and France, and a global network of partners.
For more info, and to hear Loquendo TTS for yourself, go to www.loquendo.com.
What Tellme (Microsoft) is up to
It was big news when Microsoft scooped up Tellme in 2007 for a rumored $800 million. Not only did the acquisition highlight the Redmond software giant's foray into speech recognition technology, it also showed its willingness to pour money into modern UI research and development. Whatever the application, speech recognition is just another way for the user to interact with a system. Unlike other speech recognition vendors, Tellme did not concentrate on contact center applications, but provided solutions in telephony, Web, and mobile applications. It's no wonder that Microsoft, being a software shop, was interested.
Fast forward to 2010 and TMCnet has an interview with Grant Shirk, Director of Industry Solutions at Tellme Business Solutions, who shares some information on what the future holds for Tellme. Some tidbits of particular interest to me:
In addition to the emergence of distributed computing platforms for speech recognition, we expect to see more IVR services moving into the network as businesses seek the performance improvements and lower costs that on-demand platforms can provide. Virtualization of queuing and routing is a logical next step that can drive higher agent utilization within the contact center and reduce the cost and necessity of standalone CTI services. Tellme expects this virtualization to also improve the customer experience by getting callers to the right agent at the right time (avoiding unnecessary transfers) and enabling the growth of innovative services like virtual hold and scheduled call backs.
It appears that all divisions within Microsoft are aligning to its Azure cloud platform, from Office to SQL Server to Tellme speech recognition. But is Microsoft playing catch-up? Google has had its cloud-based Apps forever. Voxeo and several other IVR vendors lead in cloud-based IVRs. And unfortunately for Tellme, virtual hold and scheduled call backs are old, old news…
But there’s hope:
Together with Ford Motor Company and Kia, Microsoft and the Tellme platform pioneered the use of network-based speech to drive in-car experiences with the Ford SYNC product, and Kia with UVO (your voice), both profiled at CES and SpeechTEK 2009. The Ford SYNC service accesses the Tellme platform to provide drivers with hands-free access to local business search, driving directions, and other information. We expect to see more manufacturers moving toward a network-based model in the near future.
In addition, speech is quickly becoming an integral part of the mobile device interface. A great example that showcases the power of speech and language processing technologies is the recently launched Bing Mobile client. To provide mobile users the best possible speech performance for these advanced tasks, the speech features need to take advantage of network-based (rather than embedded) recognition capabilities.
Now this is where I think Microsoft Tellme will lead the pack. Ford SYNC is one of the features that distinguish Ford automobiles from their competitors. Considering how Ford has returned to profitability, perhaps it also has Microsoft to thank. Although some geeks have joked that they'd rather not encounter the BSOD in their cars, SYNC is clearly something that car manufacturers believe in, and we should not be surprised to see such in-car voice-activated driver assistance platforms become ubiquitous in the near future.
The future seems bright:
Tellme continues to drive momentum and significant interest in the adoption of the Microsoft speech engine. In 2009, we answered over 1.3 billion calls (nearly 50 percent of our total annual traffic) on this engine, and our clients are observing significant improvements in recognition accuracy, automation, and task completion across their applications. The task completion improvement when moving to the new engine has been three percent on average, out of the box.
The close relationship between the delivery and R&D teams within the Speech at Microsoft group allows us to continually influence the evolution and enhancement of the speech engine to best meet the evolving needs of our customers.
The key is the close relationship with R&D. If anything, Microsoft is a gigantic R&D machine: it has consistently spent over $6 billion annually on R&D since 2005. With that kind of backing, it's all but certain that Tellme will continue to have a major impact on speech recognition technology in general, not just on contact center applications.
Categories: Development Tags: acquisition, microsoft, speech, tellme
