Android Development & Technology
  • Speech Recognition in Android SDK 1.5

    Posted on April 14th, 2009 by chris
    The preview release of Android SDK 1.5 has generated a lot of interest, and one exciting but largely overlooked feature is speech recognition. In this post we take a closer look at what the library offers and how to use it. As noted in the Android 1.5 highlights, speech recognition is done via the RecognizerIntent. Intents link Activities together, allowing one Activity to start another and to receive its response. In code, starting the recognizer Activity with an Intent and waiting for the result looks like this:
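    Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH); // "android.speech.action.RECOGNIZE_SPEECH"
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM); // required extra
    startActivityForResult(intent, 0); // 0 is the request code, passed back to onActivityResult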
    startActivityForResult launches an Activity and later delivers its result to your Activity's onActivityResult callback. While speech recognition is running, the recognizer displays a custom overlay over your app, and when it is done it passes the results back for your Activity to handle. The recognizer supports two actions:
    1. ACTION_RECOGNIZE_SPEECH, which starts an activity to recognize speech and sends the results back to the Activity
    2. ACTION_WEB_SEARCH, which performs the recognition but displays web-search results in a browser instead of passing the results back
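    When the recognizer finishes, the calling Activity's onActivityResult receives the request code, a result code, and an Intent; on RESULT_OK the candidate transcriptions can be read with data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS). The part of that handler that does not depend on Android, choosing one transcription from the list, can be sketched in plain Java. TranscriptPicker and bestTranscript are hypothetical names, and the sketch assumes the candidates arrive best-first, as the RecognizerIntent reference for later Android releases describes; treat it as an illustration, not the official handling.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: picks a transcription from the recognizer's candidate list. */
public class TranscriptPicker {

    /**
     * Returns the first non-blank candidate, trimmed, or null if there is none.
     * Assumes the list is ordered best-first; a null list stands for a
     * cancelled or failed recognition, where no results were returned.
     */
    static String bestTranscript(List<String> candidates) {
        if (candidates == null) {
            return null;
        }
        for (String candidate : candidates) {
            if (candidate != null && !candidate.trim().isEmpty()) {
                return candidate.trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Stand-in for data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS)
        List<String> results = Arrays.asList("  ", "call home", "call Rome");
        System.out.println(bestTranscript(results)); // prints "call home"
        System.out.println(bestTranscript(null));    // prints "null"
    }
}
```

    Keeping this logic outside the Activity makes it easy to test without a device or emulator.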
    The docs mention several extras for configuring the recognition engine:
    1. EXTRA_LANGUAGE is optional and used to override the default language setting
    2. EXTRA_LANGUAGE_MODEL sets the speech and language model, which may be either free-form (LANGUAGE_MODEL_FREE_FORM, a general-purpose model) or web search (LANGUAGE_MODEL_WEB_SEARCH, a model based on web search terms).
    3. EXTRA_MAX_RESULTS is an optional limit on how many results to return
    4. EXTRA_PROMPT optionally displays a text prompt to the user in the recognition Activity
    5. EXTRA_RESULTS holds an ArrayList of the candidate transcriptions; it is returned in the result Intent rather than supplied with the request
    Most activities will probably use ACTION_RECOGNIZE_SPEECH with a free-form language model. It is not yet clear whether the recognizer handles a vocabulary of 10, 100, or 500 words, nor how robust it is to background noise and to speakers of different ages. I look forward to testing the speech recognition on a real device. Nuance, a member of the Open Handset Alliance, is contributing its vast experience in voice recognition to the system, and Google itself has some experience too, not least from transcribing voicemail messages.
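    With a free-form language model the recognizer can return arbitrary text, so an app that only understands a small, fixed vocabulary still has to map the candidates onto its own commands. The plain-Java sketch below does this with simple whole-word matching. CommandMatcher, matchCommand, and the media-player command set are hypothetical, and a real app may need fuzzier matching than this.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: maps free-form transcriptions onto a fixed command vocabulary. */
public class CommandMatcher {

    // Hypothetical command set for an imaginary media-player app.
    private static final List<String> COMMANDS =
            Arrays.asList("play", "pause", "next", "previous");

    /**
     * Checks the candidates in order and, within each candidate, its words in
     * order; returns the first word that is a known command, or null if none is.
     */
    static String matchCommand(List<String> candidates) {
        for (String candidate : candidates) {
            String[] words = candidate.toLowerCase(Locale.US).split("\\s+");
            for (String word : words) {
                if (COMMANDS.contains(word)) {
                    return word;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Stand-ins for recognizer output; the first candidate is a mis-hearing.
        List<String> results = Arrays.asList("clay the song", "Play the song");
        System.out.println(matchCommand(results)); // prints "play"
    }
}
```

    Checking every candidate, not just the first, lets a correct lower-ranked transcription rescue a mis-heard top result, as in the example.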
    Have a look at IraqComm, a real-time speech-to-speech translation system in mass production, and at robots such as SIG II, which can understand up to three people speaking simultaneously... in cafeteria noise of 65 dBA. IBM's embedded ViaVoice, built into various navigation systems, can understand between 200,000 and 1 million spoken words. The technology for everyday use is just around the corner, and the android.speech API is one step in that direction! I'm convinced the Android community will think of many creative ways to build speech recognition into apps, games, and even robots. The future starts now!

    Update: The guys at androidandme have published an app and an article evaluating the speech recognition feature. The voice is first recorded and sent to Google, which then sends the transcription back to the phone. Hence the feature is not real-time and requires a good network connection, but it gets pretty good results that way.
