Understanding Speech Recognition

This section explains how Vocera speech recognition works and how recognition results are calculated.

Vocera Grammars

When a user issues a verbal command to the Genie or responds to a question the Genie asks, Vocera attempts to process the utterance by finding a match in the Vocera grammars, which combine a static set of commands, a dynamic grammar for each site, and a personal grammar for each user.

The grammars vary according to the user issuing the command and the site the user is calling. That is, because each site has its own grammar, and each user has a personal grammar, the actual grammars used for recognition are likely to be slightly different for any individual making a call.
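As a rough illustration of why the effective grammar differs from call to call, the sketch below models each grammar as a plain set of phrases and combines them per user and site. This is a simplification for explanation only: the names `STATIC_COMMANDS`, `SITE_GRAMMARS`, `PERSONAL_GRAMMARS`, and `active_grammar`, the site and user names, and the sample phrases are all hypothetical and do not reflect how Vocera stores or compiles its grammars.

```python
# Hypothetical phrase sets standing in for real grammars (illustration only).
STATIC_COMMANDS = {"block all calls", "hold all calls"}

SITE_GRAMMARS = {
    "north": {"call kathleen turner"},
    "south": {"call robert smith"},
}

PERSONAL_GRAMMARS = {
    "alice": {"call my manager"},
}


def active_grammar(user, site):
    """Combine the static, site, and personal phrase sets for one call."""
    return (
        STATIC_COMMANDS
        | SITE_GRAMMARS.get(site, set())
        | PERSONAL_GRAMMARS.get(user, set())
    )


if __name__ == "__main__":
    # Two different callers end up with overlapping but different grammars.
    print(sorted(active_grammar("alice", "north")))
    print(sorted(active_grammar("bob", "south")))
```

Both callers share the static commands, but the site and personal entries differ, which mirrors the point that the grammar used for recognition is slightly different for each individual.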

How Speech Recognition Rates Are Calculated

Speech recognition rates are calculated as the number of recognized speech attempts divided by the total number of speech attempts:

Recognition Rate = Number of Recognized Speech Attempts / Number of Speech Attempts

If a Vocera user says a command and the Genie processes the utterance successfully, that counts as recognized speech. On the other hand, if the Genie is unable to process an utterance, it counts as unrecognized speech.
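The formula above can be sketched as a small function. This is a minimal example, not Vocera code: the function name `recognition_rate` and the sample counts are assumptions, and the sketch treats the rate as undefined when there are no speech attempts.

```python
def recognition_rate(recognized, attempts):
    """Return recognized speech attempts as a fraction of all speech attempts."""
    if recognized < 0 or attempts < 0 or recognized > attempts:
        raise ValueError("counts must satisfy 0 <= recognized <= attempts")
    if attempts == 0:
        # Assumption: with no attempts there is nothing to divide by.
        raise ValueError("recognition rate is undefined with zero attempts")
    return recognized / attempts


if __name__ == "__main__":
    # Hypothetical counts: 45 of 50 attempts were recognized.
    print(f"{recognition_rate(45, 50):.0%}")
```

With these hypothetical counts, 45 recognized attempts out of 50 give a rate of 90%.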

There are many things Vocera users can do to improve speech recognition. For example, users can train the Genie to understand the way they say particular names or commands. The “Troubleshooting Speech Recognition” chapter of the Vocera Administration Guide provides more information about how to solve speech recognition problems.

Recognition Result Categories

The following table describes different categories of recognition results:

Table 1. Speech recognition result categories



Recognized: Speech was received and processed by the Genie.

Rejected: Speech was received, but it diverged too much from what the recognition engine expected, so it was rejected. Factors that contribute to rejection include:
  • Being interrupted by someone (side conversation)
  • Using the wrong command
  • Giving an invalid response to a VMI text message
  • Incorrectly stating a user, group, or address book entry (for example, the user says “Call Doctor Smith” when “Robert Smith” is the correct name in the database)
  • Pausing, stumbling, or continuing other conversations while giving Genie commands
  • Positioning the device incorrectly on the lanyard or universal clip
  • Recognition errors caused by wireless infrastructure issues, heavy accents, carts rolling by during commands or responses, areas with echo (such as stairwells), noisy areas, or wind picked up by headsets with booms

Others: Speech was received, but the Vocera system was unable to process it. This can happen if the duration of the speech exceeds the system's ability to interpret it, or if the speech started earlier than the Genie prompt.

No Speech-Occurrence: Speech was not received, or was received too late for the Genie to process it. Examples include:
  • The Call button was pushed inadvertently
  • The caller pressed the Call button and the Genie responded, but the caller did not interact with the Genie before the no-speech timeout period elapsed
  • The caller said the correct response or command, but the packet was lost between the device and the server due to wireless or wired network problems
  • The caller was interrupted by someone and did not respond to the Genie soon enough
Note: No Speech results reflect a no-speech timeout after the Call button is pressed. If the user presses the Call button and the Genie perceives no speech, the call is ended after three attempts to prompt the user for a command.

Recognition results categorized as Rejected or Others reduce the overall speech recognition rate.
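One way to picture how the categories feed into the rate is to tally a list of results. The sketch below is hedged: it assumes that No Speech-Occurrence results are left out of the denominator, which is only an inference from the statement that Rejected and Others results reduce the rate. The function name `rate_from_results` and the sample data are hypothetical.

```python
from collections import Counter

# Assumption: only these categories count as speech attempts for the rate.
RATE_CATEGORIES = ("Recognized", "Rejected", "Others")


def rate_from_results(results):
    """Compute a recognition rate from a list of result category names."""
    counts = Counter(results)
    attempts = sum(counts[category] for category in RATE_CATEGORIES)
    if attempts == 0:
        return None  # no speech attempts to report on
    return counts["Recognized"] / attempts


if __name__ == "__main__":
    # Hypothetical sample: 8 recognized, 1 rejected, 1 other, 5 no-speech.
    sample = (
        ["Recognized"] * 8
        + ["Rejected"]
        + ["Others"]
        + ["No Speech-Occurrence"] * 5
    )
    print(rate_from_results(sample))
```

Under this assumption, the five No Speech-Occurrence entries do not change the result, while each Rejected or Others entry lowers it.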

Note: When device users respond to Genie prompts, they can press the Call button to signify “Yes” or the DND button to signify “No.” Button responses, although they significantly improve task completion and overall efficiency, are not treated as speech recognitions and therefore are not included in the speech recognition reports.

How to Use Speech Recognition Reports

The speech recognition reports are best used as a broad indicator of speech recognition results, that is, of how successfully the system processes speech attempts. The reports highlight recognition rates for particular users, devices, or access points. Using the details reports, you can generate recognition data for specific high or low percentages. These reports provide insight into the speech recognition quality of your deployment, allowing you to identify and troubleshoot problems.

Recognition results can be filtered to obtain specific detail in the following reports:

For more information about generating reports with recognition filters, see Filtering Recognition Results.

Because the speech recognition reports do not present information about the recognition accuracy of in-grammar utterances, they should not be used to judge overall speech recognition accuracy and user satisfaction.

Unsupervised versus Supervised Speech Recognition

Speech recognition reports provide statistical calculations of speech recognition results based on logs generated on the server. These statistical results are a measure of unsupervised speech recognition, which is the standard method for reporting speech recognition results. In other words, the results reflect how successfully the system has been able to process speech without determining the accuracy of the recognition.

Although unsupervised speech recognition is the standard way to report recognition results, and it is the method used by several other reporting products, it cannot provide details of the overall accuracy of the recognition engine. The only way to determine overall accuracy of speech recognition in the Vocera system is to perform supervised speech recognition. Supervised speech recognition involves monitoring and transcribing utterances from end users and comparing the manual transcription against the result of each utterance in the call log. Vocera Professional Services can help you perform a supervised speech recognition analysis of your Vocera system to optimize recognition performance using several techniques, including dictionary tuning, multiword modeling, and recognition parameter tuning.
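The comparison step in a supervised analysis can be sketched as follows. This is an illustrative example under stated assumptions, not the procedure Vocera Professional Services uses: the function name `supervised_accuracy`, the (manual transcript, logged result) pair format, and the sample data are hypothetical, and a match is defined here as equality after lowercasing and collapsing whitespace.

```python
def _normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def supervised_accuracy(pairs):
    """Fraction of utterances whose logged result matches the manual transcript.

    `pairs` is an iterable of (manual_transcript, logged_result) strings.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one transcribed utterance is required")
    correct = sum(
        1 for transcript, logged in pairs
        if _normalize(transcript) == _normalize(logged)
    )
    return correct / len(pairs)


if __name__ == "__main__":
    # Hypothetical data: the second utterance was processed, but incorrectly.
    sample = [
        ("Call Robert Smith", "call robert smith"),
        ("Hold all calls", "Block all calls"),
    ]
    print(supervised_accuracy(sample))
```

In the sample, both utterances would count as recognized in an unsupervised report, yet only one matches its transcript, which is the gap that supervised analysis is meant to expose.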

In-Grammar versus Out-Of-Grammar Speech

An in-grammar utterance is any utterance that conforms to the Vocera grammars (the combination of the static set of commands, the dynamic grammar for different sites, and the personal grammar of individual users). All other utterances are considered out-of-grammar.

In-grammar utterances have a much higher chance of being recognized accurately. To determine whether utterances were in-grammar or out-of-grammar, supervised speech recognition must be performed.

The system can process in-grammar utterances in the following ways:

When an out-of-grammar utterance approximates the grammar but does not match it exactly, the recognition engine may still accept the utterance. For example, a user may say the command “Call Kathy Turner” but “Kathleen Turner” is the name in the database. Although the utterance is out-of-grammar, it may still be accepted. Similarly, the utterance “Hold my calls” is out-of-grammar because the in-grammar commands are “Block all calls” or “Hold all calls.” Still, the utterance is close enough to the grammar that it may be accepted.
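The idea of an utterance being "close enough" to the grammar can be illustrated, very loosely, with text similarity. The sketch below uses Python's standard `difflib.get_close_matches` on written phrases. This is an analogy only: a speech recognition engine works on audio rather than text, and nothing here reflects the Vocera engine's actual matching or thresholds. The function name `closest_phrase`, the `cutoff` value, and the sample grammar are assumptions.

```python
import difflib


def closest_phrase(utterance, grammar, cutoff=0.6):
    """Return the grammar phrase most similar to the utterance, or None."""
    matches = difflib.get_close_matches(
        utterance.lower(), sorted(grammar), n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


if __name__ == "__main__":
    # Hypothetical grammar built from the examples in the text.
    grammar = {"block all calls", "hold all calls", "call kathleen turner"}
    print(closest_phrase("Hold my calls", grammar))
    print(closest_phrase("Call Kathy Turner", grammar))
```

In this text-only analogy, "Hold my calls" lands nearest to "hold all calls" and "Call Kathy Turner" nearest to "call kathleen turner", while a phrase with nothing in common with the grammar returns `None`.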

In-Grammar Recognition Examples

This section provides a few examples of how in-grammar utterances are recognized by the system, without implying anything about overall recognition accuracy.