This section provides information about how Vocera speech recognition works, and how recognition results are calculated.
When a user issues a verbal command to the Genie or responds to a question the Genie asks, Vocera attempts to process the utterance by finding a match in the Vocera grammars, which include the following components:
A static grammar, which includes commands such as "Call" and "Broadcast" as well as possible responses such as "Yes" and "No", digits such as "One" and "Two", and so forth. The static grammar is installed by Vocera and cannot be changed by a customer.
A dynamic grammar, which includes all the spoken names a user can possibly utter. The dynamic grammar includes the names of users, groups, sites, locations, address book entries, and all their possible alternates, such as spellings of user names and the singular and plural names of groups.
Each site has its own dynamic grammar. It is completely determined by values that you enter in the database.
A personal grammar, which includes the buddies of an individual user, as well as any personal learned names and learned commands. Each user has his or her own personal grammar.
The grammars vary according to the user issuing the command and the site the user is calling. That is, because each site has its own grammar, and each user has a personal grammar, the actual grammars used for recognition are likely to be slightly different for any individual making a call.
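The way the three grammars combine for a given call can be pictured with a small sketch. This is only an illustrative model built on plain Python sets. The names `STATIC_GRAMMAR`, `dynamic_grammars`, `personal_grammars`, and `active_grammar` are hypothetical and not part of any Vocera interface, and the sample sites, users, and entries are made up.

```python
# Illustrative model only: Vocera does not expose its grammars this way.
STATIC_GRAMMAR = frozenset({"call", "broadcast", "yes", "no", "one", "two"})

# One dynamic grammar per site (hypothetical site names and entries).
dynamic_grammars = {
    "Site A": {"amy wilson", "nurses", "nurse"},
    "Site B": {"al renda", "pharmacy"},
}

# One personal grammar per user (buddies, learned names, learned commands).
personal_grammars = {
    "user1": {"mom"},
    "user2": set(),
}

def active_grammar(user, site):
    """Return the phrases available when `user` calls into `site`."""
    return (
        STATIC_GRAMMAR
        | dynamic_grammars.get(site, set())
        | personal_grammars.get(user, set())
    )

print(sorted(active_grammar("user1", "Site A")))
```

Two users calling the same site share the static and dynamic entries but differ in their personal entries, which is why the effective grammar is slightly different for each caller.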
The speech recognition rate is the number of recognized speech attempts divided by the total number of speech attempts:
Recognition Rate = Number of Recognized Speech Attempts / Number of Speech Attempts
If a Vocera user says a command and the Genie processes the utterance successfully, that counts as recognized speech. On the other hand, if the Genie is unable to process an utterance, it counts as unrecognized speech.
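As a quick worked example of the formula, the following minimal sketch computes the rate from two counts. The function name `recognition_rate` and the sample counts are hypothetical and chosen only for illustration.

```python
def recognition_rate(recognized, attempts):
    """Return recognized / attempts, or None when there were no attempts."""
    if attempts == 0:
        return None  # the rate is undefined with no speech attempts
    return recognized / attempts

# Hypothetical counts for illustration only.
rate = recognition_rate(recognized=45, attempts=50)
print(f"{rate:.0%}")  # prints 90%
```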
There are many things Vocera users can do to improve speech recognition. For example, users can train the Genie to understand the way they say particular names or commands. The “Troubleshooting Speech Recognition” chapter of the Vocera Administration Guide provides more information about how to solve speech recognition problems.
The following table describes different categories of recognition results:
Category | Description |
---|---|
Recognized | Speech was received and processed by the Genie. |
Rejected | Speech was received but it diverged too much from what the recognition engine expected, so it was rejected. Factors that contribute to rejection include: |
Others | Speech was received, but the Vocera system was unable to process it. This can happen if the duration of the speech exceeds the system's ability to interpret it, or if the speech started earlier than the Genie prompt. |
No Speech-Occurrence | Speech was not received, or was received too late for the Genie to process it. Examples include: Note: No speech results reflect a no speech timeout when the Call button is pressed. If the user presses the call button and no speech is perceived by the Genie, the call is ended after three attempts to prompt the user for a command. |
Recognition results categorized as Rejected or Others reduce the overall speech recognition rate.
The speech recognition reports are best used as a broad indicator of speech recognition results, that is, how successful the system is in processing speech attempts. The reports highlight recognition rates for particular users, devices, or access points. Using the details reports, you can generate recognition data for specific high or low percentages. These reports provide insight into the speech recognition quality of your deployment, allowing you to identify and troubleshoot problems.
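The kind of per-user breakdown these reports give can be approximated with a short sketch. Everything here is an assumption made for illustration: the `(user, category)` record layout is not the actual Vocera log format, the 0.8 threshold is arbitrary, and leaving "No Speech-Occurrence" results out of the count of speech attempts is an inference from the statement that only Rejected and Others results reduce the rate.

```python
from collections import defaultdict

# Categories from the table above. Treating "No Speech-Occurrence" as not
# being a speech attempt is an assumption, made because only Rejected and
# Others are said to lower the rate.
COUNTED = {"Recognized", "Rejected", "Others"}

def rates_by_user(records):
    """Map each user to recognized / counted attempts.

    `records` is an iterable of (user, category) pairs, a hypothetical
    layout rather than the actual Vocera log format.
    """
    recognized = defaultdict(int)
    attempts = defaultdict(int)
    for user, category in records:
        if category not in COUNTED:
            continue
        attempts[user] += 1
        if category == "Recognized":
            recognized[user] += 1
    return {user: recognized[user] / attempts[user] for user in attempts}

def low_performers(rates, threshold=0.8):
    """Return the users whose rate falls below `threshold`, sorted by name."""
    return sorted(user for user, rate in rates.items() if rate < threshold)

sample = [
    ("user1", "Recognized"), ("user1", "Recognized"),
    ("user1", "No Speech-Occurrence"),
    ("user2", "Recognized"), ("user2", "Rejected"), ("user2", "Others"),
]
rates = rates_by_user(sample)
print(rates)
print(low_performers(rates))
```

The same grouping could be keyed on device or access point instead of user.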
Recognition results can be filtered to obtain specific detail in the following reports:
For more information about generating reports with recognition filters, see Filtering Recognition Results.
Because the speech recognition reports do not present information about the recognition accuracy of in-grammar utterances, they should not be used to judge overall speech recognition accuracy and user satisfaction.
Speech recognition reports provide statistical calculations of speech recognition results based on logs generated on the server. These statistical results are a measure of unsupervised speech recognition, which is the standard method for reporting speech recognition results. In other words, the results reflect how successfully the system has been able to process speech without determining the accuracy of the recognition.
Although unsupervised speech recognition is the standard way to report recognition results, and it is the method used by several other reporting products, it cannot provide details of the overall accuracy of the recognition engine. The only way to determine overall accuracy of speech recognition in the Vocera system is to perform supervised speech recognition. Supervised speech recognition involves monitoring and transcribing utterances from end users and comparing the manual transcription against the result of each utterance in the call log. Vocera Professional Services can help you perform a supervised speech recognition analysis of your Vocera system to optimize recognition performance using several techniques, including dictionary tuning, multiword modeling, and recognition parameter tuning.
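The comparison at the heart of a supervised analysis can be sketched in a few lines. This is a simplified illustration and not the procedure Vocera Professional Services follows. The function name `supervised_accuracy`, the pair layout, and the use of `None` for an unrecognized utterance are all assumptions.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())

def supervised_accuracy(pairs):
    """Return the share of utterances whose logged result matches the transcript.

    `pairs` is a list of (manual_transcript, logged_result) tuples; a
    logged_result of None stands for an utterance the system did not recognize.
    """
    if not pairs:
        return None
    correct = sum(
        1 for spoken, logged in pairs
        if logged is not None and normalize(spoken) == normalize(logged)
    )
    return correct / len(pairs)

pairs = [
    ("Call Amy Wilson", "call amy wilson"),   # recognized and accurate
    ("Call Ravi Bashir", "call ron bishop"),  # recognized but inaccurate
    ("Call Al Renda", None),                  # not recognized
]
recognized = sum(1 for _, logged in pairs if logged is not None)
print(recognized / len(pairs))     # unsupervised recognition rate
print(supervised_accuracy(pairs))  # supervised accuracy
```

With these three made-up utterances, two are recognized but only one is recognized accurately, so the unsupervised rate overstates the accuracy.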
An in-grammar utterance is any utterance that conforms to the Vocera grammars (the combination of the static set of commands, the dynamic grammar for different sites, and the personal grammar of individual users). All other utterances are considered out-of-grammar.
In-grammar utterances have a much higher chance of being recognized accurately. To determine whether utterances were in-grammar or out-of-grammar, supervised speech recognition must be performed.
The system can process in-grammar utterances in the following ways:
Correct Acceptance—The Genie has correctly recognized the utterance.
False Acceptance—The Genie has falsely recognized the utterance as something else that is in the Vocera grammars.
False Rejection—The Genie has falsely rejected the utterance.
When an out-of-grammar utterance approximates the grammar but does not match it exactly, the recognition engine may still accept the utterance. For example, a user may say the command “Call Kathy Turner” but “Kathleen Turner” is the name in the database. Although the utterance is out-of-grammar, it may still be accepted. Similarly, the utterance “Hold my calls” is out-of-grammar because the in-grammar commands are “Block all calls” or “Hold all calls.” Still, the utterance is close enough to the grammar that it may be accepted.
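The idea of an utterance being "close enough" to the grammar can be illustrated with a toy text comparison. The Vocera recognition engine works on audio, and its matching method is not described here, so the sketch below is only an analogy. It uses Python's standard `difflib.SequenceMatcher` on text, and the function name `closest_phrase` and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher

def closest_phrase(utterance, grammar, threshold=0.8):
    """Return the most similar grammar phrase, or None if none reaches `threshold`."""
    best_phrase, best_score = None, 0.0
    for phrase in grammar:
        score = SequenceMatcher(None, utterance.lower(), phrase.lower()).ratio()
        if score > best_score:
            best_phrase, best_score = phrase, score
    return best_phrase if best_score >= threshold else None

grammar = ["Call Kathleen Turner", "Block all calls", "Hold all calls"]
print(closest_phrase("Call Kathy Turner", grammar))
print(closest_phrase("Hold my calls", grammar))
```

In this toy model, both out-of-grammar utterances from the paragraph above land on the nearest in-grammar phrase, while an unrelated utterance would fall below the threshold and return `None`.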
This section provides a few examples of how in-grammar utterances are recognized by the system, without implying anything about overall recognition accuracy.
Correct Acceptance—When a user says a valid Vocera command, the sequence goes like this:
Genie: Vocera.
User: Call Amy Wilson.
Genie: Finding Amy Wilson.
In the above example, the user's utterance was both recognized and accurate.
False Acceptance—Sometimes when an utterance is recognized by the system, it may be recognized incorrectly. For example, consider the following sequence of speech attempts with the Vocera Genie:
Genie: Vocera.
User: Call Ravi Bashir.
Genie: Did you mean Ron Bishop?
User: No.
Genie: OK, let's try again. Vocera.
User: CALL RAVI BASHIR!
Genie: Did you mean Ron Bishop?
User: NO!!
In the above example, all four user utterances were recognized by the system. However, the first and third utterances (“Call Ravi Bashir”) were not recognized accurately, although they count as recognized utterances in the speech recognition reports.
False Rejection—An in-grammar utterance could be rejected incorrectly, as shown in this example:
Genie: Vocera.
User: Call Al Renda.
Genie: I'm sorry. I do not understand.
In the above example, the name "Al Renda" is in the database, but the Genie was unable to process it, perhaps because there was loud background noise.
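The three outcomes illustrated above can be expressed as a small decision function. This is a sketch for a supervised analysis, where both the spoken phrase and the Genie's result are known. The function name `classify`, the use of `None` for a rejection, and the lowercase sample grammar are assumptions made for illustration.

```python
def classify(spoken, result, grammar):
    """Label one in-grammar utterance given the Genie's result.

    `result` is the phrase the Genie acted on, or None if it rejected the
    utterance. Returns None for out-of-grammar utterances, which these
    three labels do not cover.
    """
    if spoken not in grammar:
        return None
    if result is None:
        return "False Rejection"
    if result == spoken:
        return "Correct Acceptance"
    return "False Acceptance"

grammar = {"call amy wilson", "call ravi bashir", "call ron bishop", "call al renda"}
print(classify("call amy wilson", "call amy wilson", grammar))
print(classify("call ravi bashir", "call ron bishop", grammar))
print(classify("call al renda", None, grammar))
```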