Speak Freely

I began blogging in January 2013 as a New Year's resolution. During that first month of blogging, I wrote brief essays on the power of the keyboard and mouse in human-computer interaction models:

Optimizing the keyboard

Optimizing the mouse

The resurgence of the command line UI

At the mHealth Summit, I recently had a chance to play with the latest voice solutions from Nuance and VoiceFirst (by Honeywell). I was thoroughly impressed. What caught my attention most was that the Nuance app could parse a single block of dictated text into multiple discrete actions. An example:

From the patient selection screen, I said "order 500mg Levaquin for John Smith."

Nuance first recognized that John Smith was admitted and on the current patient list. Next, it opened John Smith's chart. Then it displayed a new screen with the details of the pending order on the top half and a list of existing orders and allergies on the bottom half. Lastly, it prompted the provider to fill in the rest of the required fields: route, frequency, and so on.
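Purely as an illustration, here's a crude sketch of what that kind of decomposition might look like under the hood: extract the dose, drug, and patient from one utterance, then drive each UI step from the result. The patient census, field names, and regex below are all made up; this is not how Nuance actually implements it.

```python
import re
from dataclasses import dataclass

# Hypothetical in-memory census; a real EMR would query its ADT feed.
ADMITTED_PATIENTS = {"john smith": {"mrn": "12345", "allergies": ["penicillin"]}}

REQUIRED_ORDER_FIELDS = ["route", "frequency"]

@dataclass
class MedicationOrder:
    patient: str
    drug: str
    dose: str
    route: str | None = None
    frequency: str | None = None

def parse_order_utterance(utterance: str) -> MedicationOrder | None:
    """Very naive slot extraction for 'order <dose> <drug> for <patient>'."""
    match = re.match(r"order (\d+\s?mg) (\w+) for (.+)", utterance.lower())
    if not match:
        return None
    dose, drug, patient = match.groups()
    return MedicationOrder(patient=patient.strip(), drug=drug, dose=dose)

def handle_utterance(utterance: str) -> None:
    order = parse_order_utterance(utterance)
    if order is None or order.patient not in ADMITTED_PATIENTS:
        print("No admitted patient matched; ask the user to repeat or confirm.")
        return
    # Step 1: open the patient's chart (stubbed out as a print).
    print(f"Opening chart for {order.patient.title()}")
    # Step 2: show the pending order on top, existing orders/allergies below.
    print(f"Pending order: {order.dose} {order.drug.title()}")
    print(f"Allergies: {ADMITTED_PATIENTS[order.patient]['allergies']}")
    # Step 3: prompt for whichever required fields the utterance didn't include.
    missing = [f for f in REQUIRED_ORDER_FIELDS if getattr(order, f) is None]
    print(f"Prompt provider for: {', '.join(missing)}")

handle_utterance("Order 500mg Levaquin for John Smith")
```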

In a separate demo, the Nuance rep showed me voiceprint-based authentication, aka logging in with your voice. Combine the two, and doctors wouldn't have to sign orders at all so long as they spoke them: the EMR would know who placed the order from the speaker's voice, and the order would be authenticated against that voiceprint. Awesome.
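To make the idea concrete, here's a toy sketch of voice-signed ordering: compare a speaker embedding of the utterance against enrolled voiceprints and attach the matched provider to the order. The embeddings, threshold, and names are all hypothetical; a real system would use a proper speaker-verification model, not random vectors.

```python
import numpy as np

# Hypothetical enrolled voiceprints: provider id -> speaker embedding.
ENROLLED_VOICEPRINTS = {"dr_jones": np.random.rand(128)}

SIMILARITY_THRESHOLD = 0.85  # tuning this is the hard part in practice

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify_speaker(utterance_embedding: np.ndarray) -> str | None:
    """Return the provider whose enrolled voiceprint best matches the utterance."""
    best_id, best_score = None, 0.0
    for provider_id, voiceprint in ENROLLED_VOICEPRINTS.items():
        score = cosine_similarity(utterance_embedding, voiceprint)
        if score > best_score:
            best_id, best_score = provider_id, score
    return best_id if best_score >= SIMILARITY_THRESHOLD else None

def place_spoken_order(order_text: str, utterance_embedding: np.ndarray) -> dict | None:
    provider = identify_speaker(utterance_embedding)
    if provider is None:
        return None  # fall back to a manual sign-off
    # The order carries the speaker's identity, so no separate signature step.
    return {"order": order_text, "signed_by": provider}

embedding = ENROLLED_VOICEPRINTS["dr_jones"]  # pretend the utterance came from Dr. Jones
print(place_spoken_order("500mg Levaquin for John Smith", embedding))
```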

The point of this post isn't to praise Nuance. It's to speculate about the future of voice-based interfaces in medicine.

Voice is an interesting beast. Designing UXs that heavily incorporate voice can significantly alter UI design. For example, voice can help solve the 'there are too many buttons on the screen' problem: just get rid of the buttons. If you don't want tabs for labs, vitals, allergies, meds, immunizations, etc., then get rid of all of them and make those sections accessible via voice.

That's a powerful concept. With voice, UIs don't have to be bound exclusively by the pixels on the screen. There are still some pixel-based limits, but voice can virtually extend the screen. So long as a voice command is contextual and intuitive for the user, the corresponding button can be removed.
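As a rough illustration, here's a minimal sketch of a contextual voice-command registry that resolves short spoken phrases to chart sections that no longer have their own tabs or buttons. All names here are hypothetical, not any real EMR's API.

```python
# Sections reachable only by voice in this sketch; no on-screen tabs for them.
CHART_SECTIONS = {"labs", "vitals", "allergies", "meds", "immunizations"}

class ChartScreen:
    def __init__(self, patient_name: str):
        self.patient_name = patient_name
        self.current_section = "summary"

    def show_section(self, section: str) -> None:
        self.current_section = section
        print(f"Showing {section} for {self.patient_name}")

def handle_voice_command(screen: ChartScreen, phrase: str) -> bool:
    """Map a short spoken phrase to a section even though no button exists for it."""
    phrase = phrase.strip().lower()
    # "show meds", "meds", "go to labs", etc. all resolve to a section name.
    for section in CHART_SECTIONS:
        if section in phrase:
            screen.show_section(section)
            return True
    return False  # unrecognized -> let the visible UI handle it

screen = ChartScreen("John Smith")
handle_voice_command(screen, "show meds")     # navigates with no meds tab on screen
handle_voice_command(screen, "go to vitals")  # same idea
```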

Voice also opens incredible new opportunities for patients to enter data themselves without even knowing it. So long as the provider guides the patient and asks the right questions, the patient could fill out the EMR as they speak. See the example below.

[Image: EMR.png]

In this example, the provider could prompt the patient: "So tell me a bit more about your [complaint]. How long have you had it? Where does it hurt? Any associated symptoms?" The NLP engine could recognize that this exchange obviously refers to the HPI (history of present illness) and transcribe the patient's response into the HPI field. The CC (chief complaint) and HPI are supposed to be patient-reported anyway.
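Here's a toy sketch of that routing idea: classify the provider's prompt into a note section, then drop the patient's dictated response into that section. The keyword lists and the note structure are assumptions for illustration, not any particular EMR's data model.

```python
# Map note sections to phrases a provider might use when eliciting them.
NOTE_SECTION_KEYWORDS = {
    "hpi": ["tell me more", "how long", "where does it hurt", "associated symptoms"],
    "ros": ["any fever", "any chills", "shortness of breath"],
}

def classify_prompt(provider_prompt: str) -> str | None:
    """Guess which note section the provider's question belongs to."""
    prompt = provider_prompt.lower()
    for section, keywords in NOTE_SECTION_KEYWORDS.items():
        if any(keyword in prompt for keyword in keywords):
            return section
    return None

def route_patient_speech(note: dict, provider_prompt: str, patient_speech: str) -> None:
    """Append the patient's dictated answer to the section the prompt implies."""
    section = classify_prompt(provider_prompt)
    if section is None:
        return  # keep it out of the note rather than guess wrong
    note.setdefault(section, []).append(patient_speech)

note = {}
route_patient_speech(
    note,
    "So tell me a bit more about your headache. How long have you had it?",
    "About three days, mostly behind my right eye, and the light makes it worse.",
)
print(note["hpi"])
```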

Looking at the two examples above, it's clear that voice will be a highly contextual UI concept. Although this is intrinsically true of visual UIs (keyboard/mouse and touchscreen) as well, it's worth repeating for a voice-driven UX because voice doesn't appear to be bounded by what the user can see. For developers, this means a few things:

Don't try to design voice commands to be generic across the entire application, and don't try to show every available voice command on the screen. Assume users will learn the voice cues over time. Train them, and provide subtle visual cues that encourage them to use voice. Help them explore.

Try to really understand context. Context was previously bounded by screen real estate; it no longer is. For example, while looking at a given patient's labs, the user may want to jump straight to meds. Don't force the user back up the tree to the dashboard before they can navigate to meds. (Note: most EMRs aren't architected to support this. Many are intrinsically tree-based, so in those cases the application would need to simulate the two navigation steps programmatically; see the sketch after this list.)

Use voice consistently. Although voice can remove the need for certain buttons and tabs, it should also be available for the navigational elements that remain on screen. If one navigation item can be selected via voice, all comparable navigation items should be.

Keep all voice commands and triggers to one or two words. Voice recognition is roughly 90-95% accurate per word, so a one- or two-word command will fail perhaps one time in ten, and that's probably OK. Users won't mind repeating one or two words, but they will be frustrated re-dictating two full sentences, where per-word errors compound quickly.
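The arithmetic behind that, as a back-of-the-envelope sketch (assuming independent per-word errors, which is a simplification):

```python
# If each word is recognized correctly ~90% of the time, how often does the
# whole command come back clean?
PER_WORD_ACCURACY = 0.90

for words in (1, 2, 10, 20):
    command_accuracy = PER_WORD_ACCURACY ** words
    print(f"{words:>2} words -> {command_accuracy:.0%} of commands fully correct")

# 1 word  -> 90%, 2 words -> 81%,
# but 20 words (roughly two sentences) -> only ~12% fully correct.
```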
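And on the context point above ('don't force the user back up the tree'), here's a rough sketch of simulating the two navigation steps behind a single voice command in a tree-based system. The class and method names are hypothetical, not any real EMR's API.

```python
class TreeBasedEMR:
    """Toy navigation model: the app can only move up or down one level at a time."""

    def __init__(self):
        self.path = ["dashboard"]  # current navigation path

    def go_up(self) -> None:
        if len(self.path) > 1:
            self.path.pop()

    def go_down(self, screen: str) -> None:
        self.path.append(screen)

    def current_screen(self) -> str:
        return self.path[-1]

def voice_jump(emr: TreeBasedEMR, target: str) -> None:
    """Make a lateral jump look like one step to the user."""
    while emr.current_screen() != "dashboard":
        emr.go_up()          # step 1: back up the tree (hidden from the user)
    emr.go_down(target)      # step 2: down into the requested section
    print(f"Now showing: {emr.current_screen()}")

emr = TreeBasedEMR()
emr.go_down("labs")          # user is looking at labs
voice_jump(emr, "meds")      # user says "show meds"; the app does both steps
```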

If your company is doing awesome stuff with voice, please let me know. I want to learn about it.