Glass Insights: Input-Output

This post was originally featured on the Pristine blog.

This is the first in a series of posts that will illustrate how Glass is different from all of its computing predecessors: PCs, smartphones, and tablets. This series will cover every aspect of developing for Glass: programming and technical details, UX and ergonomics, use cases, and more. Glass is a unique platform, and everyone is still trying to understand the nuances of its strengths, limitations, and opportunities. We'd like to contribute to that open and ongoing conversation.

To kick things off, we're going to discuss what is perhaps the most fundamental aspect of Glass: how the user interacts with the device.

Glass, like any computer, requires input and delivers output. The input options for Glass are extremely limited:

1. Trackpad on the side

2. Voice

3. Accelerometer/gyro

4. Camera

5. Winking (hacked only; not supported out of the box)

Glass is not conducive to interactivity. The more interactive the application, the less desirable it will be to use. Physically "using" Glass is simply a pain. Try connecting to Wi-Fi, and you'll know exactly what I mean.

Why is Glass so painful to use? Swiping along a trackpad on the side of your head is unnatural. In the pre-Glass era, how many times did you rub your temple?

But what about voice? Google's voice-to-text technology is by all accounts the best in the world; most technology enthusiasts and bloggers agree that it's quite accurate. The problem with relying on voice - especially for any command longer than 2-3 words - is that the chances of error compound with every word. Every word is a potential point of failure. If Google's transcription service messes up one word, the entire command can be rendered effectively useless (that's why Siri attempts to account for transcription errors). When a voice command fails, it takes at least a few seconds to reset and try again, depending on the exact context. That matters because, per Google's Glass development guidelines, content on Glass must be timely. One of the defining characteristics of Glass is that you don't have to spend 5 seconds reaching into your pocket and unlocking your phone; if an action takes longer than 5 seconds to initiate, you might as well have pulled your smartphone out. There are exceptions - surgeons in the OR can't use their hands - but generally speaking, failing a voice command means that you could've and should've used your phone instead.

The accelerometer and gyro are useful for waking Glass from sleep, but I'm having a tough time visualizing apps making use of those sensors for any form of meaningful engagement. The human neck isn't designed to move and bend all that much; accelerometer- and gyro-based gestures need significant triggers. Glass defaults to a 30-degree head tilt to wake from sleep, precisely to prevent accidental, ambient waking. Developers can use the accelerometer and gyro, but they must do so conservatively. They cannot be used interactively.
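For developers who do want to use head movement as a trigger, the pattern looks something like the sketch below: read the accelerometer through Android's SensorManager and fire only when the tilt clears a large threshold. This is a minimal illustration, not Glass's actual wake-up code; the 30-degree constant mirrors Glass's default, and the assumption that the device's y-axis points up when the head is level is ours.

```java
import android.hardware.Sensor;
import android.hardware.SensorEvent;
import android.hardware.SensorEventListener;
import android.hardware.SensorManager;

// Sketch: treat only large, deliberate head tilts as input.
public class HeadTiltDetector implements SensorEventListener {
    private static final double TILT_THRESHOLD_DEGREES = 30.0; // mirrors Glass's default wake angle

    private final SensorManager sensorManager;
    private final Runnable onTilt;

    public HeadTiltDetector(SensorManager sensorManager, Runnable onTilt) {
        this.sensorManager = sensorManager;
        this.onTilt = onTilt;
    }

    public void start() {
        Sensor accel = sensorManager.getDefaultSensor(Sensor.TYPE_ACCELEROMETER);
        sensorManager.registerListener(this, accel, SensorManager.SENSOR_DELAY_NORMAL);
    }

    public void stop() {
        sensorManager.unregisterListener(this);
    }

    @Override
    public void onSensorChanged(SensorEvent event) {
        float x = event.values[0], y = event.values[1], z = event.values[2];
        float norm = (float) Math.sqrt(x * x + y * y + z * z);
        if (norm == 0) return;
        // Angle between measured gravity and the device's y-axis,
        // assumed here to point "up" when the wearer's head is level.
        double tilt = Math.toDegrees(Math.acos(Math.max(-1.0, Math.min(1.0, y / norm))));
        if (tilt >= TILT_THRESHOLD_DEGREES) {
            onTilt.run(); // ignore small, ambient movements entirely
        }
    }

    @Override
    public void onAccuracyChanged(Sensor sensor, int accuracy) { }
}
```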

The camera provides by far the most raw input data, and thus holds the most potential. However, given Glass's screen size, its positioning relative to the human eye, and the challenges of implementing intelligent, dynamic object recognition, the camera is probably a long way off from becoming the defining input mechanism for most Glass apps. Trulia's real estate Glass app uses the camera in conjunction with GPS and the compass to show you data about the real estate you're looking at. I'm sure dozens of other apps will employ the same methodology: camera + GPS + compass to overlay data from a database. But for those same reasons - screen size and positioning relative to the eye - the camera can't deliver much interactivity. It can feed lots of data to a database in the cloud, but it can't yet power truly interactive apps.
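The camera + GPS + compass pattern is simple at its core: take the wearer's location and heading, then surface the records that fall inside the camera's field of view. Below is a rough sketch of that selection step; it is not Trulia's code, the Property class and field-of-view constant are hypothetical, and a real app would also filter by distance.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: pick the database records the wearer is roughly facing.
public class OverlayFilter {
    private static final double FIELD_OF_VIEW_DEGREES = 54.0; // rough assumption, not a published spec

    public static class Property {
        public final String address;
        public final double latitude;
        public final double longitude;

        public Property(String address, double latitude, double longitude) {
            this.address = address;
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    /** Returns the properties whose bearing falls inside the camera's field of view. */
    public List<Property> visibleFrom(double userLat, double userLon,
                                      double headingDegrees, List<Property> nearby) {
        List<Property> visible = new ArrayList<>();
        for (Property p : nearby) {
            double bearing = bearingTo(userLat, userLon, p.latitude, p.longitude);
            double offset = Math.abs(normalize(bearing - headingDegrees));
            if (offset <= FIELD_OF_VIEW_DEGREES / 2) {
                visible.add(p);
            }
        }
        return visible;
    }

    // Initial great-circle bearing from (lat1, lon1) to (lat2, lon2), in degrees [0, 360).
    private static double bearingTo(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1), phi2 = Math.toRadians(lat2);
        double dLambda = Math.toRadians(lon2 - lon1);
        double y = Math.sin(dLambda) * Math.cos(phi2);
        double x = Math.cos(phi1) * Math.sin(phi2)
                 - Math.sin(phi1) * Math.cos(phi2) * Math.cos(dLambda);
        return (Math.toDegrees(Math.atan2(y, x)) + 360) % 360;
    }

    // Maps an angle difference into [-180, 180).
    private static double normalize(double degrees) {
        double d = (degrees + 180) % 360;
        if (d < 0) d += 360;
        return d - 180;
    }
}
```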

Winking will enable lots of fun apps. It's a great trigger event. Coupled with other types of context - voice, camera, and location - it could provide some unique forms of interactivity, though I'm not exactly sure what those would look like. No matter the app, I don't think anyone wants to wink all day.

In the near term, voice will be the most compelling and useful input mechanism. Most commands will be brief to reduce the chance of failure. Glass devs who are hacking Glass to run native Android APKs are already using voice to navigate their apps. We are too, and it works quite well. "Next", "previous", and "lookup [x]" work 99% of the time in reasonably controlled environments. At a bar, forget about it, but in a clinic or hospital, even with people talking nearby, voice is a compelling input.
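Routing those short commands is deliberately unsophisticated. The sketch below shows the general shape, assuming the recognized text arrives as a plain string from Android speech recognition; the Navigator interface and its method names are hypothetical stand-ins for an app's own navigation code, not an API from the Glass SDK.

```java
// Sketch: map short, recognized phrases to navigation actions.
public class VoiceCommandRouter {
    /** Hypothetical hooks into the app's own navigation. */
    public interface Navigator {
        void showNext();
        void showPrevious();
        void lookup(String query);
    }

    private final Navigator navigator;

    public VoiceCommandRouter(Navigator navigator) {
        this.navigator = navigator;
    }

    /** Returns true if the transcription matched a known command. */
    public boolean handle(String transcription) {
        String command = transcription.trim().toLowerCase();
        if (command.equals("next")) {
            navigator.showNext();
            return true;
        }
        if (command.equals("previous")) {
            navigator.showPrevious();
            return true;
        }
        if (command.startsWith("lookup ")) {
            navigator.lookup(command.substring("lookup ".length()));
            return true;
        }
        return false; // unrecognized: let the user retry rather than guess
    }
}
```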

Longer term, I expect the camera to become the most powerful input. It provides an incredible amount of context and data. If Google decides to implement a larger screen that's more closely aligned with the human eye, the camera could become the defining input mechanism for most apps. Meta-View and AtheerLabs are already working on that dream. We'll see if Google decides to go that route. Given that Google has been positioning Glass as a consumer device, I have my doubts, but perhaps we'll see a shift in Google's product strategy, or an eyewear computing hardware portfolio with different devices for different use cases.

The other very promising input mechanism for Glass is the MYO armband. It can mitigate one of Glass's greatest weaknesses: limited inputs. MYO delivers a very elegant input solution that complements two of Glass's three unique traits: hands-free and always there. We're excited to integrate the MYO armband into Pristine's apps.