"Watch What I Do" Chapter 24


Using Voice Input to Disambiguate Intent

Alan Turransky


Programming by demonstration systems use mouse input and the point-and-click paradigm as the primary form of user interaction. During a demonstration, a user's actions can often be given multiple valid interpretations. A major problem in programming by demonstration is determining which of these interpretations the user intended. Most systems use a fixed model for determining the most "plausible inference," but unfortunately no method is guaranteed either to identify the user's intent correctly or to resolve the ambiguity without further assistance [Cypher 91b].

Part of the problem stems from the limited amount of information the mouse can convey. The point-and-click style of interaction restricts the user's ability to demonstrate their intent fully, which in turn causes ambiguity. As a result, other means for specifying detailed information have been invented, such as dialog boxes, pop-up menus, gravity points, grids, and keyboard input. While these alternatives offer a solution to the problem, they are often unnatural to use. Keyboard input requires the user to let go of the mouse in order to type. Dialog boxes interrupt the user's concentration by forcing them to switch back and forth between different modes of interaction. To make matters worse, these secondary methods can themselves be ambiguous. For example, when a grid is used to align two objects, should the system infer that the objects are positioned at grid unit (X, Y), or that they are aligned relative to one another? If a menu command is chosen, should the system record the act of selecting the operation as part of the demonstration?

Recognizing this problem, some systems employ interactive techniques such as snap-dragging [Bier 86] and semantic gravity points [Lieberman 92b] to disambiguate mouse input, instead of relying on a secondary method. Interaction techniques that can be used while the mouse action is taking place may provide a more natural and effective solution, since they let the user indicate intent without being distracted from the primary task.

One such technique is voice input. This chapter highlights an experimental extension to the Mondrian system described in Chapter 16. It explores the potential of voice input as a convenient means for disambiguating intent by allowing users to control how the system interprets their mouse actions.
