back to ... Table of Contents Watch What I Do


Chapter 9

Eager: Programming Repetitive Tasks by Demonstration

Allen Cypher

Introduction

A significant problem with today's personal computers is that users have to perform many repetitive tasks by hand. If my department decides to have a meeting every Monday, I have to go through my calendar, week by week, and paste the meeting into each Monday. If I decide to reformat a bibliography to use double-quotes instead of italics, I have to change each entry by hand. Although computers are supposed to excel at performing repetitive tasks, most users are unable to take advantage of this capability because they do not know how to program.

Programming by Demonstration is a technique that can potentially solve this problem. Since by definition most of the steps in a repetitive task are the same as in the previous repetition, it should be possible for a computer program to automate the task by recording the steps and replaying them.

[Figure 1 panels: (a) copy first subject; (b) type "1. " and paste subject; (c) go to next message; (d) copy second subject; (e) type "2. " and paste subject; (f) Eager appears; (g) anticipate typing "3. "; (h) anticipate paste; (i) anticipate going to next message; (j) user clicks on Eager; (k) finish the task; (l) Eager finishes]

Figure 1. In this example, the user makes a list of the subjects of a stack of messages. The user selects (a) and copies the subject of the first message, goes to the list, types "1. " and pastes in the subject (b). Then the user navigates to the second message (c), selects (d) and copies its subject, goes to the list, types "2. ", and pastes in the second subject (e). After the user navigates to the third message, the Eager icon appears (f), indicating that Eager has detected a pattern in the user's actions. Eager does not immediately begin performing actions; rather, it uses green highlighting to "anticipate" what the user is going to do next. In (f), Eager is anticipating that the user will select the subject of the third message, which is correct. The user selects and copies the third subject, goes to the list, and Eager anticipates that the user will type "3. " (g). After the user does this, Eager anticipates that the user will paste (h). The user does the paste and then navigates to the fourth message, which is anticipated in (i). At this point, the user is confident that Eager can perform the task correctly and therefore clicks on the Eager icon (j). A dialog box appears, and the user selects the option "Finish The Task" (k). Eager writes and executes a program which completes the task for the user, and automatically terminates (l).

What Eager Does

Eager offers a solution to the iteration problem for Programming by Demonstration -- the problem of specifying loops by demonstration. Eager is always on, constantly looking for repetitions in actions performed by the user. When it detects an iterative pattern, the "Eager" icon pops up on the screen. As the user continues to perform the task, Eager anticipates each next action by turning menu items, buttons, and text selections green. When Eager is correct, the user's intended selection will appear highlighted in green before it is selected. When Eager is incorrect, the user's choice will not match the highlighted item, and Eager will use the actual choice to revise its program. Once the user is confident that Eager knows how to perform the task correctly, he or she clicks on the Eager icon and Eager writes and executes a program which completes the task automatically. Figure 1 gives an example of Eager assisting a user with a repetitive task.

User Interaction

Minimal Intrusion

Eager differs from most other Programming by Demonstration systems in its interaction with the user. An important design goal was to intrude minimally on the user's activities. Therefore, Eager is written so that it never asks the user for information; the only information comes from recording the user's ongoing actions. In particular, the user does not signal the start of an example, and never explicitly confirms or rejects a hypothesis -- performing an action that matches the anticipation is implicitly confirming; performing any other action is implicitly rejecting.

Representation

Another important design decision was to represent actions in the same modality that the user employs for performing those actions. The Macintosh has a direct-manipulation interface, so Eager represents actions by highlighting objects, menu items, and text selections rather than by displaying textual descriptions such as "Copy the button located at 20,120".

Generalization

A significant design decision for all PBD systems concerns how generalizations will be presented to the user. Generalizations often represent abstract concepts which can be difficult to explain. In Eager, generalizations are communicated to the user through instantiations. For example, if the user selects button 1 and then selects button 2, Eager makes the generalization "select button i for i = 1, 2, 3, ..." and instantiates this on the next iteration to "select button 3". The hope is that users will be able to recognize through this specific instance that Eager has detected the pattern in their actions, and that this specific instance will be easier for them to understand than an abstract description such as "select button i".

Validating programs

When I use standard macro tools to record and play back a sequence of actions, I sometimes find that the macro does not perform as expected, and I am unable to automate my task. As a result, I will sometimes not even try to create macros when I am unsure of their success. Eager minimizes such failures because the Eager icon only appears on the screen when it is able to automate the previous steps. Since it is possible that the generalization Eager has created is not what the user intended, the user may check the correctness of Eager's program by observing its anticipations. In this way, validation coincides with continuing to perform the task by hand, and no effort is wasted.

How Eager Works

Eager was originally written in LISP on a microExplorer. It was then ported to Macintosh Common Lisp. The current implementation is written in C++ and runs as a background application on Macintosh computers. It receives information about high-level user events in other applications via AppleEvents. The HyperCard application was modified to send Eager detailed information about all user actions. The Resolve spreadsheet application was also modified to send user actions -- to show the viability of connecting Eager with a variety of applications -- but this modification was much less complete than the modification of HyperCard.

Pattern-matching

Whenever an application reports a new event, Eager searches through previous events in the history to find one that is similar. "Similar" means that 1) the commands are of the same type (e.g. they are both Copy Text commands), and 2) the objects on which the commands are performed fit into some regular pattern (e.g. Tuesday and Wednesday; button #6 and button #7; button #6 and button #6; the last word on line 4 of field "Address" of card 2 of stack "Rolodex" and the last word on line 4 of field "Address" of card 3 of stack "Rolodex").

For integers, Eager searches for constants, consecutive integers, linear sequences, and linear sequences within a given tolerance (e.g. 50, 72, 91, ... is recognized as the sequence 20*i + 30, with a tolerance +/-2). Some of the numbers that HyperCard passes to Eager are: card numbers, screen coordinates for buttons, line numbers for text selections, and numbers as text and in card names.
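As a rough illustration of the integer tests described above, the following Python sketch classifies exact sequences and checks a candidate linear sequence against a tolerance. The function names `classify_exact` and `fits_linear` are hypothetical and are not taken from Eager. The sketch only verifies a candidate fit; it does not show how Eager arrives at the coefficients 20 and 30.

```python
from typing import Optional, Sequence


def classify_exact(values: Sequence[int]) -> Optional[str]:
    """Classify a sequence as 'constant', 'consecutive', or 'linear'.

    Returns None when the differences between neighbours are not all equal.
    """
    if len(values) < 2:
        return None
    steps = {b - a for a, b in zip(values, values[1:])}
    if len(steps) != 1:
        return None
    step = steps.pop()
    if step == 0:
        return "constant"
    if step == 1:
        return "consecutive"
    return "linear"


def fits_linear(values: Sequence[int], step: int, base: int,
                tolerance: int = 0) -> bool:
    """Check that the i-th value (i = 1, 2, ...) lies within `tolerance`
    of step * i + base."""
    return all(abs(value - (step * i + base)) <= tolerance
               for i, value in enumerate(values, start=1))
```

With the example from the text, `fits_linear([50, 72, 91], step=20, base=30, tolerance=2)` holds, while a tolerance of 1 rejects the same sequence because 72 differs from 70 by 2.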

For textual data, Eager parses the text into substrings of contiguous alphabetic characters, numbers, delimiters, and spaces. It searches for constants, alphabetic or numeric order, changes in capitalization and spacing, and known sequences such as days of the week and roman numerals. Some of the textual items that HyperCard passes to Eager are: text selections, pathnames, and the names of cards, buttons, and fields. If the user has selected some text and replaced it, Eager looks for patterns not only between the new text and the old text, but also between the new text in the current iteration and the new text in the previous iterations.
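The parsing step and the lookup in a known sequence might be sketched as follows in Python. This is an illustration under stated assumptions, not Eager's code: the names `tokenize` and `next_in_sequence` are hypothetical, each delimiter character is treated as its own token, and the days of the week are assumed to wrap around from Sunday to Monday.

```python
import re
from typing import List, Optional, Sequence

# Runs of letters, runs of digits, runs of spaces, or one delimiter character.
TOKEN = re.compile(r"[A-Za-z]+|[0-9]+|[ ]+|[^A-Za-z0-9 ]")

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def tokenize(text: str) -> List[str]:
    """Split text into alphabetic, numeric, space, and delimiter substrings."""
    return TOKEN.findall(text)


def next_in_sequence(first: str, second: str,
                     sequence: Sequence[str] = DAYS) -> Optional[str]:
    """If `second` directly follows `first` in the known sequence,
    return the item that follows `second`; otherwise return None."""
    if first not in sequence or second not in sequence:
        return None
    i, j = sequence.index(first), sequence.index(second)
    if (i + 1) % len(sequence) != j:
        return None
    return sequence[(j + 1) % len(sequence)]
```

For instance, `tokenize("2. Some ideas")` yields the number, the period, a space, and the two words as separate substrings, so the leading number can be compared with the number from the previous iteration independently of the rest of the text.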

Loop detection

When two similar events are found, Eager assumes that the second event marks the beginning of the second iteration in the loop. All of the events before it, back to the first similar event, are presumed to constitute the first iteration in the loop. Eager now monitors each new incoming event to see if it is "similar" to the corresponding event from the first iteration. If patterns can be found for each pair of events, Eager concludes that it has detected an iterative loop, and the Eager icon pops up on the screen. Based on the generalizations formed from these two iterations, Eager instantiates the next steps in the pattern and directs the application to highlight the appropriate items in green.
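The loop-detection procedure just described can be approximated by the Python sketch below. It is a simplification under several assumptions: events are reduced to a hypothetical `(command, integer)` pair, two events count as similar whenever their commands match, and the next event is predicted by linear extrapolation. The names `detect_loop` and `anticipate` and the event names are invented for illustration.

```python
from typing import List, Optional, Tuple

Event = Tuple[str, int]  # (command name, integer argument); a simplification


def similar(a: Event, b: Event) -> bool:
    # Simplified test: same command type. Any two integers are accepted here,
    # which is looser than the pattern tests described in the text.
    return a[0] == b[0]


def detect_loop(history: List[Event]) -> Optional[Tuple[int, int]]:
    """Return (start, period) once two complete similar iterations exist."""
    for second in range(1, len(history)):
        for first in range(second - 1, -1, -1):  # most recent match first
            if not similar(history[first], history[second]):
                continue
            period = second - first
            if second + period > len(history):
                continue  # the second iteration is not complete yet
            if all(similar(history[first + k], history[second + k])
                   for k in range(period)):
                return first, period
    return None


def anticipate(history: List[Event], period: int) -> Event:
    """Extrapolate the next event from the two previous iterations."""
    n = len(history)
    older, newer = history[n - 2 * period], history[n - period]
    return newer[0], newer[1] + (newer[1] - older[1])


# The task from Figure 1, with hypothetical event names.
FIGURE_1_HISTORY: List[Event] = [
    ("copy_subject", 2), ("type_item_number", 1), ("paste", 0), ("go_to_card", 3),
    ("copy_subject", 3), ("type_item_number", 2), ("paste", 0), ("go_to_card", 4),
]
```

On `FIGURE_1_HISTORY`, `detect_loop` reports a loop of four events starting at the first event, and `anticipate(FIGURE_1_HISTORY, 4)` predicts that the subject will next be copied from card 4. With one event fewer in the history, no loop is reported, because the second iteration is still incomplete.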

In the example shown in Figure 1, the first two similar events are Copy words 2 through 3 ("Trial info") of line 1 of background field 1 of card 2 of stack "Cali:Eager Demo:Mail Messages" and Copy words 2 through 5 ("Some more good ideas") of line 1 of background field 1 of card 3 of stack "Cali:Eager Demo:Mail Messages". HyperCard sends Eager a considerable amount of ancillary contextual information. For instance, the information sent with the above events includes the facts that there are 3 words on line 1 of card 2, and 5 words on line 1 of card 3, which Eager uses to recognize that the selection is from the second through the last word of line 1.
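One way to picture how the word counts help is the Python sketch below, which tries to describe the end of a selection either as a fixed word number or as an offset from the last word on the line. The representation and the names `generalize_end` and `instantiate_end` are assumptions made for this illustration and are not taken from Eager.

```python
from typing import List, Optional, Tuple

Selection = Tuple[int, int, int]  # (first word, last word, words on the line)


def generalize_end(selections: List[Selection]) -> Optional[Tuple[str, int]]:
    """Describe where selections end: at a fixed word, or at a fixed
    distance from the last word of the line."""
    ends = {end for _, end, _ in selections}
    if len(ends) == 1:
        return "absolute", ends.pop()
    offsets = {total - end for _, end, total in selections}
    if len(offsets) == 1:
        return "from_last", offsets.pop()
    return None


def instantiate_end(rule: Tuple[str, int], words_on_line: int) -> int:
    """Apply a rule from generalize_end to a new line."""
    kind, value = rule
    return value if kind == "absolute" else words_on_line - value
```

For the two events above, the selections are `(2, 3, 3)` and `(2, 5, 5)`. The end positions 3 and 5 differ, but both are zero words from the end of the line, so the rule becomes "through the last word" and a four-word line would be selected through word 4.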

When the user clicks on the Eager icon, the system writes a program in a scripting language and sends it off for execution.

Domain Knowledge

The Eager system is able to detect patterns in applications because it deals with high-level events, and because it has knowledge about the structure of the applications it monitors. For example, instead of being given the low-level information that a mouse click occurred at location (28,142), it is given the high-level information that the click was on the button named "Phone". As an example of domain knowledge, Eager has a knowledge base about HyperCard objects which includes the fact that there is a last card in a stack, so Eager can infer that a loop that copies Card 1, Card 2, ..., should terminate when the last card in that stack is copied. Another example of domain knowledge is that the number called "button-location-X" is of type screen coordinate, so Eager uses a test for similarity that recognizes linear sequences of numbers within a tolerance of 4 pixels, to allow for inexactness in where users position objects on the screen.

Eager specifies a format for application developers to provide information about their commands and their objects. For each object, the developer can specify the test to use for similarity. For instance, the similarity test for HyperCard cards is (OR cardNumber cardName cardID). So if Eager is testing to see whether two Cut Card commands fit into an iterative pattern, it first looks for a pattern in the numbers of the cards. Since cardNumber is specified by the developer to be an integer, the default tests mentioned above for integers are used. If the developer had wanted cardNames to have higher priority than cardNumbers, (OR cardName cardNumber cardID) would have been specified instead.
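The effect of the ordering in an OR test can be sketched in Python as follows. The attribute names come from the text; everything else is assumed for illustration, including the dictionary representation of cards, the function names, and the deliberately narrow tests (integers match only when constant or consecutive, names only when identical).

```python
from typing import Callable, Dict, Optional, Sequence


def integer_test(a: object, b: object) -> bool:
    # Simplified: accept only constant or consecutive integers.
    return isinstance(a, int) and isinstance(b, int) and b - a in (0, 1)


def name_test(a: object, b: object) -> bool:
    # Simplified: accept only identical names.
    return isinstance(a, str) and a == b


TESTS: Dict[str, Callable[[object, object], bool]] = {
    "cardNumber": integer_test,
    "cardName": name_test,
    "cardID": integer_test,
}


def first_pattern(order: Sequence[str], a: dict, b: dict) -> Optional[str]:
    """Try each attribute in the developer's order and return the first
    one for which the two objects fit a pattern."""
    for attribute in order:
        if TESTS[attribute](a.get(attribute), b.get(attribute)):
            return attribute
    return None
```

Given two cards numbered 2 and 3 that share the same name, the order `("cardNumber", "cardName", "cardID")` settles on the card numbers, whereas `("cardName", "cardNumber", "cardID")` settles on the name. The two orders would lead to different generalizations of the same pair of events.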

For any command, the developer can specify other commands that it supersedes. When the user performs a command, Eager will then remove the previous command from the history if the current one supersedes it. In HyperCard, for instance, all navigation commands and all selection commands supersede any previous selection command. So, for instance, if the user selects a region of text, and then selects another region of text, the first selection is removed from the history.
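A minimal Python sketch of this history pruning is shown below. The table `SUPERSEDES`, the command kinds, and the function `record` are hypothetical stand-ins for whatever the developer-supplied specification looks like.

```python
from typing import Dict, List, Set, Tuple

Event = Tuple[str, str]  # (command kind, description); a simplification

# Which kinds of previous command each kind of command supersedes.
SUPERSEDES: Dict[str, Set[str]] = {
    "navigate": {"select"},
    "select": {"select"},
}


def record(history: List[Event], event: Event) -> None:
    """Append an event, first dropping the previous event if the new
    one supersedes it."""
    if history and history[-1][0] in SUPERSEDES.get(event[0], set()):
        history.pop()
    history.append(event)
```

Recording two selections in a row leaves only the second in the history, while a selection that has been followed by a copy stays in place, because the copy is then the previous command.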

Developers can also specify commands that "chain" together. For instance, there are many equivalent ways to move to a particular card in a HyperCard stack. There are commands to move to the next, previous, first, and last cards in a stack, to jump back to the card last visited, and to select from a miniature display of the 25 most recent cards. In order to detect patterns where users do not navigate in precisely the same way each time, the developer specifies that all of these navigation commands chain together. As a result, Eager will match any navigation sequences which have a pattern in their final destination, or in the relative number of cards moved.
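Chaining can be pictured as collapsing a run of navigation commands into its net effect, as in the Python sketch below. The command names, the function `chain`, and the wrap-around behaviour at the ends of the stack are assumptions of this sketch. The commands for jumping back and for choosing from recent cards are omitted for brevity.

```python
from typing import Sequence, Tuple


def chain(commands: Sequence[str], start: int, card_count: int) -> Tuple[int, int]:
    """Collapse a run of navigation commands into (final card, cards moved).

    Cards are numbered 1..card_count.
    """
    card = start
    for command in commands:
        if command == "next":
            card = card % card_count + 1        # assumed to wrap to card 1
        elif command == "prev":
            card = (card - 2) % card_count + 1  # assumed to wrap to last card
        elif command == "first":
            card = 1
        elif command == "last":
            card = card_count
        else:
            raise ValueError(f"unknown navigation command: {command}")
    return card, card - start
```

A user who overshoots and backs up (`["next", "next", "prev"]` from card 2) and a user who moves directly (`["next"]` from card 2) both produce the same result, card 3 with a relative move of one card, so the two iterations can be matched.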

It is useful to note that Apple's AppleEvent architecture allows a recorder to query an application both before and after the application performs a user action. This feature is included in the architecture specifically to meet the needs of "intelligent" recorders like Eager. So, for instance, if an application's Set Name command does not include important contextual information such as the previous name of the object, it is possible for the recorder to query the application for this information in between the time when the user has requested the name change and the time when the application actually performs the change.

User actions as mid-level events

There is not always an exact correspondence between user actions and high-level events -- there may well be several user actions involved in performing a single high-level event. For instance, the high-level event "Set the style of card button 5 to transparent" is performed by the user as a sequence of five actions: choosing the "button tool" from a menu, selecting button 5 by clicking on it, choosing "Button Info..." from a menu, selecting the "Transparent" style in the dialog box that appears, and clicking on the "OK" button in the dialog box.

Eager uses high-level events when it looks for patterns and user actions when it displays anticipation feedback. Larry Tesler coined the term "mid-level events" to refer to user actions. David Kosbie's chapter on aggregate events (Chapter 22) discusses this matter in greater detail.

User Study

In order to understand how first-time users react to Eager, seven subjects were given three repetitive tasks to perform. The subjects were not given any information about Eager -- they were simply asked to perform the three tasks.

The study showed that first-time users were generally able to understand what Eager was doing and to figure out how to use it without instruction. Three subjects clicked on the Eager icon as soon as it appeared on the screen. Three subjects noticed the icon, performed several more steps by hand, and then clicked on the icon. The verbal protocols for these subjects indicated that they were able to figure out that the anticipation highlighting indicated what they were going to do next: "...indicates that I've been using it [the Recent command] and it's probably the next thing I'm going to do"; "It's almost as if it's suspecting what I want to do. Now I get it". One subject performed all of the tasks by hand and never clicked on the icon.

When subjects were able to perform the task correctly in HyperCard (17 tasks out of 21), Eager was able to detect the patterns in their actions, even though different subjects chose different strategies for performing the tasks, and some subjects performed the task somewhat differently on each iteration. For instance, one subject originally used the "Next" button to go to the end of the stack, and on a later iteration switched to using the "Last" command. Also, some subjects made and corrected minor mistakes, such as navigating past the desired card and then backing up.

The study also pointed out significant failings in the user interface. The most striking finding was that all subjects were uncomfortable with giving up control when Eager took over. Two changes were made: Eager now saves a copy of any stacks it is going to modify so that the user can recover from an incorrect automation, and a stepping mode was added, where Eager pauses for user confirmation before each action.

There were a few cases where Eager appeared prematurely, having detected a simple but insignificant pattern. The appearance of the icon is now postponed until two complete iterations have been performed, and for extremely simple patterns, it waits for three iterations.

Some subjects did not realize that the character that popped up on the screen was related to the items being highlighted in green. To remedy this, the icon now appears in green as well.

The original icon for this program showed a man sitting in a chair. When he anticipated an iteration correctly, he would begin clapping. Some subjects expressed confusion about the character, since they could not see what the representation had to do with automating repetitive tasks. The icon was therefore changed to the less evocative image of a cat -- when it takes over, the cat is shown with its paw moving the mouse.

Some subjects did not notice the icon when it first appeared, particularly if it appeared on a rich background. The cat icon now animates when it first appears.

Limitations

A limitation in using anticipation to communicate an abstract program to the user is that anticipation is unable to communicate the termination conditions for the program. A user expecting a PBD system to delete all of the documents in the current folder would be shocked to observe it blithely deleting all of the documents on the disk! At present, Eager is conservative and terminates when it reaches the end of the first structure in the PartOf hierarchy. It would be reasonable to query the user about continuing the iteration to the next level in the hierarchy.

Eager currently does not handle conditionals, nested loops, repetitions that are spread out over time (e.g. inserting the date whenever you start a new letter), or tasks that include some steps that must be left up to the user. These restrictions help to constrain the sorts of patterns that the program must detect. If Eager's search fails to detect a match between an incoming event and the corresponding event in the previous iteration, it assumes that there is no repetitive task. In fact, the correct interpretation could be that there is a conditional branch in the program, or that there is an obscure pattern that must be left up to the user (e.g. deleting the names of people who are no longer friends). Allowing for these possibilities would greatly increase the ambiguity in generalizing and the complexity of pattern-matching. Eager's success is partly due to the fact that it only tries to automate a limited, yet useful, range of tasks. Although its power is limited, it does handle almost all of the commands in HyperCard, unlike all previous PBD systems which only work in limited, experimental domains.

Related Work

Various "macro" programs, such as Tempo and QuicKeys2 on the Apple Macintosh, are effective in automating simple tasks. However, they are limited in that 1) they only record low-level events, and 2) they cannot generalize. For example, if the user clicks on a MacWrite icon, a macro program will only record the low-level event of clicking at that particular location on the screen. If the MacWrite icon is moved, the macro will no longer work correctly. The inability of macro programs to generalize is illustrated when a user selects "William" in "Dear William" and "Mrs. Julie Kincaid" in "Dear Mrs. Julie Kincaid", and the macro program is unable to select "Ms. Atkins" in "Dear Ms. Atkins", because it cannot make the generalization "all of the words following 'Dear'".

Programming by Demonstration (PBD) systems go beyond macro programs by making generalizations about high-level events. This allows them to automate more complex tasks. The work most closely related to Eager is David Maulsby's Metamouse (see Chapter 7), a PBD system for a simple Draw application. Like Eager, Metamouse watches user actions and writes a program which generalizes those actions. Metamouse is more powerful than Eager in that it infers conditionals, but more restrictive in that the user must explicitly indicate the start of recording, must answer questions about how to properly generalize various steps, and must approve or reject each system action.

Eager is also related to Brad Myers' Peridot, described in Chapter 6. Peridot is a PBD system for building user interaction devices, such as menus and scroll bars. After the user places the first few items of a list into a menu, Peridot is able to infer that the entire list should comprise the menu. Like Metamouse, Peridot requires the user to confirm each inference it makes. Peridot was inspirational to me in showing the potential for PBD as a practical tool for end users.

The Predictive Calculator (Chapter 3) is similar to Eager in that it constantly monitors the user's actions. Barbara Staudt Lerner's Lantern [Lerner 89] system for automated customization is similar to Eager in its concern with minimizing user interaction. Lantern adopts the radical approach of performing customizations without asking for the user's consent. It abandons a customization if it detects the user undoing its effects.

Future Directions

Further user studies will be conducted to determine whether the changes prompted by the first user study were in fact successful in correcting the various problems.

The first user study investigated users' first experiences with Eager. To complement this data, a field study will be conducted to learn more about regular, experienced use of Eager. Most importantly, the study will investigate how often users perform tasks which Eager can automate. The study will also investigate how often Eager's automations do not coincide with the user's intentions, and how users react in such situations. The field study should also be valuable for determining whether experienced users of Eager have needs which the current user interface does not address. Finally, it seems significant that three of the subjects in the first user study clicked on the Eager icon as soon as it appeared, without following the anticipation highlighting. The field study will be able to determine whether some users remain unaware of the highlighting feature even after extensive use of Eager, and whether some users persist in invoking Eager without first validating its intended actions.

It would be possible to apply Eager's interaction style to Intelligent Help. If Eager detected a sequence of commands for which it knew a shortcut, it could guide the user through the shorter procedure. If it detected a pattern of activity that typified a common error, it could guide the user through the correct procedure. In either case, users would see a general principle applied to their particular situation -- Eager would do the instantiation, and they would not be required to puzzle over an abstract textual description or an example in an unfamiliar domain.

Summary

Eager demonstrates that Programming by Demonstration can be applied practically to current programs. It presents a solution to the problem of specifying iteration by demonstration.

The technique of anticipation is important in meeting Eager's user interaction design goals. By turning an item green before it is selected, anticipation is unobtrusive, it represents actions in the modality employed by the user, and it presents generalizations concretely.

A user study showed that first-time users were able to use Eager without instruction.

Acknowledgments

I would like to thank Shifteh Karimi for the user study, Ruben Kleiman for the MacFrames frame system and ipc, Donn Denman for his assembly language programming, Yin Yin Wong for the Eager icon, Bryan Griffin for the port to C++, and Steve Weyer, Harvey Lehtman, David Nagel, and Larry Tesler for their support of my work.

Eager

Uses and Users

Application domain: Application-independent; currently applied to HyperCard

Tasks within the domain: Repetitive tasks

Intended users: Non-programmers

User Interaction

How does the user create, execute and modify programs?
Programs are created by simply performing a repetitive task.
Programs are modified by performing an action other than the one highlighted by Eager.

Feedback about capabilities and inferences:
Eager uses anticipation highlighting to show the user how it has generalized.

Inference

Inferencing:
Eager constantly compares each new event with the event history. It matches commands if they have the same verb and if each of their arguments follows a pattern. Typical patterns include the days of the week, and a linear sequence of integers.

Types of examples: Eager bases its generalizations on multiple examples

Program constructs: Variables. Iteration: set iteration, iteration with a counter, and iteration until a condition is satisfied. For instance, if a loop creates buttons in a column, the loop will terminate when the bottom of the card is reached.

No nested loops. No conditionals.

Knowledge

Types and sources of information:
A knowledge base of patterns that can be expanded by developers. For example, linear sequences of integers, and common string sequences like days of the week.
Application knowledge bases of features to test when matching command arguments. For instance, HyperCard's Cards match on Names, Numbers, or IDs.

Implementation

Machine, language, size, date: Macintosh. 6,000 lines of LISP on a microExplorer, ported to Macintosh Common Lisp, then ported to 15,000 lines of C++. The knowledge base describing HyperCard commands is 800 lines long. The modifications to HyperCard to make it recordable and to implement anticipation highlighting are 5,000 lines of Pascal. 1990.

