back to ... Table of Contents Watch What I Do

Chapter 22

A System-Wide Macro Facility Based on Aggregate Events: A Proposal

David S. Kosbie and Brad A. Myers

Introduction

User-defined macros have gained widespread popularity among the users of today's highly interactive, graphical applications. This popularity attests to the power macros can provide. By encapsulating complex sequences of actions into a script that can then be invoked by a single event (such as a keystroke), macros can

* Save time: Actions can be read far faster from a script than they can be entered by a user;

* Decrease errors: Scripts are always read correctly, whereas users often err when entering repetitious commands; and

* Operate autonomously: Long-running scripts can be executed without a human operator, freeing the user to perform other tasks.

Moreover, macros are actually programs which have been created demonstrationally by end users. As such, macros represent the first large-scale commercial success for PBD technology.

This success, however, is largely confined to application-specific macro facilities -- that is, macro facilities distributed with and tightly linked to specific software packages. Nevertheless, a strong argument can be made in favor of a single set of system-wide (hence application-independent) macro facilities. Among other advantages, this approach allows for macros that include actions from multiple applications, and it provides a single mechanism both for developers to support and for end users to learn. The allure of system-wide macro facilities is further demonstrated by the growing number of such products on the market. Unfortunately, even the most popular of these systems, such as Tempo II Plus [Affinity 91] and QuicKeys [CESoftware 90], suffer from serious shortcomings: foremost, they produce macros that are less robust and less intuitive than their application-specific counterparts. These problems stem from the loss of semantic detail that results when actions are recorded outside of the application that performed them. How is a macro system to infer, for instance, that the user just dragged a file icon into the trash, if the macro system only records a series of mouse moves?

This difficult problem has led to the development of high-level events (as opposed to low-level events such as mouse clicks and keystrokes). For example, once a desktop application determines that a file was dragged into the trash, it does not delete the file directly; instead, it sends itself a "Delete-File" high-level event. This event is passed to the system, where it can be recorded, and then sent back to the application, where it is finally executed. This is the basic approach used in Apple Events [Apple 91] and HP NewWave [Fuller 89], and to a lesser extent in Microsoft OLE [Petzold 91]. These architectures represent a major commitment to the high-level event paradigm, and they are meeting growing success as developers and end users begin to appreciate the power and elegance of this approach.

This chapter proposes, however, that existing high-level event architectures do not go far enough. The basic problem is that -- like the low-level macro facilities they subsume -- they record a flat history of events. While these events are at a higher level than mouse and keyboard actions, they are still grouped at a single level. This is problematic because humans typically perform tasks with rich, multi-level structures. For instance, Figure 1 depicts a very high-level "Edit-File" event, which includes a "Save-File" event, which itself includes the sub-task of specifying the new file name, which then includes the sub-task of editing a string in a dialog box, which finally includes the low-level sub-tasks of entering actual keystrokes. If the system's history includes only one level of events, then all the history-based support -- including macro recording and playback, undo, event-based invocation (described in Chapter 21), and so on -- is confined to that level. Moreover, it is not sufficient to merely include all these events in the history (which present systems can do) -- the system must also maintain the basic structure of the events (which present systems cannot do). Otherwise, the system will erroneously apply the same operation (such as undo or playback) to multiple levels of the same event.

Figure 1. Users perform low-level events, such as typing the word "foo", within the context of richly-structured higher-level tasks, such as naming a file during an edit session.


Following this reasoning, this chapter proposes Aggregate Events, a new model of application event processing which extends the current high-level event paradigm to include arbitrary-level, structured histories. The Aggregate Event model serves as the basis for KATIE [Kosbie 93], a new architecture for improved system-wide, event-based macro facilities. Under this model, applications will group low-level input events (such as mouse clicks) into higher-level interface events (such as menu selections), which then are grouped into higher-level application events (such as "Delete-Paragraph"), which may be grouped into even higher-level events, such as "Edit-File." When an application groups events into a higher-level Aggregate Event, that event is sent to the system dispatcher by the same means that current high-level architectures utilize. However, the additional aggregation information will allow the system to maintain properly structured history. With Aggregate Events, system-wide macro facilities should match or even surpass the power of application-specific approaches, and serve as a viable platform for complex, demonstrational, end-user programming.
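
The nesting described above can be pictured as a tree in which every event keeps links to its component events. The following Python fragment is a minimal sketch of that idea, not KATIE's actual data model: the class `Event`, its methods, and the event names "Specify-File-Name", "Edit-String", and "Key-f"/"Key-o" are assumptions made for illustration (only "Edit-File" and "Save-File" are named in the text).

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Event:
    """One node of a structured history. All names here are illustrative."""
    name: str
    components: List["Event"] = field(default_factory=list)
    # The parent link is left out of repr/eq to avoid infinite recursion.
    parent: Optional["Event"] = field(default=None, repr=False, compare=False)

    def add(self, name: str) -> "Event":
        """Create a component event one level below this one and return it."""
        child = Event(name, parent=self)
        self.components.append(child)
        return child

    def ancestry(self) -> List[str]:
        """Names from this event up to the top-level task."""
        names: List[str] = []
        node: Optional["Event"] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names

    def leaves(self) -> Iterator["Event"]:
        """Yield the lowest-level events under this one, in order."""
        if not self.components:
            yield self
        for child in self.components:
            yield from child.leaves()


# The task of Figure 1: typing "foo" while naming a file during an edit session.
edit_file = Event("Edit-File")
save_file = edit_file.add("Save-File")
name_file = save_file.add("Specify-File-Name")
edit_string = name_file.add("Edit-String")
keys = [edit_string.add("Key-" + ch) for ch in "foo"]

print([e.name for e in edit_file.leaves()])
print(keys[0].ancestry())
```

Because each keystroke can be traced upward to the task it belongs to, a history-based tool could in principle address the same activity at any of the five levels rather than only at the one level a flat history records.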

Advantages of a Structured, Multi-level History

The many advantages of this approach stem primarily from the improved matching of the event history to the user's intentions. Indeed, existing high-level models leverage this very point -- by representing events such as "Select-File", as opposed to "Mouse-Down", they produce scripts which are:

* More robust: The higher semantic content in the recorded scripts improves playback under new circumstances (for example, a macro may still work correctly when the objects it operates over are moved around on the screen);

* More intuitive: Scripts with events such as "Select-File" are more understandable, readable, and editable than scripts with mouse and keyboard actions; and

* More efficient: Macros using high-level events often play back faster as they need not go through intermediate states (for example, an application can process a "Drag-Object" event by directly moving the object to the new location -- it need not actually drag the object through each intermediate point).

By the same logic, however, extending the model to even higher-level events should result in even more robust, intuitive, and efficient scripts. Furthermore, retaining the aggregation information allows the history's structure to more closely resemble the user's task structure. This, in turn, enables the user to apply any available history-based support to any level of her task. Additional benefits of this approach include:

* Multi-level Undo: When the user selects "Undo" after performing some complex task, should the system undo the entire task, or just the final step of the task? For instance, say the user selects some text and then uses a dialog box to set the text's font. If she then selects "Undo", should the system reset the text's font to the previous font (by undoing the high-level "Set-Font" event), or re-open the dialog box in its final state (by undoing the mid-level "Close-Font-Dialog" event)? There is no absolute answer to this question -- the user may desire either behavior. The structured, multi-level history of Aggregate Events extends current techniques to allow the system to support both options in a general, system-wide manner.

* Improved Event-based Invocation: In Chapter 21, we propose that PBD systems should allow users to demonstrate when their programs should be invoked. There exists a strong synergy between that proposal and the ideas presented here -- by exposing more levels of the user's task structure, the system gives the user more control over what sorts of events should invoke user-created programs.

* Improved Script Matching: For complex macros, it is desirable to allow end users to record multiple scripts, demonstrating how the macro should work under different circumstances. An important and difficult step for the inference engine is to match equivalent states in two (or more) scripts of the same macro. The Aggregate Event model improves script matching by addressing two common problems: multiple invocation methods and multiple event orderings. It is worth mentioning that other techniques, such as anticipation feedback (described in Chapter 9), can reduce the number of cases in which these problems occur, and would certainly complement the techniques described here.

  * Multiple Invocation Methods Problem: Applications typically support various independent methods of invoking their most-used functionality. For instance, a word processor might make the selected text bold in response either to selecting "Bold" from the "Format" menu or to directly typing "meta-B." Matching these two seemingly unrelated actions between multiple scripts is a very difficult task. However, if the event history contains a "Make-Selection-Bold" event both times, the matching problem disappears.

  * Event Ordering Problem: Even if the end user is kind enough to use only one method of invocation, all is not solved for the script matcher. Consider an Aggregate Event E which has three component events, A, B, and C. If the order in which the end user performs these actions does not affect the outcome, then she may unwittingly change their order the next time she records the script. It is a considerable challenge for an inference engine to correctly match "...A,B,C..." against "...A,C,B...". However, as both sequences lead to the same aggregate event, the problem is reduced to correctly matching "...E..." against "...E...".

* Improved Error Recovery: Under current methods, if processing a high-level event produces an error, the system can either abandon the operation or apply some error-recovery mechanisms and retry processing the event. The structured, multi-layered Aggregate Event history should aid these recovery mechanisms by more precisely denoting where in the computation the error occurred. Moreover, if and when the error is resolved, the system can utilize the structured history to reset the computation to the previous event at the same level as the error. Together, these effects should produce more powerful and more efficient error-recovery schemes.

* Low-level Support: There are various other instances when applications (or users) require access to the low-level events, sometimes to help analyze higher-level events, and other times to process directly. As the proposed model integrates low-level events into the structured history, it clearly supports these behaviors. Relevant situations include:

  * Anticipation Feedback: In Allen Cypher's Eager (Chapter 9), the system utilizes low-level events comprising recorded actions to determine the nature of the anticipation feedback for those actions.

  * Gesture Recognition: Systems such as Dean Rubine's Grandma [Rubine 91b], which translate mouse movements into higher-level gesture events, may need to refer back to the low-level events (perhaps for retraining, for example).

  * Animation: Consider the case of a geometry teacher who demonstrates the shape of a cycloid by recording her actions as she drags a dot along the cycloid's path. Clearly, when the students replay her program, it is unacceptable for the macro facility to issue the high-level "Move-Object" event (which would cause the dot to immediately jump from the starting to the ending location). What matters is the path of the dot, and this information is retained only in the low-level events.

  * Keyboard Mapping: It is common these days for people to work on multiple hardware platforms. These platforms, unfortunately, usually sport incompatible keyboards, so that, for instance, the "Rubout" key on one keyboard is actually the "Delete" key on another. A common fix is for the end user to map certain keys so that the system converts them to other keys before the keystroke goes through the standard event processing. The difficulty here is that the method of declaring the mapping is often machine-specific, and then possibly application-specific, too. By exposing low-level events, this model allows end users to map keyboards using the same macro invocation techniques applied elsewhere -- for example, the "Rubout" keystroke could invoke a (very simple) macro which issued a "Delete" keystroke instead.

  * Program Testing: There are times when a programmer requires that a specific sequence of low-level events be issued on demand. For example, after creating a new widget, a programmer may use macros with low-level events for benchmarking and debugging the code.
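
To make the script-matching benefit concrete, the sketch below compares two recordings of the same macro, first at the lowest level and then at the aggregate level. It is only an illustration under stated assumptions: the nested-tuple representation, the helper names `low_level` and `top_level`, and the low-level event labels "Menu-Select Format/Bold" and "Key meta-B" are hypothetical, while "Make-Selection-Bold" and the events E, A, B, and C come from the discussion above.

```python
from typing import List, Tuple

# An event is (name, components); a script is a list of top-level events.
Event = Tuple[str, list]


def low_level(script: List[Event]) -> List[str]:
    """Flatten a script to the names of its lowest-level events."""
    names: List[str] = []
    for name, components in script:
        names.extend(low_level(components) if components else [name])
    return names


def top_level(script: List[Event]) -> List[str]:
    """Names of the highest-level events only."""
    return [name for name, _ in script]


# First recording: menu selection, then components in the order A, B, C.
first = [
    ("Make-Selection-Bold", [("Menu-Select Format/Bold", [])]),
    ("E", [("A", []), ("B", []), ("C", [])]),
]
# Second recording: keyboard shortcut, then components in the order A, C, B.
second = [
    ("Make-Selection-Bold", [("Key meta-B", [])]),
    ("E", [("A", []), ("C", []), ("B", [])]),
]

print(low_level(first) == low_level(second))  # the low-level streams differ
print(top_level(first) == top_level(second))  # the aggregate streams agree
```

A real inference engine would of course need more than an equality test, but the sketch shows why the presence of the aggregate events removes both the invocation-method and the ordering differences from the comparison.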

The Key Issues for Aggregate Events

There are three main issues facing any implementor of an Aggregate Event architecture:

* Structural Composition;

* Granularity; and

* Required Side Effects.

Of course, there are many other relevant issues -- for instance, how are Aggregate Event arguments encoded, and how are Aggregate Events translated into a human-readable format? These questions, and others like them, are not specific to Aggregate Events, however, but apply to any high-level event architecture. In fact, the existing systems from Apple, Hewlett-Packard, and Microsoft all provide thoughtful solutions to these problems. As such, we will limit our discussion to issues regarding the infusion of multi-layered structure into the event history.

Figure 2. A simple multi-level, structured history resulting from drawing a rectangle in a graphical editor.


Structural Composition

The first consideration when creating Aggregate Events is how to generate the structured links in the history. To illustrate, consider Figure 2, where the user of a graphical editor draws a new rectangle. She does so by entering low-level mouse events, which map into mid-level feedback events, which map into a high-level "Draw-Rectangle" event. These actions produce the flat event stream depicted in Figure 3.

Figure 3. The flat event stream which results from the actions in Figure 2.


As is, this event stream is nearly worthless. Replaying it, for instance, would (if it did not simply crash the editor) generate one set of mouse movements, two feedback objects, and three new rectangles! Clearly, we must impose structure on this history.

As a first step, we can apply the fact that all events are sent through a common point -- the system event dispatcher -- thus enabling us to easily maintain a call graph such as in Figure 4. This indicates, for example, that the "Mouse-Down" event handler issued the "Draw-Feedback" event. With this information, the system can infer that either, but not both, of these events could be included when replaying the history. Unfortunately, a call graph alone does not provide enough structure. Figure 4, for instance, still requires links indicating that the "Draw-Feedback" and "Resize-Feedback" events are components of the "Draw-Rectangle" event. Inferring these links, in fact, is the focus of our present research.

Figure 4. The call graph imposes some, but not enough, structure on the flat history of Figure 3.
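
Because every event passes through the dispatcher, the call graph can be collected with little more than a stack of the events whose handlers are currently running. The following Python sketch illustrates this under several assumptions: the `Dispatcher` class and its attributes are hypothetical, and the choice of which mouse handler issues which higher-level event ("Mouse-Move" issuing "Resize-Feedback", "Mouse-Up" issuing "Draw-Rectangle") is a guess at the editor's behavior, since only the "Mouse-Down" to "Draw-Feedback" link is stated in the text.

```python
from typing import Callable, Dict, List, Optional


class Dispatcher:
    """Hypothetical system event dispatcher that records who issued what."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Callable[["Dispatcher"], None]] = {}
        self.history: List[str] = []              # flat stream, in dispatch order
        self.issued_by: List[Optional[int]] = []  # index of the issuing event
        self._active: List[int] = []              # events whose handlers are running

    def dispatch(self, name: str) -> None:
        index = len(self.history)
        self.history.append(name)
        # The event on top of the stack is the one whose handler issued this one.
        self.issued_by.append(self._active[-1] if self._active else None)
        self._active.append(index)
        try:
            handler = self.handlers.get(name)
            if handler is not None:
                handler(self)
        finally:
            self._active.pop()


system = Dispatcher()
system.handlers["Mouse-Down"] = lambda d: d.dispatch("Draw-Feedback")
system.handlers["Mouse-Move"] = lambda d: d.dispatch("Resize-Feedback")
system.handlers["Mouse-Up"] = lambda d: d.dispatch("Draw-Rectangle")

for low_level_event in ("Mouse-Down", "Mouse-Move", "Mouse-Up"):
    system.dispatch(low_level_event)

# Each pair reads (issuing event, issued event).
call_graph = [
    (system.history[parent], system.history[child])
    for child, parent in enumerate(system.issued_by)
    if parent is not None
]
print(system.history)
print(call_graph)
```

Note that nothing in the recorded pairs connects the feedback events to "Draw-Rectangle"; this is the missing structure that the call graph alone cannot supply.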


To that end, we are developing a suite of techniques that application developers can use to gather the missing component links. Briefly, one such technique is based on dependency analysis, wherein the application provides sufficient information for the system to infer, for example, that the "Resize-Feedback" event subsumes the preceding "Draw-Feedback" event. Adding these new links to the call graph results in a complete structure, such as in Figure 5.

Figure 5. Adding dependency links (the horizontal arrows in this diagram) to the call graph of Figure 4 imposes enough structure on the event stream that history-based mechanisms (such as Undo and macro facilities) can determine which events should be suppressed when others are issued.
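
One way to see what "suppressed when others are issued" means is to treat the completed structure as a tree and to replay exactly one level of it. The sketch below is an assumed simplification, not the structure of Figure 5 itself: the tree shape, the "Erase-Feedback" event, and the function names `flat` and `replay_plan` are hypothetical.

```python
from typing import List, Tuple

Node = Tuple[str, list]  # (event name, component events)

# Assumed simplification of the structured history for drawing a rectangle.
draw_rectangle: Node = (
    "Draw-Rectangle",
    [
        ("Draw-Feedback", [("Mouse-Down", [])]),
        ("Resize-Feedback", [("Mouse-Move", [])]),
        ("Erase-Feedback", [("Mouse-Up", [])]),
    ],
)


def flat(node: Node) -> List[str]:
    """Every event in the structure, as a flat history would replay it."""
    name, components = node
    names = [name]
    for component in components:
        names.extend(flat(component))
    return names


def replay_plan(node: Node, level: int) -> List[str]:
    """Events to issue when replaying at `level` (0 is the highest level)."""
    name, components = node
    if level == 0 or not components:
        return [name]  # issue this event and suppress its components
    plan: List[str] = []
    for component in components:
        plan.extend(replay_plan(component, level - 1))
    return plan  # issue the components and suppress this event


print(flat(draw_rectangle))            # all seven events: work is duplicated
print(replay_plan(draw_rectangle, 0))  # the high-level event alone
print(replay_plan(draw_rectangle, 2))  # the mouse events alone
```

Whichever level is chosen, an event and its components never appear in the same plan, which is the property a flat history cannot guarantee.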


Another mechanism under consideration is an Aggregate Event shell. By this method, when an application begins processing component events of a future Aggregate Event, it creates a shell for that event. As the component events are processed, they are linked to the shell. For example, when the graphical editor processes "Resize-Feedback" events, it is aware that these events will eventually be part of a higher-level event, but it does not know which higher-level event, so it creates a generic shell which temporarily claims the feedback events. When the expected event (in this case the "Draw-Rectangle" event) finally occurs, the editor associates the event with the shell. These techniques, along with others we are developing, should simplify, and sometimes even eliminate, the structural composition problem.
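
The shell mechanism can be sketched in a few lines. The classes `Shell` and `Editor` and their method names below are hypothetical and are meant only to illustrate the sequence of steps just described, not an actual KATIE interface.

```python
from typing import List, Optional


class Shell:
    """Placeholder for an Aggregate Event whose identity is not yet known."""

    def __init__(self) -> None:
        self.name: Optional[str] = None
        self.components: List[str] = []


class Editor:
    """Hypothetical graphical editor that collects components in a shell."""

    def __init__(self) -> None:
        self.history: List[Shell] = []
        self._open: Optional[Shell] = None

    def component_event(self, name: str) -> None:
        if self._open is None:  # first component seen: create the shell
            self._open = Shell()
            self.history.append(self._open)
        self._open.components.append(name)  # the shell claims the event

    def aggregate_event(self, name: str) -> Shell:
        shell = self._open
        if shell is None:  # no component events were seen beforehand
            shell = Shell()
            self.history.append(shell)
        shell.name = name  # the expected event has finally arrived
        self._open = None
        return shell


editor = Editor()
for feedback in ("Draw-Feedback", "Resize-Feedback", "Resize-Feedback"):
    editor.component_event(feedback)

name_while_pending = editor.history[0].name  # still unknown at this point
finished = editor.aggregate_event("Draw-Rectangle")
print(name_while_pending, finished.name, finished.components)
```

This sketch assumes only one shell is open at a time; nested or interleaved aggregate events would need a stack or some other way of choosing which shell claims each component.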

Granularity

The next question regards the granularity of Aggregate Events -- that is, which application function calls are appropriate candidates for Aggregate Events? Including too few will not take full advantage of the architecture, whereas including too many will overwhelm the system's time and space resources. As a first approximation, we propose a solution suggested in the MIKE system [Olsen 88] and also embraced by Apple Events: high-level events should correspond to those actions which users may wish to undo. The difference, however, is that the application developer should consider this in the context of multi-level undo. In the future, we hope to provide more concrete guidelines for interpreting this metric.

Required Side Effects

Perhaps the most insidious problem facing Aggregate Events is required side effects. Returning to the example in Figure 2, consider the mid-level feedback events. Collectively, these have the side effects of drawing, resizing, and erasing a feedback object. When this history is replayed, if the system issues the high-level "Draw-Rectangle" event, the mid-level side effects will not occur. This is exactly as it should be. However, let us now presume the following of the graphical editor:

* As part of its function, the "Mouse-Down" event handler sets a global variable, "Mouse-Down-Posn", to the position of the click; and

* The "Draw-Rectangle" event handler accepts one vertex as an argument, but it reads the opposite vertex from the global variable "Mouse-Down-Posn."

Although this strategy seems reasonable, and works just fine in conventional systems, it leads to pathological behavior in a high-level event architecture. If the "Draw-Rectangle" event is issued apart from its component events, as would likely occur during playback of this history, the "Mouse-Down" event never occurs, and so its required side effect -- the setting of "Mouse-Down-Posn" -- also never occurs. Thus, "Draw-Rectangle" will use whatever value this variable previously held! The results are poorly defined and, in any case, unacceptable.
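
The failure is easy to reproduce in miniature. In the sketch below, the two handlers follow the presumptions listed above; the function names, the coordinates, and the representation of a rectangle as a pair of vertices are illustrative assumptions.

```python
from typing import Optional, Tuple

Point = Tuple[int, int]

mouse_down_posn: Optional[Point] = None  # global shared by both handlers


def mouse_down(position: Point) -> None:
    """Low-level handler: remembers where the click happened."""
    global mouse_down_posn
    mouse_down_posn = position  # the required side effect


def draw_rectangle(vertex: Point) -> Tuple[Optional[Point], Point]:
    """High-level handler: reads the opposite vertex from the global."""
    return (mouse_down_posn, vertex)


# Recording: the user draws a rectangle from (10, 10) to (50, 40).
mouse_down((10, 10))
recorded = draw_rectangle((50, 40))

# Later, unrelated activity leaves a different click position behind.
mouse_down((200, 200))

# Playback issues only the high-level event, so "Mouse-Down" never runs.
replayed = draw_rectangle((50, 40))

print(recorded)
print(replayed)  # built from the stale click position, not the recorded one
```

The replayed rectangle differs from the recorded one even though the same "Draw-Rectangle" event, with the same argument, was issued both times.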

The simplest strategy for dealing with this situation is to forbid it from occurring (as a matter of policy, that is). This is the strategy employed by all existing high-level event architectures. In Apple Events, for example, application developers are required to factor their applications into the user interface component and the code that responds to user actions. This entails separating all code for the two parts, and then communicating between them strictly via Apple Events. This idea is expressed in NewWave as separating code into a low-level "action processor" and a higher-level "command processor."
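
The effect of this factoring policy can be illustrated with the rectangle example: if the user interface component packages everything the handler needs into the event itself, the handler no longer depends on state left behind by lower-level events. The sketch below is not the Apple Events or NewWave API; `RectangleTool`, `handle_draw_rectangle`, and the dictionary used as an event record are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple

Point = Tuple[int, int]
Rectangle = Tuple[Point, Point]


class RectangleTool:
    """User interface component: turns mouse input into a high-level event."""

    def __init__(self) -> None:
        self._anchor: Optional[Point] = None  # private to the interface side

    def mouse_down(self, position: Point) -> None:
        self._anchor = position

    def mouse_up(self, position: Point) -> Dict[str, Any]:
        # Everything the handler needs travels inside the event itself.
        return {"name": "Draw-Rectangle", "from": self._anchor, "to": position}


def handle_draw_rectangle(event: Dict[str, Any]) -> Rectangle:
    """Application component: depends only on the event's arguments."""
    return (event["from"], event["to"])


tool = RectangleTool()
tool.mouse_down((10, 10))
event = tool.mouse_up((50, 40))          # this record is what gets stored
recorded = handle_draw_rectangle(event)

tool.mouse_down((200, 200))              # later, unrelated interface activity
replayed = handle_draw_rectangle(event)  # playback of the stored event

print(recorded == replayed)
```

The cost of this arrangement is the one discussed next: every handler has to be written so that its inputs arrive as event arguments, which is work the application developer must do.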

Although this strategy is effective, it also places a heavy burden on application developers. We are presently pursuing two means of reducing that burden. First, we are developing an Aggregate-Event-friendly widget set -- applications based on these widgets will be factored and will correctly process Aggregate Events, at least to one level above the widgets. Second, we are developing techniques to detect required side effects, and then to provide for their (semantically correct) execution during macro playback. In the near term, however, we will require application developers to do their own factoring above the widget level.

Other Issues

There are many other interesting issues to consider. For example, what sort of user interface (and programmer interface, for that matter) should be placed atop the proposed architecture? Also, what are the precise semantics of the links in the structured history? And in what additional ways can this approach benefit PBD technology? We are actively pursuing these and other issues.

State of the Work

Aggregate Events serve as the basis for KATIE [Kosbie 93], a new architecture for improved system-wide, event-based macro facilities. KATIE is being developed by the first author as part of his doctoral dissertation at Carnegie Mellon University. As such, the model is presently in a "fluid" state. As stated, the present focus is on improving techniques for Aggregate Event structural composition. However, we are also making progress along other dimensions, and hope to be able to distribute a KATIE prototype in the near future.

Conclusions

This chapter presents the advantages of system-wide macro facilities and reviews existing approaches. We then propose a new and more powerful model based on Aggregate Events. This model extends high-level event architectures (such as Apple Events and HP NewWave) by infusing multi-level structure into the event history. By more closely matching the event history to the user's task structure, the Aggregate Event model produces scripts which are more robust, more intuitive, and more efficient than present methods. The new model also allows the user to apply any available history-based support (such as Undo, or macro recording and playback) to any level of her task. Additional benefits range from improved script matching and error recovery to low-level support for anticipation feedback and gesture recognition. Although some important design issues still remain, preliminary results have been quite promising. With Aggregate Events, system-wide macro facilities should match or even surpass the power of application-specific approaches, and serve as a viable platform for complex, demonstrational, end-user programming.

Acknowledgments

This research was funded partially by NSF grant number IRI-9020089 and partially by the Avionics Lab, Wright Research and Development Center, Aeronautical Systems Division (AFSC), U.S. Air Force, Wright-Patterson AFB, OH, 45433-6543 under Contract F33615-90-C-1465, Arpa Order No. 7597.

The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government.


