Figure 1 shows a sample task and dialog between Turvy and one of our users.
Figure 1. Turvy interprets hints about syntax (before a colon) but not about subject matter (name, publications, date). Turvy matches hints with actions, taking "last two digits" to mean the digits Kate typed, not those at the end of the paragraph.
c. Kate's transcript
Kate: Take the last name, it will be in some cases before a colon.
[Kate copies "Agre", pastes it to a new paragraph, types brackets and sets the style.]
This is a list of publications. Now we're going to add the last two digits of the date [types "88"].
Do you want another demo?
Turvy: I can try if you want.
Kate: OK.
Turvy: Looking for word before first colon in next paragraph [picks "Andreae"], OK?
Kate: OK.
Turvy: [while doing actions] Copy, go to start of paragraph, OK?
Kate: OK.
Turvy: New paragraph, open bracket, paste, close bracket, change style to "citation", OK?
Kate: OK.
Now Turvy makes a mistake with the date. When Kate typed "88", Turvy found no instance of it in recent actions, so it assumed the number was a constant. Turvy always requires a demo to go with verbal instructions; after Kate shows the right action, it echoes back a new hypothesis.
Turvy: Type in blank space "88", OK?
Kate: No, the last two digits at the end.
Turvy: Show me please.
[Kate picks digits.]
Two digits before period at end of paragraph?
Kate: Yes.
Turvy: May I continue?
Kate: OK.
This next example introduces a new case, multiple authors (see the third entry in Figure 1). Turvy's search pattern was "word before first colon in the paragraph," which selects the last author. Note that Turvy rejects the generalization "any punctuation" for two reasons: words before periods ("E.", "H.") were not given as examples; and Kate said "comma."
[Turvy picks "Smith"]
Kate: No - stop.
Turvy: Show me please.
Kate: I want the last name of the first author.
Turvy: I don't understand - show me please.
Kate: When you have a number of authors, choose the first [picks and copies "Andreae"]. It's before the comma.
Turvy: Word before first comma or colon?
Kate: Yes.
A little later on, Turvy comes to an entry in which the surname happens to precede the initials ("Michalski" in Figure 1). Turvy picks the last initial. Kate's hint "first word" does not by itself repair the pattern, so Turvy proposes the pattern "word followed by 0 or more words ending in period, followed by comma or colon".
[Turvy picks "S."]
Kate: No.
Turvy: Show me please.
Kate: I want you to take in this case the very first word [picks "Michalski"], then continue as before.
Turvy: You skipped back over words ending in period?
Kate: Yes.
Turvy: Continue?
Kate: OK.
Finally, Kate teaches Turvy to parse names with baronial prefixes ("van Lehn" in Figure 1). Turvy recognizes "case, select, last two words" in Kate's hint. "Case" suggests a new special case, or the case of the word. The latter explains "van Lehn," but the Wizard plays dumb. Turvy asks for some feature that completes the pattern: (0 or 1 of ??) followed by word before (0 or more words ending in period) before first comma or colon.
[Turvy picks "Lehn"]
Kate: OK, um, stop [hesitant]. In this case I want you to select the last two words.
Turvy: Show me please.
[Kate picks "van Lehn"]
What's different about this case?
Kate: Hmm [confused].
Turvy: You selected these two words, why?
Kate: Because these two words make up his last name.
Turvy: Is there some feature of that word [highlights "van"] I should look for? That word exactly?
Kate: [hesitant] In this case, yes.
In the dialog presented above, Kate's goal is to get Turvy to do the task (until a new case requires further teaching), and Turvy's goal is to get more specific information from Kate about what patterns to look for. Typically, Kate does one example, then Turvy does the next. When Turvy errs, Kate demonstrates the correction and maybe gives a hint; Turvy echoes back its interpretation. If Kate's hint is ambiguous, Turvy proposes a guess to elicit further explanation. This sample session also illustrates the role of the Wizard, who has to interpret Kate's actions and hints according to pre-defined rules, while using his discretion to make Turvy a little extra stupid if that helps gather more experimental data.
Turvy's domain knowledge and language understanding are quite primitive. We tested our simulated Turvy on bibliography editing tasks, but Turvy knows nothing about bibliographies. Users have to teach concepts like "surname" as syntactic patterns like "word before colon before italic text." Although a real instructible agent might have built-in knowledge about bibliographies, as in Tourmaline (Chapter 14), users in real situations will be teaching arbitrary new concepts, so we chose to test Turvy's ability to learn from scratch. Turvy's background knowledge, a collection of pattern matchers and generalization hierarchies for syntactic features, appears feasible to implement.
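To make that feasibility claim concrete, here is a minimal sketch (in Python) of what one such generalization hierarchy might look like; the feature names and the climbing strategy are illustrative assumptions, not Turvy's actual knowledge base.

    # A toy generalization hierarchy over syntactic features: each
    # feature points to a more general parent, so two examples can be
    # covered by climbing to their least general common ancestor.
    # All names here are illustrative assumptions.
    PARENT = {
        "Colon": "Punctuation",
        "Comma": "Punctuation",
        "Period": "Punctuation",
        "Punctuation": "Delimiter",
        "Italics": "FormatChange",
        "FormatChange": "Delimiter",
        "Delimiter": "AnyFeature",
    }

    def ancestors(feature):
        """Chain from a feature up to the hierarchy root."""
        chain = [feature]
        while feature in PARENT:
            feature = PARENT[feature]
            chain.append(feature)
        return chain

    def least_general_generalization(a, b):
        """Most specific feature class covering both examples."""
        chain_a = ancestors(a)
        return next(f for f in ancestors(b) if f in chain_a)

    # least_general_generalization("Colon", "Comma") == "Punctuation";
    # in the transcript Turvy instead keeps the disjunction "comma or
    # colon", rejecting the broader class for lack of evidence.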
Turvy spots keywords and phrases in users' continuous speech. If the user says, "I want you to find surnames that come before a colon," Turvy spots only "find, before, colon." Turvy's knowledge of English can be implemented as a phrase thesaurus. Spotting phrases in continuous speech is problematic, although Apple Computer's "Casper" project shows promise in this direction.
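A rough sketch of such keyword spotting over a phrase thesaurus follows; the thesaurus entries and canonical terms are invented for illustration and are not Turvy's actual vocabulary.

    # Keyword spotting with a phrase thesaurus: map surface phrases to
    # canonical hint terms, matching longer phrases first.
    PHRASE_THESAURUS = {
        "look for": "FIND", "find": "FIND",
        "in front of": "BEFORE", "before": "BEFORE",
        "colon": "COLON", "comma": "COMMA",
    }

    def spot_keywords(utterance):
        """Return canonical hint terms spotted in continuous speech."""
        text = utterance.lower()
        spotted = []
        for phrase in sorted(PHRASE_THESAURUS, key=len, reverse=True):
            if phrase in text:
                spotted.append(PHRASE_THESAURUS[phrase])
                text = text.replace(phrase, " ")   # consume the phrase
        return spotted

    # spot_keywords("I want you to find surnames that come before a colon")
    # -> ["BEFORE", "COLON", "FIND"]   (detection order, not speech order)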
Testing a simulated rather than a working prototype affords several advantages. First, the simulated prototype is very cheap to implement and easy to change. Second, various implementation details may be left out, with a canned script replacing computation. Third, a reliable, interesting commercial application can serve as the task environment. Fourth (and perhaps most importantly), the designer, in portraying Turvy, becomes directly engaged with the problems that users face, and is therefore more likely to understand them.
In the experiment, we invited a variety of people to teach Turvy six bibliographic editing tasks, ranging from trivial (replace underlining with italics) to taxing (put all authors' given names after their surnames). Users were allowed to invent their own teaching methods and spoken commands, but Turvy had pre-defined limits on the inferences it could make and the vocabulary it would recognize. We wanted to see how people adapt to an agent that seems to learn like a human being yet has surprising limitations. We also wanted to find any regularity in the kinds of instructions people tend to give.
We found that users readily learned to speak Turvy's language and did not object to using it. Turvy's speech output was the most important factor in training them. All users employed much the same commands and used similar wording. Users formed two camps: talkative ones, who focused on interacting with Turvy, and quiet ones, who concentrated on their work. All users came to understand that Turvy learns incrementally, refining a concept as new cases arise; thus they were content to give one or two demonstrations and let Turvy take over. Few subjects used pointing gestures to focus attention: when Turvy asked them to point at features, they became confused, though when it asked them to describe features, they often could. But the best elicitation technique was to propose a guess: users almost always replied with the correct feature, even though they had no menu from which to choose and Turvy had not mentioned the feature itself.
When defining the model, we assumed that the sort of agent we could implement will have about as much background knowledge and only slightly better inference mechanisms than existing demonstrational interfaces like Peridot, Metamouse, TELS and Eager (see Chapters 6 through 9). The learning mechanisms we envisaged use knowledge-guided generalization to form an initial theory from a first example, followed by similarity-based generalization to refine it over multiple examples (for definitions of these machine learning terms, see the Glossary at the end of this book). Verbal and pointing hints would be used to focus the system's attention on features and objects, causing it to propose them in preference to others when forming generalizations, and possibly causing it to "shift bias" -- that is, to consider a different set of features.
Turvy represents a learned task in terms of the following constructs (a rough code rendering follows the list):
* syntactic patterns, including constants and variables
* macros, sequences of application events
* conditional subtasks, rules that test search patterns and execute macros
* sequences of one or more conditional subtasks
* loops, repeating sequences
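One plausible rendering of these constructs as data structures is sketched below; the field names are our assumptions, chosen to mirror the Task 3 model given later in this section.

    # Sketch of Turvy's task-model constructs as Python dataclasses.
    from dataclasses import dataclass, field
    from typing import List, Union

    @dataclass
    class Pattern:              # syntactic pattern (constants, variables)
        name: str
        elements: List[str]     # e.g. ["(SURNAME := Word)", "Colon or Comma"]
        constraints: List[str] = field(default_factory=list)

    @dataclass
    class Macro:                # straight-line sequence of application events
        name: str
        steps: List[str]        # e.g. ["copy SURNAME", "insert '['"]

    @dataclass
    class Conditional:          # rules mapping search patterns to macros
        name: str
        rules: List[tuple]      # (Pattern, Macro) pairs, tried in order
        default: str = "TurvyAskUserForDemo"

    @dataclass
    class Loop:                 # repeat a sequence of subtasks
        name: str
        body: List[Union[Conditional, "Loop"]]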
Turvy has two kinds of built-in background knowledge: an algorithm for constructing procedures; and application-specific knowledge about generalizing search and replace patterns. Turvy learns by matching observed features of examples with descriptions already in either its background knowledge or the model of the task it is learning. When generalizing, it matches multiple examples to background knowledge patterns that cover them. Hints help it choose among alternative patterns or try a pattern not directly suggested by examples. We worked through each of the six tasks, manually applying Turvy's knowledge to build a model. The model of Task 3 (taught by Kate above) is given below.
task CITATION-HEADINGS:
    when user says "Make citation headings"
    run T3-LOOP

loop T3-LOOP: repeat (EDIT-SURNAME EDIT-DATE)

conditional EDIT-SURNAME:
    if find AUTHOR-SURNAME then do MAKE-HEADING
    else TurvyAllDone

conditional EDIT-DATE:
    if find PUBLICATION-DATE then do HEADING-DATE
    else TurvyAskUserForDemo

pattern AUTHOR-SURNAME:
    Paragraph.Start SomeText
    (BARONIAL-PREFIX := 0 or more Word)
    (SURNAME := Word)
    (0 or more Word Period)      /* as in "Michalski R. S." */
    (Colon or Comma)             /* end of the pattern */
    where Style(Paragraph) = "bibliography"
      and Capitalization(BARONIAL-PREFIX) = AllLowerCase

pattern PUBLICATION-DATE:
    (DATE := Digit Digit) (0 or 1 Period) Paragraph.End

macro MAKE-HEADING:
    copy SURNAME                     demo select SURNAME, copy
    locate CurrentParagraph.Start    demo put-cursor-at CurrentParagraph.Start
    insert "["                       demo type "["
    insert SURNAME                   demo paste
    insert Blank (DATE-LOC := "]")   demo type Blank "]"
    insert Return                    demo type Return
    set property ParagraphStyle of CurrentParagraph to "citation"
                                     demo select styleMenu, select "citation"

macro HEADING-DATE:
    copy DATE                        demo select DATE, copy
    locate DATE-LOC.Start            demo put-cursor-at DATE-LOC.Start
    insert DATE                      demo paste
Contexts are patterns that trigger invocations (see Chapter 21); they comprise one or more system events (e.g. the clock reaches a certain time), application events (e.g. text is pasted), user commands (e.g. "Make citation headings"), or the selection or alteration of some pattern in the data (e.g. the user selects a telephone number). When learning a new task, the default context is empty, which means the procedure will not be invoked again. The context for the CITATION-HEADINGS task is the user-defined command "Make citation headings".
In the experiment we did not investigate the teaching of invocations. Turvy's heuristic for inferring them is to trigger a task on the event that provides the data to which the task refers. For instance, in Potter's "word paste" scenario (see the Test Suite appended to this book), the paste operation provides a string around which the task operates, so paste is inferred as the invocation. The user could replace this or specialize it (add a condition) by indicating other precondition events or by defining a verbal command.
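A minimal sketch of that heuristic, with the event representation invented for illustration:

    # Infer an invocation context: trigger the task on whichever event
    # provided the data the task operates on. Event tuples are an
    # assumed representation, not Turvy's.
    def infer_invocation(events, task_input):
        for kind, data in events:       # e.g. [("paste", "some string")]
            if data == task_input:      # this event supplied the task's data
                return kind             # e.g. "paste" in the word-paste scenario
        return None                     # none found: ask for a verbal command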
Conditionals are sets of rules, like "if-then-else" or Lisp cond, that map search patterns to macros. They allow Turvy to handle different special cases in the data at each step of a task. When executing a conditional, Turvy matches each alternative search pattern with features in the data (e.g. "word before colon"). The first one to match fires (as in cond). If none fires, Turvy does some default action: ask the user for a demo, go on to the next subtask, or announce completion. A new conditional has the default TurvyAskUserForDemo (cf. Pygmalion, Chapter 1). When a task or loop ends, the last executed conditional's default is set to TurvyAllDone (as in EDIT-SURNAME above).
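Executing a conditional might then look like the sketch below, where `find` and `run_macro` stand in for Turvy's pattern matcher and macro player:

    # First rule whose search pattern matches fires, as in Lisp's cond;
    # if none fires, the conditional's default action is returned
    # (e.g. "TurvyAskUserForDemo" or "TurvyAllDone").
    def run_conditional(conditional, data, find, run_macro):
        for pattern, macro in conditional.rules:
            match = find(pattern, data)
            if match is not None:
                run_macro(macro, match)     # first match fires
                return "fired"
        return conditional.default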
Turvy learns a new conditional subtask whenever the user selects data that must be found by searching, as opposed to data that is directly derivable. For instance, when the user selects the two digits at the end of an entry, Turvy cannot extend the search pattern AUTHOR-SURNAME to include them, so it creates a new conditional. On the other hand, when the user goes to paste them before the bracket in the heading, Turvy does not need a new search pattern to derive the insertion point, so that step is appended to the HEADING-DATE macro.
Turvy continually tries to predict, and learns new rules or modifications to existing rules whenever prediction fails. Turvy matches the user's demo with subtasks of the current procedure. If Turvy predicted the right macro but picked the wrong object (for instance, when it chose "Smith" instead of "Andreae"), it revises the search pattern. If Turvy fired the wrong rule, it specializes that rule's search pattern, or else creates a new special case rule to be tested at higher priority. If a rule does not fire when it should, Turvy generalizes its pattern. If no rule matches the current example, Turvy creates a new rule.
Loops are sequences of one or more subtasks, which may themselves be conditionals or loops. Turvy's loops emulate set iteration, repeating for all instances of a search pattern (strictly speaking, the search pattern in the loop's first subtask). The grammar for loops allows for counted iteration and for termination on some special event, but methods by which users might teach these controls were not tested in the experiment.
Turvy continually tests for repeated actions (cf. Eager and Metamouse). Two subtasks are considered the same only if both their search patterns and macros can be matched; macros are checked first, since they provide evidence for matching search patterns, which may need to be generalized.
A repeated sequence of actions implies either a loop or a subroutine. By default, Turvy predicts a loop (justified because Find&Do actions are large-grained subtasks). If execution does not continue as predicted, Turvy forms a branch. (In the experiment, we did not explore the alternative of unfolding the loop.) If the user tells Turvy to stop repeating before it has processed all instances, Turvy asks whether it should end on a count or a special condition.
Macros are straight-line sequences of application events containing no loops or conditionals. Macros are parameterized, in the sense that they operate on the current instances of search patterns. Macros may also set variables, provided their values can be computed directly, without having to do a pattern match.
Each macro step is one of the basic editing operators supported by all applications (locate, copy, insert, delete or set property value), but Turvy also records the actual steps demonstrated by the user, since users expect it to imitate them. For instance, in the macro MAKE-HEADING above, the step "insert SURNAME" is paired with "(demo paste)." The reason for using more abstract operators like "insert" is to keep track of variable dependencies, which enables Turvy to match demonstrational sequences more rigorously: after all, any two "paste" actions are the same, but "insert SURNAME" and "insert DATE" are not.
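That pairing, and the stricter matching it allows, can be sketched as follows (names are illustrative):

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Command:                # one macro step
        operation: str            # abstract operator: "insert", "copy", ...
        operand: str              # variable it reads or writes, e.g. "SURNAME"
        demo: List[str]           # literal user actions to replay, e.g. ["paste"]

    def same_step(a: Command, b: Command) -> bool:
        """Match on operator AND operand: any two pastes look alike,
        but insert SURNAME and insert DATE are different steps."""
        return (a.operation, a.operand) == (b.operation, b.operand)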
Dealing with user errors. Users can tell Turvy to ignore steps they just did or are about to do. In the former case, Turvy asks the user to undo those steps. When the user undoes actions, Turvy forgets them and the data they referred to.
An augmented-BNF attribute grammar for text patterns is given in the Appendix to this chapter. Patterns consist of a sequence of one or more chunks of text bounded by defined delimiters and possibly having specific contents or properties, which are specified in a list of constraints (following the keyword "where" as in AUTHOR-SURNAME above).
Delimiters are punctuation marks, bracketing characters such as quotes and parentheses, word, line and paragraph boundaries, and changes in text formatting (capitalization, font, point size, text style, subscripting, paragraph style). All tokens, formatting and delimiters within the user-selected text are parsed. If several delimiters beyond the selection run together (e.g. Colon Italics), Turvy records them all.
The properties of a text chunk are its string constant, length (which can be measured at different granularities: character, word, paragraph), and formatting. When learning a search pattern from the first example, Turvy ignores properties but records them in case needed later to specialize the pattern.
Variables point to chunks of text within patterns, so that Turvy may re-select or search relative to them. Patterns may set or test variables. Any sequence of elements within a pattern may be assigned to a variable and assignments may be embedded, for instance (BARONIAL-PREFIX := 0 or more Word) and (SURNAME := Word) in the pattern AUTHOR-SURNAME. Variables may appear in macros (e.g. SURNAME, which points to the text actually selected and copied) or property tests (e.g. Capitalization(BARONIAL-PREFIX) = AllLowerCase).
Learning from one example. When learning a new pattern, Turvy parses the user-selected string and its neighborhood, looking at delimiters and properties. If the pattern is deterministic (i.e. a search for that pattern would have selected the same text), analysis stops there; otherwise, parsing continues outward on either side of the selection. For instance, in the first example of Figure 1, the user selects "Agre": its pattern is Word, but that is nondeterministic since two other Words precede it. Turvy therefore scans outward and finds two delimiters -- Colon and format change to Italics -- so the search pattern is (Word Colon Italics). Turvy gives up after testing three runs of text between delimiters. If it cannot construct a deterministic pattern, it asks the user to describe or point to distinguishing features of the current or previous examples. Turvy is biased against forming patterns with string constants (e.g. "Agre"), resorting to them only if multiple examples or a user hint warrant.
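The outward scan can be sketched as below; for brevity the sketch extends the pattern in one direction only, whereas Turvy scans both sides of the selection, and `candidates` stands in for its pattern matcher.

    # Learn a search pattern from one example: start with the parse of
    # the selection and append surrounding delimiters until the pattern
    # selects the example uniquely, giving up after three extensions.
    def learn_pattern(selection_parse, surrounding, text, candidates):
        pattern = list(selection_parse)         # e.g. ["Word"] for "Agre"
        for delimiter in surrounding[:3]:       # e.g. ["Colon", "Italics"]
            if len(candidates(pattern, text)) == 1:
                return pattern                  # deterministic: stop here
            pattern.append(delimiter)           # e.g. Word Colon Italics
        if len(candidates(pattern, text)) == 1:
            return pattern
        return None   # ask the user to point out a distinguishing feature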
Multiple examples. Like Tinker (see Chapter 2) Turvy learns incrementally, generalizing or specializing a pattern when it does not perfectly match the current example. If the search skips over a desired item, Turvy generalizes the pattern by dropping properties or outermost delimiters at either end of it. When the search selects something the user wants to skip over, Turvy specializes the pattern by adding properties or chunks of text and delimiters (as when constructing an initial pattern). Generalization and specialization succeed provided the resulting pattern covers the same set of examples; otherwise Turvy creates a new pattern for the current example (in machine learning terms, a new disjunct), using the method described above for learning from one example. Turvy always verifies a generalization, specialization or new disjunct by asking the user.
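A compressed sketch of this refinement step, with the coverage test and the one-example learner passed in as functions:

    # Generalize when the search skipped a desired item; specialize when
    # it selected an unwanted one. Keep the refined pattern only if it
    # still covers every accepted example; otherwise learn a new disjunct.
    def refine(pattern, example, error, new_feature, covers, accepted, learn_one):
        if error == "skipped":                # desired item missed
            trial = pattern[:-1]              # drop an outermost delimiter/property
        else:                                 # unwanted item selected
            trial = pattern + [new_feature]   # add a delimiter or property
        if covers(trial, accepted + [example]):
            return trial                      # user is asked to confirm
        return learn_one(example)             # new disjunct for this example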
Focusing. The user can give verbal hints or point at items to be tested in patterns. If the user selects an object, saying "look here," Turvy tries to include it in the search, if necessary by extending the pattern across intervening text. If the user mentions an attribute, like "italics", Turvy adds it to the pattern as a delimiter or property. Anything the user suggests or confirms is given special status: Turvy will not discard it when generalizing, unless the user tells it to.
Turvy does not do full natural language understanding; instead, it spots keywords and matches them with examples actually seen. The user can state negations ("the text is not before a comma"): Turvy associates "not" with the next descriptive term and checks that it is not true of the example. A hint might refer to the past ("this case is different because the ones that came before, came before italics"). Turvy does not parse the sentence to associate features correctly; instead, it notes that "before italics" applies to previous examples but not this one (and would do so even if the user said simply "before italics").
On the other hand, Turvy tries to guide the teaching dialog by following some rules of discourse. First, it keeps silent while watching the user do an initial example, rather than interrupt the user with feedback regarding its initial pattern hypotheses. Second, Turvy lets the user know that it has matched an action (by saying "I've seen you do that before"), and offers to do the following steps ("Can I try?"). Third, it describes search patterns and asks for confirmation of actions when predicting them for the first time; on subsequent occasions it does them silently. Fourth, Turvy asks for confirmation when it generalizes a pattern to continue a task. Fifth, it asks the user to name features or point at objects when it needs to specialize a description.
The messages we predicted are given in Table 1. Most messages from the user have duals for Turvy, so that either party might lead the dialog. There are three broad categories of interaction: "Control" messages determine which party does which actions (learning operation a above); they indicate the structure of a task and therefore guide Turvy to match particular action sequences with one another (operation b). "Focus attention" messages also direct Turvy to try matching, and some indicate which features are relevant (operation c). Focus messages from Turvy to the user are prompts to elicit useful focusing directives in return. "Responses" are just brief replies to focus and control messages.
The proposed instructions assume no particular wording. Thus, Turvy can understand various formulations of a given command, or change the wording of its own utterances to suit the user. For instance, Turvy understands "Do the next one", "Do another", and "Try one" to mean "Do the next iteration." In the event we found that users adopted much the same wording for commands, and would imitate Turvy's wording (e.g. "Do the next one").
Table 1. Messages used in the study.
Control messages
from User | from Turvy
* Watch and learn what I do | * Show me what to do
* End of lesson | * All done
Do the next step | * May I take over?
* Complete this example (iteration) | * May I continue?
* Do the next example (iteration) | * [May I] do the next one?
* Do all remaining examples (iterations) | * [May I] do the rest?
* Stop (you've made a mistake) |
~ Undo [one step or to start of iteration] | ~ Undo [last step?] [this iteration?]
* Ignore my actions (I'm fixing something) |
1 Let me take over | You take over
1 Always let me do these steps | Do you want to do this manually?

Focus instructions
1 I'm repeating what I did before | * I've seen you do this before
This is similar to [indicates previous example] | * Treat this like [describes similar item]?
This case is different | ? What is different about this case?
R Look here (this is important) | ? Is this [describes item] important?
* Look for [describes item] | * I should look for [describes item]
R I did this [conditional branch] because [points at something and/or lists features] | * Is this [describes feature] what distinguishes the two cases [or new special case]?
1 I'm changing the way I do this task | You've changed the method, why?

Responses
R OK / yes | * OK / yes
R No | * No
R I don't know / I don't want to discuss that | * I don't know, show me what to do
The Turvy experiment concerns an interface that learns tasks from demonstrations and verbal instructions. It differs from other experiments in the demands put upon the Wizard, who must portray an interface that learns complex tasks but is not supposed to have human-level capabilities or cultural knowledge. The key difficulty is managing a lot of information without appearing too intelligent; Turvy must seem to be focused on low-level syntactic details of the text.
We realized from the outset that it would be all too easy for the Wizard to slip out of character, despite our formal models. In order to sustain the simulation of algorithmic intelligence, we decided to minimize the amount of new information the Wizard has to cope with. Hence we had users do standard tasks on data the Wizard had prepared. Moreover, all tasks were designed to limit the user's options: there were no "inputs" (data was merely cut, copied, pasted) and few points at which the order of steps could be varied. Each task was analyzed beforehand, so that inferences made from examples (given in a standard sequence) could be rehearsed by the Wizard, who need only improvise when a user gave a novel hint, tried some unusual way of teaching, or made a mistake.
To collect more data on how users describe syntactic features, we had Turvy ask the user before offering its own hypotheses. We also made Turvy a little extra stupid (for instance, guessing "van" rather than lowercase in Tasks 3 and 4).
Users worked through five or six tasks, teaching Turvy after short practice sessions. Subjects were told that Turvy learns by watching, and that it understands some speech, but they were told nothing about the sort of vocabulary or commands it recognizes. The instructions for each task were carefully worded so as not to mention low-level features that users might have to describe for Turvy. This tested its ability to elicit effective demonstrations and verbal hints. We told users up-front that Turvy was not a real system. To reinforce the fantasy, Turvy spoke in clipped sentences with rather stereotyped intonation, and we found that users quickly bought into the illusion. They spoke more curtly to Turvy than to the facilitator, and referred to Turvy and the Wizard as two separate entities.
We refined the experiment and Turvy's interaction model through iterative testing. Originally we did not plan for a speech interface: we ran a pre-pilot with the proposed facilitator acting as subject and communicating with Turvy via menus and keyboard. When it became clear that our user misunderstood the menu commands, we realized we ought not to predetermine the user's language. Also, the tasks were too complicated, and so we broke them down further. We then ran a pilot in which four people of varied professional background (management, psychology and hypermedia design) tested a speech version of Turvy. After some initial hesitation, they easily composed their own commands. Turvy would point at Word's menu items to indicate features it thought important (for instance, to propose "italics" it would pull down the format menu and select "Italic"), and to elicit features it would ask very general questions like "What's important here?". These tactics confused the users; they did better when Turvy described its actions, made context-specific queries, and proposed guesses.
We again refined the interaction model and then ran the main experiment on eight subjects (three secretaries, four computer scientists, and one psychology student). Turvy's behavior was held constant, but when the wording of a question caused consternation, Turvy would reword it, adding or removing contextual details. Thus the interaction model was further tuned throughout the experiment. We collected interviews, audio, and video from the pilot and experimental sessions. To corroborate our observations and the users' statements, we did a content analysis of the instructions they gave Turvy.
To generalize the results, we did a pilot study on other task domains: file selection and graphical editing, tested on three and two users respectively.
The figures illustrate samples of data for each task; the original data is to the left of the dividing line, the result is to the right.
Figure 3.
Figure 4.
Note: Task 3 (in which the user puts the first author's surname and last two digits of the date into a heading for each entry) appears in Figure 1 near the start of this chapter.
Figure 5.
Figure 6.
We had four main working hypotheses, concerning the suitability of the inference and interaction models, the use of a speech-based interface, and the ease of teaching an agent.
1. All users will employ the same small set of commands, namely those given in Table 1, a subset of the interaction model. They will do so even though told nothing in advance about the instructions Turvy might understand.
2. Users will learn "TurvyTalk" (Turvy's low-level terminology for describing search patterns), but only as a result of hearing Turvy describe things. Users will adopt Turvy's wording of instructions. (This hypothesis is based on a theory of verbal convergence between dialog partners [Leiser 89].)
3. Users will point at ranges of text other than those they must select for editing, to focus Turvy's attention on relevant context for the search pattern they are teaching.
4. Users will teach simple tasks with ease and complex ones with reasonable effort. In particular, in complex tasks, users will not try to anticipate special cases but instead teach them as they arise.
The typical talkative user begins by describing the task in words. Turvy says, "Show me what you want." The user performs a single example and asks Turvy to try the next one. Turvy does it, describing each step and asking for confirmation. If it makes no mistakes, Turvy asks permission to do the rest. When it does err, the user cries "Stop" and may then tell it what to do. Turvy says "Show me what you want," and the user does the correction. Turvy asks for features distinguishing this case from the predicted one. Initially it asks vaguely, "What's different about this example?" If the user is puzzled, it says "Can you point to something in the text?" If that doesn't work, it proposes a description.
A quiet user works silently through the first example and goes on to the next one without inviting Turvy to take over, nor even signaling that the first example is complete. When Turvy detects repetition, it interrupts, "I've seen you do this before, can I try?" The user consents, and the rest of the dialog proceeds much as for talkative users, except that a quiet one is more likely to tell Turvy to skip a troublesome case than try explaining it.
In the post-session interviews, we found that both types of user formed similar models of Turvy's inference and interaction. They recognized that Turvy learns mainly by watching demonstrations, but that it understands verbal hints. All users liked Turvy's eager prediction, because it reveals errors before they become hard to correct. Some talkative users had difficulty at first because they thought they should try to anticipate special cases and give Turvy a complete task description, but they learned, like the quiet ones, that it was better to wait for special cases to arise. All were concerned about completeness, correctness and autonomy; they believed it would be foolhardy to leave Turvy unsupervised.
From the table we see that nearly all control instructions were used and caused no difficulty. Most users gave most of these instructions. The actual wordings they used varied little, especially after they heard Turvy ask the corresponding question -- they would drop "May I" and turn the rest of it into a command, such as "Continue" or "Do the rest."
Instructions for focusing attention were problematic. Users almost never volunteered vague hints like "I'm repeating actions," "this is similar," and even "look here." On the other hand, they gave hints in answer to questions from Turvy. The wording of focus instructions was more variable.
In the post-session interviews we asked users about the kinds of terminology Turvy understands. They had some trouble answering so vague a question, but when we gave specific examples, nearly all subjects clearly understood that Turvy looks for and understands descriptions of low-level syntactic features rather than bibliographic concepts. Two subjects thought it might know something about the formatting of names and dates.
To confirm these observations we did a content analysis of users' speech. We divided the session into 15 events, corresponding to different phases of tasks such as the first example and points where Turvy would habitually err (such as the "Michalski" and "van..." entries in Tasks 3 and 4). We excluded Task 5 for lack of data. We counted the number of user instructions that referred to features in terms Turvy could understand (e.g. "paste after the word before a colon") versus those that involved bibliographic terminology (e.g. "paste after the author's name"). If a single instruction contained both kinds of terms, it was counted under both categories. We did not count the number of words or features referred to in a single instruction, since that might be arbitrarily large and would tend to bias towards the more verbose TurvyTalk; rather, we counted instructions (as defined in the discourse model).
A summary of the average counts for experimental subjects is shown in Figure 8. Although TurvyTalk naturally dominates when describing text formats, it also comes to dominate even where tasks involve concepts like titles, names and dates. The use of user concept terminology tapers off as the session progresses. More interesting is the tendency to combine both forms of speech, as in events 4, 7, etc.: this corroborates our informal observation that users tended to try to relate their concepts to Turvy's by using both languages in the same instruction.
In the graph, the use of TurvyTalk seems to dominate from early in the session (even discounting task 1). We surmised that this was because Turvy's verbal feedback quickly trained users to mirror its language. Figure 9 shows a cumulative average of TurvyTalk minus bibliographic concepts computed for both pilot and experimental subject groups. The running sum is divided by the number of events (so far) in which the user concepts count is non-zero. This measurement has the following nice properties: co-occurrences of user concepts and TurvyTalk cancel each other out; events at which user concepts dominate are highlighted by a sudden drop in the score; and events containing small amounts of TurvyTalk but no user concepts increase the score. A score above zero indicates that TurvyTalk is dominating overall.
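For concreteness, that running measure can be transcribed as follows (the behaviour before the first non-zero user-concept event is our assumption):

    # Cumulative average of (TurvyTalk - user concepts): running sum of
    # per-event differences, divided by the number of events so far in
    # which the user-concept count is non-zero.
    def dominance_scores(turvytalk, user_concepts):
        scores, running_sum, divisor = [], 0, 0
        for t, u in zip(turvytalk, user_concepts):
            running_sum += t - u
            if u > 0:
                divisor += 1
            scores.append(running_sum / divisor if divisor else float(running_sum))
        return scores   # a score above zero means TurvyTalk dominates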
The dominance scores indicate that experimental users made more use of TurvyTalk than the pilot users (who received less verbal feedback). It should be stressed that the graph merely corroborates our observations; the small number of subjects and high variability make statistical inferences unwarranted.
We also compared the dominance of TurvyTalk in talkative versus quiet users. Figure 10 shows the TurvyTalk minus user concept counts for each event. Although TurvyTalk dominates more in talkative subjects overall, note that when problem cases first arise (colon in title, Michalski) the talkative subjects lapse more deeply into their "native" language. Perhaps the most interesting point, however, is that on the second occasion they use more TurvyTalk.
In the post-session interviews, all subjects reported that they found Turvy easy to teach, especially once they realized that Turvy learns incrementally and continuously, so that they need not anticipate all special cases.
We tried to objectify users' impressions about the ease or difficulty of teaching Turvy by doing a content analysis of certain speech characteristics. We treated occasions on which the user told Turvy to "do the rest" as indicators of high confidence (C in the formula below). Normal responses were treated as positive control (P). Long pauses or words like "umm" indicated hesitation (H). Questions about the meaning of Turvy's actions, verbal stumbling and confusion indicated a loss of confidence (X). We counted such events over each of 20 intervals in the session (once again excluding Task 5). A "degree of user confidence" measure was computed as a weighted sum of the counts, 2C + P - 2H - 4X, normalized by dividing by the total C + P + H + X. Note that we devised this formula to reveal trouble spots in the interaction traces: we chose weights that bias the score towards negative measures, so it tends to emphasize any loss of confidence. Figure 11 shows levels of confidence over each interval of the session, averaged for pilot and experimental users. Events at which users deal with more complex patterns clearly cause the greatest anxiety, but the experimental group seems to suffer less as the session progresses.
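Transcribed directly, the per-interval score is:

    # Degree of user confidence for one interval, as defined above.
    def confidence(c, p, h, x):
        total = c + p + h + x
        if total == 0:
            return 0.0      # assumption: an empty interval scores neutral
        return (2*c + p - 2*h - 4*x) / total

    # e.g. 1 "do the rest", 4 normal responses, 1 hesitation, 1 confusion:
    # confidence(1, 4, 1, 1) == 0.0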
We also compared talkative versus quiet users, as shown in Figure 12. A curious feature of this graph is that the two groups drop into confusion at slightly different points relative to first encounters with tricky cases. The quiet group loses confidence just before the difficulty arises, as if anticipating it. Note also that quiet users tended to tell Turvy to skip a difficult case rather than explain it. Towards the end of the session, both groups are synchronized, with talkative subjects appearing somewhat more confident.
If we compute a progressive average measure for quiet versus talkative users, we get the curves shown in Figure 13. This shows quiet users starting out with an advantage, perhaps due to their strong focus on the task and lack of interaction with Turvy. The talkative users, after recovering from the initial shock of answering Turvy's questions in task 2, slowly increase in confidence. The quiet ones gradually lose their advantage, perhaps because they are not learning as much TurvyTalk (compare with Figure 10).
All experimental users and all but one pilot user said they would use Turvy if it were real. They liked the way it learns incrementally and they did not mind answering a few questions. They believed they would have difficulty coming up with low-level syntactic descriptions entirely on their own. They liked the way Turvy described actions while doing them the first time; they did not notice that it works silently on subsequent iterations.
One pilot user rejected the very idea of an instructible agent and refused to work with Turvy. This person believed that a user would have to anticipate all cases of a pattern in advance, as when writing a program. We were unable to entice this user to go through the Turvy experience.
Further experiments should be done, focusing on difficulties with procedural constructs (such as whether a repeated sequence represents a loop with internal branching or a subroutine). Turvy has been pilot tested in graphical and database domains, with similar results, but further testing is warranted, especially on the teaching of replacement and constraint patterns.
The version of Turvy tested in the experiment had some limitations designed to elicit more experimental data; for instance, Turvy's inability to re-use concepts in later tasks. The inference and interaction models should be extended to address ways that Turvy can re-use patterns, and ways that users can name or invoke them (see also Chapter 21). Some forms of learning about learning are also suitable for Turvy. In the current design, some features are adopted in preference to others; this preference ordering could be adjusted based on the actual observed frequency of relevance. The chunking of features that appear often together in patterns would also improve Turvy's inferences.
We found that people can learn to communicate with an instructible system whose background knowledge concerns only low-level syntactic features. Users learned to speak to Turvy by using their social skills, imitating the way it speaks. All users adopted much the same commands, a subset of those we predicted; moreover, they used similar wordings. These two results suggest that a speech-based interface may be both feasible and preferable. Speech is being tried in another PBD system, Mondrian (see Chapters 16 and 24).
We found that, once users had become familiar with Turvy, the difficulty of teaching corresponded with the difficulty of the task. There were no peculiar features of Turvy that caused users difficulty (in contrast to Metamouse).
The "Wizard of Oz" method proved effective for gathering qualitative data quickly and cheaply. The quantitative results are rather suspect, due to the small population and high variance, but the most important observations, such as large increases in user anxiety when certain special cases arose, were highly consistent. The measurements did agree with opinions given by the users.
Perhaps the most important aspect of the experience was the "training" the designer received in the guise of Turvy. Being personally responsible for a test user's discomfort and confusion motivates thoughtful redesign!
The key risk in using the "Wizard of Oz" method to prototype a system as complex as an instructible interface is that the Wizard will apply more intelligence than he or she is aware of, and thereby obtain inappropriately optimistic results. We tried to avoid this by analyzing tasks with fairly detailed formal models beforehand. Nonetheless, it remains to be seen whether a fully functional Turvy can be implemented. Handling verbal input and output is likely to be the most difficult problem, and the first implementations will be more primitive than the simulation in this respect. But the ability to learn from multiple examples and focus attention according to user hints should be sufficient to make Turvy an effective instructible interface.
The grammar for task procedures is given below.
Context ::= [ UserCommand | UserAction | SystemEvent | PatternFound | PatternUpdate ]*
Sequence ::= [ Loop | Conditional ]*
Loop ::= loop Name: repeat ( Sequence ) { LoopCount times | until Context }
LoopCount ::= Number | UserSpecifiesCount
Conditional ::= conditional Name: Find&Do {else Find&Do}* else TurvyDefaultAction
Find&Do ::= if find *Pattern*Name then do *Macro*Name
Macro ::= macro Name: Command {; Command}*
Command ::= Operation { (demo AppCommand* ) }
Operation ::= [ ( locate | copy | insert | delete ) *Variable*Name ] |
              set property *Property*Name of *Variable*Name to Value
TurvyDefaultAction ::= [TurvyGoTo *Conditional*Name] | TurvyAllDone | TurvyAskUserForDemo
UserCommand ::= user says UserDefinedCommandWord
UserAction ::= user does ApplicationEvent DataSpecifier
DataSpecifier ::= DataType | DataType in *Pattern*Name | *Variable*Name
SystemEvent ::= system event *Event*Name
PatternFound ::= pattern *Pattern*Name found
PatternUpdate ::= pattern *Pattern*Name changed { to match *Pattern*Name }
Rules of inference for a conditional. When the user rejects a predicted action (including the default action TurvyAllDone), Turvy repairs the conditional that generated it, generalizing or specializing the search pattern and, if necessary, adding a new branch. The rules of inference are given below, followed by a code sketch. Suppose the user has rejected Turvy's execution of alternative X in conditional C.
1. If the user's demo matches X's macro but not its search pattern, Turvy generalizes or specializes the pattern so that it selects the correct object.
2. If the demo matches another rule R, Turvy specializes X's pattern, and if necessary generalizes R's pattern, so that R would fire instead.
If R is in some other conditional K in the current task, Turvy sets C's default to TurvyGoTo K (that is, Turvy should proceed to step K of the task).
3. Otherwise, Turvy adds a new rule N to C, and specializes X.
(Note that Turvy does not check other rules in C to ensure that they do not fire in conflict with R or N; Turvy will wait until such an error arises before attempting to fix it.)
When no rule fires and Turvy asks for a demo, the following rules are applied.
4. If the demo matches some rule R in C, Turvy generalizes R so it would fire.
5. If the demo matches a rule in some other conditional K in the current task, Turvy sets C's default to TurvyGoTo K.
6. Otherwise, Turvy adds a new rule N to the conditional C.
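These six rules can be sketched in code as follows, with Turvy's pattern operations supplied as an `ops` object; all helper names are assumptions made for illustration.

    # Repairing a conditional `cond` in which the user rejected rule X
    # (rules 1-3), or in which no rule fired and the user gave a demo
    # (rules 4-6). `ops` bundles the pattern operations described above.
    def repair_rejection(cond, x, demo, ops):
        if ops.matches_macro(x, demo):                   # rule 1
            ops.fit_pattern(x, demo)                     # generalize/specialize
            return
        r = ops.find_matching_rule(demo)                 # search whole task
        if r is not None:                                # rule 2
            ops.specialize(x, demo)
            ops.generalize(r, demo)
            if r.conditional is not cond:
                cond.default = ("TurvyGoTo", r.conditional)
            return
        ops.specialize(x, demo)                          # rule 3
        cond.rules.insert(0, ops.new_rule(demo))         # higher priority

    def repair_no_fire(cond, demo, ops):
        r = ops.find_matching_rule(demo)
        if r is not None and r.conditional is cond:      # rule 4
            ops.generalize(r, demo)
        elif r is not None:                              # rule 5
            cond.default = ("TurvyGoTo", r.conditional)
        else:                                            # rule 6
            cond.rules.append(ops.new_rule(demo))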
The grammar for syntactic search patterns is given below.
PatternExpr ::= Assignment | SubPattern
Assignment ::= `(' VariableName := SubPattern* `)'
SubPattern ::= Disjunct | CountedItem | PatternItem
Disjunct ::= PatternItem [ or PatternItem ]*
CountedItem ::= Number {x | or more | {or | -} Number } PatternItem
PatternItem ::= DefinedItem | Token | StringConstant
DefinedItem ::= VariableName | Delimiter
Delimiter ::= Punctuation | Bracketing | Separator | FormatChange
Separator ::= WhiteSpace | StartOfLine | EndOfLine | StartOfParagraph |
              EndOfParagraph | StartOfDocument | EndOfDocument
Token ::= Paragraph | Line | Word | Character
Constraints ::= TestExpr {and Constraints}
TestExpr ::= Test( Variable {, Variable}* ) = Value
Intended users: End users
Feedback about capabilities and inferences: Turvy announces when it first matches an action; describes search patterns; describes generalizations.
Types of examples: Multiple examples, Negative examples.
Program constructs: Variables, loops, conditionals.