Smith, D.C., Cypher, A. & Spohrer, J. "KidSim: Programming Agents Without a Programming Language." Communications of the ACM, 37(7), July 1994, pp. 54-67.
Copyright 1994 by the Association for Computing Machinery, Inc. Copying without fee is permitted provided that the copies are not made or distributed for direct commercial advantage and credit to the source is given. Abstracting with credit is permitted. For permission to republish write to: Director of Publications, Association for Computing Machinery. To copy otherwise, or republish, requires a fee and/or specific permission.
Software agents are our best hope during the 1990s for obtaining more power and utility from personal computers. Agents have the potential to actively participate in accomplishing tasks, rather than serving as passive tools as do today's applications. However, people do not want generic agents--they want help with their jobs, their tasks, their goals. Agents must be flexible enough to be tailored to each individual. The most flexible way to tailor a software entity is to program it. The problem is that programming is too difficult for most people today. Consider:
First, observe that most people can follow a recipe, give directions, make up stories, imagine situations, plan trips--mental activities similar to those involved in programming. It seems well within the capacity of humans to construct and understand concepts like sequences (first add rice, then add salt), conditionals (if the water boils too fast, turn down the heat), and variables (double each quantity to serve eight).
Can we make programming as easy as giving directions?
Second, notice that most people can use personal computers. Today, over 100 million people use them to write letters and reports, draw pictures, keep budgets, maintain address lists, access data bases, experiment with financial models, play games, and so forth. Children as young as two years old can use a mouse and paint with programs like KidPix (a child's painting program, at one time the world's best selling application) or explore worlds like The PlayRoom (a child's adventure game). So computers are not inherently unusable. The key observation is that most of these applications are editors: with them, users produce an artifact by invoking a sequence of actions and examining their effects. When the artifact is the way they want it, they stop.
Can we make programming as easy as editing?
Let us define the term "end users" to mean people who use computers but who are not professional programmers. Such people are typically skilled in some job function, but most have never taken a computer course. They use programs ("applications") written by other people. They can't modify these programs unless the designer explicitly built in such modification, and then the modification is typically limited to setting preferences. Perhaps 99% of the hundred million computer users can be classified as "end users." If we could empower these people to program computers, the impact would be enormous.
In the past two decades there have been numerous attempts to develop a language for end users [21]: Basic, Logo, Smalltalk, Pascal, HyperTalk, Boxer [7], Playground [8], etc. All have made progress in expanding the number of people who can program. Yet as a percentage of computer users, this number is still abysmally small. Consider children trying to learn programming. When they are in class, most children will learn anything. But do they continue to program after the class ends? Today programming classes are characterized by the "whew, I'm glad that's over!" syndrome. As soon as children do not have to do it anymore, they go on to something that is actually fun.
We hypothesize that fewer than 10% of the children who are taught programming continue to program after the class ends. This is based on personal experience and observation. Surprisingly, to the best of our knowledge there are no published studies on this issue. Nevertheless, we expect that most readers will agree with this hypothesis. Elliot Soloway states "My guess is that the number ... is less than 1%!!! Who in their right mind would use those languages--any of them--after a class??"[1] Single-digit percentages indicate that the end-user programming problem has not yet been solved.
As a step towards solving this problem, we will describe a prototype system designed to allow children to program agents in the context of simulated microworlds. Our approach is to apply the good user interface (UI) principles developed during the 1980s for personal computer applications to the process of programming. The key idea is to combine two powerful techniques--graphical rewrite rules and programming by demonstration. The combination appears to provide a major improvement in end users' ability to program agents.
But if we do, what do we use instead? The answer is all around us in the form of personal computers. Today, all successful personal computer applications and many workstation applications follow certain human-computer interface principles that were developed in the late 1970s [20] and codified during the 1980s [1]. The most common embodiment of these principles is the so-called graphical user interface (GUI) consisting of windows, menus, icons, the mouse, and so forth. The principles that make this interface work can and should guide computer scientists in attacking the end-user programming problem. We will briefly describe a few of these principles. However, we want to emphasize that we did not invent these principles in the work reported here. We are merely applying them to programming. Furthermore, the description here is by no means complete; many books have been written on these principles. See, for example, [2, 11, 12].
The following are the most important principles for solving the end-user programming problem.
Actually, some programming systems have adopted an editing interface, and they are beginning to broaden the community of programmers. Spreadsheets, the most widely used programming technology, have done this for years. The popularity of some user interface management systems with their "drag-and-drop" interface builders is a result of their allowing direct manipulation of interface elements. Similarly, most people can construct buttons and fields in HyperCard, which has an editing feel, but few of those same people can program in HyperTalk.
There are a few brilliant examples of programming systems that have applied all of these principles. Our favorite is Bill Budge's video game for personal computers called "The Pinball Construction Set." It allows people to program pinball games by directly editing the layouts, i.e. by dragging and dropping pinball elements such as flippers and bumpers. The elements begin functioning as soon as they are dropped into place. Everyone can create pinball games this way. We call this "programming by direct manipulation," and when done well, it is wonderfully successful. The problem with The Pinball Construction Set is that you can only program pinball games with it. The challenge is to increase the generality without losing the ease of use.
Why simulations? They are a powerful tool for education. Simulations encourage unstructured exploratory learning. They allow children to construct things, supporting the constructivist approach to education. Alan Kay contends "We build things not just to have them, but to learn about them." He quotes the philosopher Cesare Pavese: "To know the world, one must construct it." Scardamalia [17] argues that children learn best when constructing things. They enter Vygotsky's "zone of proximal development." Simulations such as SimCity and SimEarth allow children (of all ages) to construct unique microworlds, giving them a sense of ownership in their creations. They can observe and modify and experiment with these microworlds. Children are the "gods" of their worlds. This pride of ownership and feeling of power are compelling qualities that motivate even professional programmers.
However, most simulations today do not permit users to modify their fundamental behaviors and assumptions. For example, one cannot alter the fact that if one puts in a railroad in SimCity, the pollution problems go away--not exactly a realistic consequence. This inflexibility is the reason that most school teachers do not use SimCity as a teaching tool, even if they are studying city building. It does not model what they want to communicate. Simulations that do allow fundamental modifiability, such as numeric simulations built with Stella, require extensive programming skills. Few children or teachers can or want to do it.
What is needed is a way for children without programming knowledge to have more control over the behavior of simulations. What is also needed is a way for teachers to tailor simulations to support their curriculum goals. KidSim(TM) provides a way to do both.
The game board represents the simulation microworld. It is the environment in which simulation objects interact with one another. Dividing it into discrete squares makes it easier for kids to communicate their intentions to the computer. The game board shown in Figure 1 displays a monkey in a simple jungle scene. We will use this simulation throughout this article.
The clock starts and stops a simulation running. Dividing time into discrete ticks makes it easy for kids to control their simulations. The clock provides both fine grain control over time (single stepping) and the ability to run time backward. Running the clock backward undoes everything that happened during the previous tick, encouraging kids to experiment and take chances. If something goes wrong, they can just back up the clock to before that point.
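One way to picture a clock that can run backward is a history of snapshots, one taken before each tick. The following Python sketch is an illustration only, not KidSim's implementation; the `Clock` class, its method names, and the dictionary used as the world state are all assumptions made for the example.

```python
import copy


class Clock:
    """Discrete-tick clock that snapshots the world so ticks can be undone."""

    def __init__(self, world):
        self.world = world      # any deep-copyable simulation state
        self.tick_count = 0
        self._history = []      # one snapshot per tick, taken before the tick

    def step_forward(self, update):
        """Run a single tick (single stepping), remembering the prior state."""
        self._history.append(copy.deepcopy(self.world))
        update(self.world)
        self.tick_count += 1

    def step_backward(self):
        """Undo everything the previous tick did; return False at time zero."""
        if not self._history:
            return False
        self.world = self._history.pop()
        self.tick_count -= 1
        return True


def move_right(world):
    # Hypothetical per-tick behavior: the monkey moves one square to the right.
    row, col = world["monkey"]
    world["monkey"] = (row, col + 1)


clock = Clock({"monkey": (0, 0)})
clock.step_forward(move_right)
clock.step_forward(move_right)
clock.step_backward()           # the second tick is undone
print(clock.world["monkey"], clock.tick_count)
```

Storing whole snapshots is the simplest design and is adequate for small game boards; a larger system might instead record an inverse for each action.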
The copy box is a container for simulation objects that automatically makes copies of things inside it. Whenever a child drags an object out of the copy box, the system clones the object and puts the original back. This provides an infinite source of new objects. Kids can place their own objects in the copy box, allowing them to infinitely duplicate their own objects as well.
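The copy box behavior can be sketched as a container that hands out clones and never gives up its originals. This Python fragment is a minimal illustration under our own assumptions; `SimObject`, `CopyBox`, `put`, and `drag_out` are hypothetical names, not part of KidSim.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class SimObject:
    name: str
    properties: dict = field(default_factory=dict)


class CopyBox:
    """Container whose contents are cloned, never removed, when dragged out."""

    def __init__(self):
        self._originals = []

    def put(self, obj):
        self._originals.append(obj)

    def drag_out(self, index):
        # The caller receives an independent clone; the original stays put,
        # so the box is an inexhaustible source of new objects.
        return copy.deepcopy(self._originals[index])


box = CopyBox()
box.put(SimObject("rock", {"height": 70}))
a = box.drag_out(0)
b = box.drag_out(0)
a.properties["height"] = 10     # changing one clone does not affect the other
print(b.properties["height"])
```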
Let us define an agent as a persistent software entity dedicated to a specific purpose. "Persistent" distinguishes agents from subroutines; agents have their own ideas about how to accomplish tasks, their own agendas. "Specific purpose" distinguishes them from entire multifunction applications; agents are typically much smaller. (As demonstrated by the articles in this issue, this is by no means a universally accepted definition, but it is the one we will use here.)
In KidSim, the active objects in simulations are agents. During each clock tick, agents move around on the game board interacting with one another. Metaphorically they are characters in a microworld, and we will use the terms "character" and "agent" interchangeably. KidSim agents have three attributes:
KidSim agents are similar to those in Logo Microworlds. The difference is in how they are programmed. In Logo Microworlds, kids program objects with Logo. In KidSim, kids construct "graphical rewrite rules."
In KidSim, kids usually start with several predefined characters in various microworlds. The kids can play with these microworlds immediately, as with ordinary video games. This gets them involved. After a while, they typically want something to work differently. At this point KidSim differs from video games. Kids can modify the way the characters work by changing their programming. Pedagogically this is an important difference, because the act of changing things forces kids to think. They have to decide what to change, how to change it, and how to fix it when their changes do not work. At every step their brains are engaged. In fact, we believe that any video game can be turned into a learning experience by allowing kids to modify it.
We also give kids a "lump of clay" from which they can create new characters, indeed entire worlds. The lump of clay is sufficient to build everything. Typically kids begin by modifying the predefined characters, but they quickly move on to defining totally new ones.
Figure 3. A graphical rewrite rule
A rule is said to match if its "before" part is the same as some area of the game board at some moment in time. When a rule matches, KidSim transforms the region of the game board that matched to the scene in the "after" part of the rule. (Actually a recorded program is executed, as described later.)
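The matching step can be sketched as a search for the "before" picture on a grid of squares. The Python below is a simplified illustration, not KidSim's matcher: it represents each square as a single character, scans the board row by row, and copies the "after" picture directly into the matched region, whereas KidSim actually executes a recorded program. All names are assumptions made for the example.

```python
def matches_at(board, before, top, left):
    """True if the 'before' picture equals the board region at (top, left)."""
    rows, cols = len(before), len(before[0])
    if top + rows > len(board) or left + cols > len(board[0]):
        return False
    return all(
        board[top + r][left + c] == before[r][c]
        for r in range(rows)
        for c in range(cols)
    )


def apply_rule(board, before, after):
    """Rewrite the first matching region in place; return True if one matched."""
    for top in range(len(board)):
        for left in range(len(board[0])):
            if matches_at(board, before, top, left):
                for r, row in enumerate(after):
                    for c, cell in enumerate(row):
                        board[top + r][left + c] = cell
                return True
    return False


EMPTY, MONKEY, ROCK = ".", "M", "R"

# "Before": a monkey with a rock to its right and empty squares above and beyond.
before = [[EMPTY, EMPTY, EMPTY],
          [MONKEY, ROCK, EMPTY]]
# "After": the monkey has ended up on the far side of the rock.
after = [[EMPTY, EMPTY, EMPTY],
         [EMPTY, ROCK, MONKEY]]

board = [list("....."),
         list(".MR..")]
fired = apply_rule(board, before, after)
print(fired, "".join(board[1]))
```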
Rewrite rules or "if-then rules" or "production systems" are well known in Artificial Intelligence [5, 14, 16]. They form the control structure for expert systems, of which OPS5 from Carnegie-Mellon is an example [13]. Rule-based systems have some marvelous characteristics. Since rules are independent of one another, it is possible to add a rule to an existing system without affecting the rules that are already there. This assumes that the added rule is specific enough so that it does not override other rules and that the system is smart enough to factor the rule into the correct order. The Lisp70 production system automatically factored rules using an algorithm called "specificity" in which the more specific rules were tried before the more general ones, which worked well in most cases [22]. Furthermore, in rule-based systems it is easy to understand and debug each rule by itself, without having to be concerned with the other rules. Of course a good rule tracer and stepper are essential, as in any programming language.
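The idea of trying more specific rules before more general ones can be illustrated with a small sketch. The Python below is not the Lisp70 algorithm, whose details are not given here; it uses the number of conditions in a rule as a stand-in measure of specificity, and every name in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    conditions: dict    # property name -> required value; fewer = more general

    def matches(self, facts):
        return all(facts.get(key) == value
                   for key, value in self.conditions.items())


def order_by_specificity(rules):
    # Assumption: more conditions means more specific. sorted() is stable,
    # so rules with equal specificity keep the order in which they were added.
    return sorted(rules, key=lambda rule: len(rule.conditions), reverse=True)


def first_match(rules, facts):
    """Return the name of the most specific rule that matches, or None."""
    for rule in order_by_specificity(rules):
        if rule.matches(facts):
            return rule.name
    return None


# Rules may be added in any order; the general one is listed first on purpose.
rules = [
    Rule("step around any object", {}),
    Rule("climb any rock", {"kind": "rock"}),
    Rule("avoid grey rocks", {"kind": "rock", "color": "grey"}),
]

print(first_match(rules, {"kind": "rock", "color": "grey"}))
print(first_match(rules, {"kind": "tree"}))
```

With this ordering, adding the general "step around any object" rule does not override the two rock rules, which is the property the text describes.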
Graphical rewrite rules are two dimensional versions of rewrite rules. They too have been applied to the end-user programming problem by several researchers [9, 15]. (See also A.C. Kay, Tableau, 1988, unpublished.) While they work well for simple tasks, they have encountered two problems that have limited their utility for complex tasks: (a) The "rule-generality" problem--pictures, being inherently literal, are hard to generalize to apply to multiple situations. (b) The "rule-semantics" problem--it is difficult to specify how the computer is to perform the transformation from the left to the right side of a rule. Some systems have applied AI techniques to try to infer the transformation, but to date no one has developed a general method for doing so. Additionally, graphical rewrite rules suffer from a problem that all rule-based systems have, graphical or not: (c) The "rule-sequencing" problem--it is difficult to specify a series of transformations, i.e. do rule A then rule B then rule C, since rules by definition are independent of each other. KidSim's graphical rewrite rules solve the first problem by abstraction and the second by programming by demonstration. We have not yet addressed the third problem, sequencing.
Children may generalize KidSim's graphical rewrite rules in two ways:
In this example, a child has clicked on a rock in the "before" (left) part of a rule. Its list of possible generalizations appears: "this particular rock (grey rock 7), any grey rock, any rock, or any object." The child may specify that the rule is to apply to any of these types of objects.
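The four choices in this menu form a ladder from most specific to most general. A hedged Python sketch of that ladder follows; the `Thing` class, its three fields, and the tuple encoding of a pattern are assumptions introduced for illustration and say nothing about how KidSim stores its type hierarchy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thing:
    ident: str      # a particular object, e.g. "grey rock 7"
    subtype: str    # e.g. "grey rock"
    kind: str       # e.g. "rock"


def generalizations(thing):
    """Patterns a clicked object could stand for, most specific first."""
    return [
        ("ident", thing.ident),      # this particular rock
        ("subtype", thing.subtype),  # any grey rock
        ("kind", thing.kind),        # any rock
        ("any", None),               # any object
    ]


def pattern_matches(pattern, thing):
    level, value = pattern
    if level == "any":
        return True
    return getattr(thing, level) == value


rock7 = Thing("grey rock 7", "grey rock", "rock")
rock9 = Thing("grey rock 9", "grey rock", "rock")
brown = Thing("brown rock 2", "brown rock", "rock")
monkey = Thing("monkey 1", "monkey", "monkey")

menu = generalizations(rock7)
for pattern in menu:
    print(pattern, [pattern_matches(pattern, t)
                    for t in (rock7, rock9, brown, monkey)])
```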
If a child presses the mouse button on the < symbol, a pop-up menu of operators appears showing the allowable tests on numeric properties (< <= = != >= >). Text properties have other operators. A child may choose any operator.
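The numeric operator menu maps naturally onto a lookup table of comparison functions. The following Python sketch shows one way to express that; the table name and the `run_test` helper are hypothetical, and the operators for text properties are omitted because the text does not list them.

```python
import operator

# The six tests offered for numeric properties, in menu order.
# Note that "=" here means equality, not assignment.
NUMERIC_OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    ">": operator.gt,
}


def run_test(left, symbol, right):
    """Evaluate a property test such as: rock height < 120."""
    return NUMERIC_OPERATORS[symbol](left, right)


print(run_test(70, "<", 120))
print(run_test(130, "<=", 120))
```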
Now we can fully define a graphical rewrite rule in KidSim:
There have been a number of programming by demonstration (PBD) systems prior to KidSim. The major ones are described in [3]. These systems have proved to be exceptionally easy for people. However, most PBD systems to date have suffered from two deficiencies:
1. The child sets up the simulation situation which he or she wants to affect. In this example, the child places the monkey next to a rock. KidSim allows children to define rules only when the actual simulation situation exists. This makes defining rules a concrete process, reducing the need to visualize simulation states abstractly. The child can be sure that the rule will work at least for this one example, and the child can generalize it to a wider class of situations later.
2. The child specifies the region of the game board with which the rule is to deal (Figure 6). This is the region that will be pattern matched against the game board when the simulation runs. The child specifies this region by direct manipulation, by dragging the border of a "spotlight" which appears during recording. The "before" and "after" pictures in the rule copy the "spotlight's" area.
3. Initially the "before" and "after" parts of a rule are identical, i.e. each rule begins as an identity transformation. The child defines the rule semantics by editing the "after" picture to produce a new simulation state:
First the child places the cursor (a small hand) over the monkey and drags it to the square above the rock:
Then the child drags it to the square to the right of the rock:
Done. That is all there is to it. Nowhere did the child have to type "begin...end", "if...then...else", semicolons, or other language syntax. Yet the effect when executed is that the monkey jumps over the rock. The child has programmed the monkey. This is the essence of programming in KidSim: programming by direct manipulation editing.
Suppose now that the child wants to restrict the monkey to climbing over rocks that are up to twice its height (monkeys being good climbers) but no higher. Suppose the monkey's height is 60, and the height of the current rock is 70. Here is how to do it.
4. The child clicks on the triangle below the left side of the rule (Figure 10). This displays a box in which property tests may be defined. KidSim always provides an empty test.
5. The child drags the height property from the rock's viewer into the left side of the test.
6. Since the right side of the test is to contain a calculation, the child displays the KidSim calculator:
The child drags the monkey's height property into the calculator display, pushes the multiplication button and the 2 button, then pushes the = button. 120 appears in the display. The child drags this value into the right side of the property test. The resulting rule is shown in Figure 12. Since 70 (the height of the rock) is less than 120 (twice the monkey's height), this rock passes the test. However, other rocks the monkey encounters in its travels may be higher than 120, so this rule would not match, and the monkey could not climb over those rocks.
7. Finally the child closes the rule editor window. A miniature image of the rule is placed in the monkey's viewer at the top of its list of rules. This image visually suggests its behavior:
KidSim can display (upon request) the program that was built as the child edited the right side of the rule:
These are the actions that were recorded "by demonstration." Now we can define what it means to execute a graphical rewrite rule:
Here is where combining graphical rewrite rules and programming by demonstration results in a system that is stronger than either. Graphical rewrite rules solve the PBD representation problem, and programming by demonstration solves the rule-semantics problem.
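To make the combination concrete, the sketch below puts the three pieces of the monkey example together: a "before" picture, a property test, and a program recorded by demonstration. It is a Python illustration under several assumptions: objects are reduced to a position and a height, the test is taken to be "rock height <= twice the monkey's height" (the exact operator in the figure is not reproduced here), and rows are numbered downward so that moving up decreases the row. None of the names come from KidSim.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    kind: str
    row: int
    col: int
    height: int


# Recorded by demonstration as (row delta, column delta) moves of the monkey:
# first to the square above the rock, then to the square to its right.
RECORDED_MOVES = [(-1, +1), (+1, +1)]


def rule_matches(monkey, rock, occupied):
    """'Before' picture plus property test for the jump-over-rock rule."""
    rock_is_adjacent = rock.row == monkey.row and rock.col == monkey.col + 1
    path_is_free = ((rock.row - 1, rock.col) not in occupied
                    and (rock.row, rock.col + 1) not in occupied)
    passes_test = rock.height <= 2 * monkey.height   # assumed operator
    return rock_is_adjacent and path_is_free and passes_test


def execute_rule(monkey, rock, occupied):
    """If the rule matches, replay the recorded moves; report whether it fired."""
    if not rule_matches(monkey, rock, occupied):
        return False
    for d_row, d_col in RECORDED_MOVES:
        monkey.row += d_row
        monkey.col += d_col
    return True


monkey = Obj("monkey", row=1, col=0, height=60)
rock = Obj("rock", row=1, col=1, height=70)
occupied = {(monkey.row, monkey.col), (rock.row, rock.col)}

print(execute_rule(monkey, rock, occupied), (monkey.row, monkey.col))
```

A rock of height 130 in the same position would fail the test, so the rule would not fire and the monkey would stay where it is.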
A problem with rule-based systems is that rule order is crucial and often hard to get right. The problem grows with the number of rules. This problem is somewhat mitigated in KidSim because its rules are quite high level. We have found that we can accomplish interesting tasks in relatively few rules. For example, an optimized strategy for playing the game "MasterMind" requires only about 15 rules in KidSim. Rules can be grouped into subroutines, thereby forming larger conceptual chunks. Nevertheless, this problem could become serious when the number of rules gets large. We may adopt a strategy like Lisp70's in which rules are automatically factored into a discrimination tree by specificity, which removes the need for children to manually order the rules.
Of course, graphical rewrite rules really do constitute a programming language. The language has a syntax: left side -> right side, and it has an ordering of "statements": top to bottom. Nevertheless, we feel justified in calling KidSim "languageless programming" because of the complete absence of a traditional linguistic syntax such as if-then-else, and because the left and right sides of rules are images of the game board, not abstract representations of it. Furthermore, KidSim follows all of the UI principles listed above, making it feel more like direct manipulation editing than programming.
This association with the schools has been essential in the development of KidSim. If you want to design a program for children, then children must participate in the design. Feedback from the kids has caused us to change the design of KidSim several times. Each time we went back to the children for their reaction to the new design, which often caused us to revise it further. An example was our approach to specifying arithmetic expressions on property values, such as computing twice the monkey's height. We invented several clever (we thought) notations, most having a data flow flavor. The kids repeatedly said they could not understand them, much less write them. Finally we went back to the principle of direct manipulation. We introduced a calculator to allow interactive creation of expressions, rather than forcing kids to type them statically. Kids drag property values to the calculator and push buttons to operate on them, much as they would with a physical calculator. The calculator metaphor obeys almost all of the good UI principles mentioned earlier--concrete, interactive, direct manipulation, seeing and pointing, familiar conceptual model, and modeless. We found that all the children could use it. (A calculator "tape" is available for displaying the steps should a child want to see them.)
Having a working prototype is also essential in getting feedback from children. They are able to respond more easily when they can try out a design rather than having to imagine how it might work. But even before we got a prototype working, we tested the ability of children to write graphical rewrite rules via "Post-It Notes programming," in which they wrote rules on note pads and then acted out their "programs." Among the 30 fifth graders (ten-year-olds) we tested, both boys and girls, none had any trouble writing rules. Furthermore, they enthusiastically responded to the concept. When we gave them new problems, they raced back to their desks, scribbled out a new rule or two, then raced back to us and demanded "Test us now!" There was no writer's block as is often observed with programming languages, in which kids do not know how to proceed. This experience has been repeated with the computerized rules.
These efforts do not constitute a formal test of KidSim. Nevertheless, the results are so positive that we are encouraged to think that the KidSim approach has promise. At the time of this writing, we are planning to initiate a structured test on 60 children in the two fifth-grade classrooms. The teachers in these classrooms have developed a curriculum around a particular simulation based on [6].
Ultimately we want to extend KidSim to adult programming tasks (AdultSim?). At the moment we do not know how to do this, and we suspect that the effort required will be nontrivial. However, we do feel that we can characterize the result: all successful end-user programming systems for adults (or kids) will follow the UI principles we have described.