Towards an embodied psychology for digital agents
      
Interactive 3D media is an undeniably powerful cultural phenomenon. While the technology has grown at a breakneck rate, the aesthetic and thematic frameworks supporting it have been much slower to mature. There are a number of reasons for this. Some are technical, with personal computers only recently having achieved the necessary capabilities for compelling real-time content. Others are economic: production requires so much capital in time and resources that (as with mass-market movies) only large corporate organizations have been able to produce content. Changes are coming quickly on both fronts, as computer hardware gets cheaper and more powerful and the tools become more efficient and more readily accessible. But it is not enough to merely move the creation of derivative, violent, shallow content out of the hands of the big companies and into the garages; the content itself must change (Littlejohn, 03).
      
While the changes needed to move from interactive action movies to thematically rich interactive media are numerous and varied, many of them can be viewed as problems of generating complexity. From creating large, immersive, and non-repeating environments to embellishing character dialog and plots, the issue of complexity arises at every stage of content generation. If we accept that increased complexity is needed to push the medium of interactive 3D environments forward, we are left with the two-pronged question of:
- how we define complexity, and
- how to achieve it in real applications.
      
While the issue of complexity is itself quite complex, a clear definition can be arrived at by constraining the realm of discourse. In the context of creating interactive environments, complexity is the extent to which the system is deep (there is a great deal of material to explore) and engaging (the material the user explores is interesting on multiple levels). When it comes to actually producing complex content, the range of approaches can be seen as lying on a spectrum from designer specification to programmatic generation. Stated another way, the creator of a digital environment can either specify everything (from polygons to dialog) by hand, or devise algorithmic approaches to create content in his or her stead. The hand-crafted approach has the benefit of being a tried and true method of creating deeply moving content, as it is the way that every important narrative so far has been created. However, when interactivity is added, the strictly handmade approach reveals its limitations. It may be profound, but it is also profoundly brittle, as content that is made in advance has no way of responding to user interaction. In order to respond to the user at all, there must be some aspect of algorithmic control and selection, if not outright generation. Therefore, every successful interactive structure relies on both cognition and code. But using the ever-increasing power of the computer merely to select from a finite set of options imposes a needless limitation on the possible approaches. As developers strive for greater scope on both the macro (large worlds) and micro (high degree of detail) levels, the need to off-load work from designers to computers becomes increasingly vital.
The question should never be whether or not to surrender control to the computer, since that is built into the nature of digital media, but rather exactly how and when to surrender control in order to allow the algorithmic approaches to do what they are best at (combining and permuting large numbers of items), and to reserve the energy of the designer for the more difficult and meta-level tasks.
      
Once one decides to embark on the path towards a hybrid approach combining authorship and algorithms, one finds artificial intelligence creeping into the discussion from all sides. All philosophical questions aside, artificial intelligence research can be viewed as the area of computer science most concerned with the creation of complex systems, in its attempts to model human cognition. While most of the research into AI for interactive media (including this paper) seeks to illuminate methods for creating believable agents, the usefulness of AI techniques in all aspects of content generation should not be overlooked.
There is an important distinction that can be made between what could be termed "academic AI", or research that is geared towards explanation, and application-based AI that is concerned exclusively with end results (Tozour, 02). While the two approaches certainly feed into one another, application-based AI is more directly relevant to interactive media. No matter what other goals the work may have, they can only be achieved to the extent that the user perceives them, and as such, the only aspects that ultimately matter are those that are directly perceivable by the user.
      
So it would seem that one could avoid much of the theoretical discussion and merely focus on building the system. However, while the processes of academic and application AI differ, their goals are highly isomorphic. While academic AI strives to create artifacts that are believable as loci of cognition, application AI, in its quest to produce believable and interesting content, attempts to create artifacts that are believable as products of a cognizant mind. Though the differences between producing a system that works and is stable, and engaging in computational experimentation, are important, this similarity in end goals should cause us to continue to examine the issues raised in academic AI in the context of applications.
      
As a key aspect of much of the academic discourse surrounding artificial intelligence, the issue of embodiment is an important one to examine in relationship to digital agents. Digital agents are embodied to a high degree by definition, as they are not only made of the same base material as their environment, but they are also governed by exactly the same processes as those that govern that environment (algorithms implemented in a formal language). It could be argued that the same could be said of us living agents, as we too are governed by the same sort of physical and chemical processes that underlie all other physical events. However, to resolve such disparate phenomena as, say, cognition and gravity to the same causal factors, one must abstract them to the point of irrelevance. In the virtual world, however, all processes are the results of varying sub-routines of a larger system. Both physics and metaphysics resolve to the same process: the manipulation of bits. As a further result of being “at one” with their larger environment, digital agents have access to potentially perfect knowledge of all aspects of the world. Since there is no clear distinction between the agent and the system that houses it, digital agents could be seen as the very embodiment of embodiment.
      
However, while an agent's behavior may be explicitly tied to states of objects in the world, the mapping of world-states to agent actions is usually hard-coded. Though implemented via code that is dynamic at runtime, the meta-level rules governing the mapping are the result of design decisions and are themselves generally hard-coded and static. This leads to a character that is incapable of change. This is fine for many, if not most situations, but it points out an area that is ripe for change, and could lead to much more believable and engaging characters.
      
A system that transcends the above limitation would be one that relies on what could be called an embodied psychology for its agents. Before going further in the explanation of such a system, it is necessary to define what psychology means for digital agents. All agents, from Pac-Man ghosts to Black&White's creatures, have some mapping of states in the world onto actions they take. While the range of approaches as to what exactly happens in the interim is enormous, when viewed from the position of a user interacting with the system, all of them can be seen as the agent's psychology or belief system. The human tendency to anthropomorphize causes users to construct narrative, personality-based explanations of even the simplest behaviors. The choice to refer to the mapping of world states onto agent actions as a psychology is not meant to imply that we are anywhere near achieving human-level thought, or even that such an achievement is necessary for the proposed system. Instead, it is meant to underscore the extent to which the system should ultimately be grounded on thematic, rather than mathematical, premises.
      
By making the “beliefs” of the agent, as well as its actions, grounded in specific aspects of the world it inhabits, the meta-level behavior of the agent is left open to change in response to changes in the world. In this sense, the psychology of the agent can be said to be itself embodied. This has the potential to result in characters that can play multiple roles over the course of an interactive experience, and that can change roles in a way that responds in a deep manner to player/user interaction. Players could convince and convert agents instead of merely coercing them. Also, to borrow once again from academic AI discourse, embodied psychology (EP) represents a distributed, bottom-up approach to plot creation that stands in contrast to the largely top-down approaches of even the most inventive systems currently in development (the Facade project by Mateas et al. offers a very good example of one such system). EP is not meant to replace such systems, but rather to augment them. It is likely that the interactive narratives of the future will employ several different approaches in different situations and at different scales. Embodied psychology can be seen as a system for setting the moods of the character, but it does not deal directly with how those moods are implemented. As such, there needs to be a secondary system that works in tandem with EP. Using multiple layers also helps with scalability and implementation, as it separates the potentially computationally expensive EP from a possibly simpler system. The most obvious choice would seem to be a finite state system with EP inserting itself in the transitions between states.
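As a rough illustration of EP sitting in the transitions of a finite state system, the following Python sketch lets an agent's beliefs decide which of several candidate transitions is taken. The state names, event name, belief names, and the rule "follow the most strongly believed candidate, if its valence is positive" are all hypothetical choices made for this example; the paper itself does not prescribe them.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    state: str
    # Hypothetical belief store: statement -> valence in [-1, 1].
    beliefs: dict = field(default_factory=dict)


# Hypothetical transition table: (state, event) -> [(gating belief, next state)].
TRANSITIONS = {
    ("guarding", "player_approaches"): [
        ("player_is_friend", "greeting"),
        ("player_is_threat", "attacking"),
    ],
}


def step(agent, event):
    """Advance the state machine, letting beliefs pick the transition."""
    candidates = TRANSITIONS.get((agent.state, event), [])
    # Prefer the candidate whose gating belief has the highest valence.
    best = max(candidates, key=lambda c: agent.beliefs.get(c[0], 0.0), default=None)
    # Assumption: a transition fires only if its gating belief is positively held.
    if best is not None and agent.beliefs.get(best[0], 0.0) > 0.0:
        agent.state = best[1]
    return agent.state


guard = Agent("guarding", {"player_is_friend": 0.6, "player_is_threat": -0.2})
print(step(guard, "player_approaches"))  # greeting
```

The state machine itself stays simple and cheap; only the choice among transitions consults the belief store, so a change in beliefs changes the agent's role without any change to the table.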
Non-AI requirements for the system
      
Before we can fully describe the manner in which an EP system might operate, it is useful to describe the additional requirements of such a system. In order to make the implementation of a virtual psychology a valid and worthwhile effort, agents must possess the following attributes:
- knowledge of the world
- a way of recognizing and interpreting actions on the part of the user
- a way of taking action in the world in keeping with the simulated psychology
The first requirement resolves to establishing a scope, rather than a collection of facts. Since characters have access to completely accurate information about every aspect of the world they inhabit by virtue of being digital, the choices lie in deciding not what they know, but what they don't. Care must be taken to ensure that agents do not appear omniscient. The second requirement is primarily a parsing problem: the actions of the user must be translated into statements that are meaningful in the formalism used to represent the agent's psychology. Agents can't reason based on things they don't understand. Finally, without a way of shaping its actions based on its simulated belief structure, the agent seems no more responsive than its programmed-by-rote cousins. Furthermore, the system must allow not only for actions to be decided on (as with other approaches to AI), but also for distortions and incorrect, biased decisions. The issue of affective reasoning has been explored in the past, with Abelson's work on “hot cognition” being a notable example (Boden, 77), but the systems created have been static and unchangeable, doomed to their own dogma.
Requirements of an idealized EP system
      
So if we posit the fulfillment of the above requirements, we are left with a system that is complete except for the psychology itself- a body without a brain. The next step is to outline the desired traits that the simulated psychology should have- the ideal system should be:
- Knowable
- Grounded
- Dynamic
      
The system should be knowable in that it should be possible for an astute user, given a reasonable amount of time with the system, to arrive at an understanding of how the agent functions that falls within a relatively small margin of error of how the agent actually does function. That margin of error must be cultivated and kept within a desired range- the user should never be able to perfectly predict the agent's actions, and neither should they feel that they have no idea as to what the agent's next action will be. To allow either would undermine the complexity and interest of the system. In addition to modeling reasoning, the reasons for the agent's actions must be grounded, meaning that they are tied to concrete aspects of the environment in which the agent operates. This amounts to every action taken by the agent having an identifiable reason, or set of reasons for which it occurs. It is important to note that the reasons need not be good reasons, as one of the benefits of modeling an agent's psychology is that it allows for distortions and biases to affect the way the agent behaves.
This is merely a restatement of the condition of embodiment, and leads directly into the third requirement. Once the meta-level behavior of the agent is linked to world-states, it inevitably becomes dynamic, changing in response to changes in the environment. If the system is grounded but not dynamic, it would amount to simply a more complex way of hard-wiring behavior into an agent. Once we accept that the simulated psychology of the agent must be open to change, we are faced with the problem of creating a mechanism to control the manner in which changes occur. There are any number of ways to go about this, but the ideal system should have “beliefs” that span a range in their resistance to change, and it should strive to maintain consistency, both internally and with respect to the environment. Also, since digital characters and environments are inevitably implemented via programmatic means, the system must be quantifiable and represent some sort of formalism. However, the kind of behavior we are trying to capture is notoriously hard to formalize.
      
The desire to represent reasoning and belief via formalized means is not a new one, however, and much work has already been done. A particularly relevant realm of academic discourse is to be found in research into defeasible reasoning. Defeasible, or nonmonotonic, reasoning is an area of logic that deals with the fact that
“...we typically reason not using the ironclad methods of first-order logic, but... the conclusions we draw can typically be defeated if we obtain new information that contradicts or undermines the arguments we used to draw the conclusions in the first place” (Ginsberg, 93)
      
A classic example of defeasible reasoning is that, if told that Tweety is a bird, we should be allowed to assume that Tweety flies. However, if we are later told that Tweety is a penguin, we would want to be able to retract our previous belief that he flies. Defeasible reasoning is a class of formalisms specifically geared towards handling these kinds of situations. It is therefore ideally suited to use in belief representation, as it is built for both dynamic change and the handling of partial knowledge.
      
One of the most powerful nonmonotonic formalisms is the default logic of R. Reiter and others (Reiter, 80). In default logic, we are concerned with so-called default theories that split knowledge of the world into two sets of items: facts that the agent knows, and “defaults”, which can be seen as representing conditions under which the agent will take a statement to be true. The set of facts is merely that: statements in some variety of first-order logical formalism. The defaults, however, are more complex and consist of three parts: the prerequisite, the justification, and the consequent. They are generally represented as:
    A : B
    -----
      C
      
Where A is the prerequisite, B the justification, and C the consequent. What the above formula means is that if A is known to be true, and B is consistent with what is already known (that is, B may still be true), then C can be taken to be true. It is similar to saying “if A, then C”, but with B inserted as an additional check placed on the system. This amounts to a powerful and flexible way of formalizing webs of beliefs, with facts resting on other facts for their justifications. However, problems are encountered in implementing default logic due to the computationally intractable nature of checking new information against all the defaults in the database. This might not seem to be a problem, but there may be defaults that justify the proposed fact (F) as well as defaults that contradict it (defaults that have either F or not F as their consequent), and the fact may invalidate other, previously held defaults (F may be equal to not B for some default), so the problem can easily become insurmountable. These problems can be overcome if the defaults are given an order in which they are checked. However, that merely trades one hard problem for another, as it suggests the need for an automated way of determining the importance of a given default. This remains a problem only to the extent that one is set on defining a fully automated system capable of running without any human intervention. If some degree of designer specification is allowed, however, the problem can be resolved, and a deep and complex system can be formulated that is capable of running in real time.
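To make the reading of a default concrete, here is a minimal Python sketch using the Tweety example. It is a deliberate simplification, not an implementation of Reiter's default logic: statements are plain strings, "B is consistent" is approximated by checking that the string "not B" is not currently believed, and the defaults are applied in a single pass in a designer-supplied order. The names `Default`, `negate`, and `conclude` are invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Default:
    prerequisite: str   # A
    justification: str  # B
    consequent: str     # C


def negate(statement):
    """Toy negation on strings: 'flies' <-> 'not flies'."""
    return statement[4:] if statement.startswith("not ") else "not " + statement


def conclude(facts, defaults):
    """Single ordered pass: add C when A is believed and 'not B' is not."""
    conclusions = set()
    for d in defaults:
        believed = set(facts) | conclusions
        if d.prerequisite in believed and negate(d.justification) not in believed:
            conclusions.add(d.consequent)
    return conclusions


# Designer-specified order: the more specific default is checked first.
TWEETY_DEFAULTS = [
    Default("penguin", "not flies", "not flies"),
    Default("bird", "flies", "flies"),
]

print(conclude({"bird"}, TWEETY_DEFAULTS))             # {'flies'}
print(conclude({"bird", "penguin"}, TWEETY_DEFAULTS))  # {'not flies'}
```

Because conclusions are recomputed from the current facts, learning that Tweety is a penguin retracts the earlier conclusion that he flies. The result depends on the order of the list, which is exactly the ordering problem discussed above.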
A proposed modification to default logic for the creation of EP
      
I propose to solve this problem by adding a “valence” value to each statement and default in the system. The valence is a number between -1 and 1 that represents the extent to which the agent believes the given statement, or agrees with the line of reasoning sketched out by the default. Negative statements are not allowed. Rather, the positive version of the statement is added with a negative valence. The valence of a default is arrived at based on the valences of its component statements.
      
The system is initialized with a collection of facts it knows and defaults. While there may be some facts that are not the consequents of defaults, they should be kept to a minimum, and only represent things that the agent should not be critical about (things like gravity, the existence of a concrete objective reality, etc). Everything else- the facts/ beliefs that are specific to a given agent, should all be tied to defaults, opening them up to the possibility of change. The seeding of the character with initial sets of facts and defaults, and the initial valence values of them, is the way in which a specific personality is modeled.
      
At runtime, the agent must be able both to effect actions based on the things that it believes about the world and to remain open to suggestions proposed by either the interaction of the user or changes in the environment. Given that the system is posited to have a way of parsing new information into statements in its formal language, the agent can then respond to the new information based on the facts and defaults in its database of beliefs.
The inclusion of valence values for the defaults and statements makes this a straightforward mathematical issue, rather than an intractable philosophical one. When a new fact (F) is proposed to the system, the set of defaults is ordered based on the absolute value of their valences, and is stepped through until one is found that has a consequent that either justifies F or conflicts with it. By using the absolute value of the valences of defaults, rather than the raw numbers, defaults that have a highly negative value are allowed to trump those that have a smaller positive value. This amounts to the system being able to reject things that it feels strongly are not true (a default with a highly negative valence exists), even if there is a line of reasoning it is otherwise willing to accept that justifies them (a default with a small positive valence). The valence of F is then calculated based on the valence of that default, and if it is nonzero, F is added to the list of facts. The most obvious way of assigning a valence value to the newly added fact would be to simply give it the same value as the default used to add it, but this may prove too simple; experimentation is needed to determine the ideal relationship.
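The procedure just described might be sketched in Python as follows. Several details are assumptions made for illustration only: the valence of each default is set directly rather than derived from its component statements, a default counts as applicable only when its prerequisite has positive valence and its justification is not disbelieved, and the new fact simply inherits the valence of the deciding default (the "most obvious" option mentioned above). The names `Default` and `propose`, and the example statements, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Default:
    prerequisite: str
    justification: str
    consequent: str
    valence: float  # in [-1, 1]; negative means "this argues against the consequent"


def propose(fact, facts, defaults):
    """Offer `fact` to the agent; return the valence it is given (0.0 if none).

    `facts` maps statement -> valence and is updated in place.
    """
    # Strongest convictions first, regardless of sign.
    for d in sorted(defaults, key=lambda d: abs(d.valence), reverse=True):
        if d.consequent != fact:
            continue
        if facts.get(d.prerequisite, 0.0) <= 0.0:
            continue  # assumption: the prerequisite must be positively believed
        if facts.get(d.justification, 0.0) < 0.0:
            continue  # assumption: the justification must not be disbelieved
        if d.valence != 0.0:
            facts[fact] = d.valence  # simplest choice: inherit the default's valence
        return d.valence
    return 0.0


beliefs = {"player_gave_gift": 1.0, "player_is_armed": 1.0}
rules = [
    Default("player_gave_gift", "gifts_show_goodwill", "player_is_friend", 0.3),
    Default("player_is_armed", "weapons_are_dangerous", "player_is_friend", -0.8),
]
print(propose("player_is_friend", beliefs, rules))  # -0.8
```

Here the strongly held negative default wins over the weakly held positive one, so the agent records "player_is_friend" with a valence of -0.8, that is, as something it disbelieves.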
      
So far we've sketched out a way that the system can acquire new beliefs, but we have yet to describe the way in which this results in changes to previously held beliefs. In order for that to occur, there must be a methodology for back-propagation of valence through the system upon the addition of new information. Once a fact is added to the system, the valences of everything checked in the process of adding the fact should be adjusted slightly based on the valence of the added item. If a default consistently results in assigning negative valences to incoming information, its valence should be reduced. Likewise, defaults that result in assigning positive valences should have their valences raised slightly. This would have the effect of making the system strive to eliminate conflict between its beliefs and the external world, as beliefs that consistently result in the system denying information that it is presented with (assigning negative valences) should be viewed by the system with increasing skepticism (have their valence values lowered), and beliefs that are consistently validated by new information (result in positive valences) should be seen as increasingly valid. Note that there is only conflict insofar as the agent's reasons are unjustifiable. The agent is still free to link true statements and statements with unknown truth states in any way it chooses, with the possibility of faulty and affective reasoning.
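One possible reading of this back-propagation step is sketched below in Python. It is an interpretation, not a specification: "reduced" is taken to mean weakened toward zero (the agent holds the default less strongly) rather than pushed further negative, the adjustment is a fixed fraction `rate` of the valence assigned to the new fact, and the sign of each default is preserved. The names `Default`, `back_propagate`, and `rate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Default:
    consequent: str
    valence: float  # in [-1, 1]


def back_propagate(checked, assigned_valence, rate=0.05):
    """Nudge the strength of every default consulted while adding a fact.

    A positive `assigned_valence` strengthens the defaults; a negative one
    weakens them toward zero. Strength is clamped to [0, 1].
    """
    for d in checked:
        sign = 1.0 if d.valence >= 0.0 else -1.0
        strength = abs(d.valence) + rate * assigned_valence
        strength = max(0.0, min(1.0, strength))
        d.valence = sign * strength


denier = Default("player_is_friend", -0.8)
back_propagate([denier], assigned_valence=-0.8)
print(round(denier.valence, 3))  # -0.76
```

Under this reading, a default that keeps denying what the world presents gradually loses its grip, while one that keeps being confirmed becomes harder to override. Whether a fixed rate is appropriate is an open question that would need experimentation.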
      
By combining the power of default logic to base the actions of an agent on reasons in a simulated belief system with a way for those beliefs to respond to actions of other agents (including the user) and changes in the environment, a closed loop is formed. By linking the system to itself, mediated by the state of the world in which it operates, a greater degree of complexity can be reached, and deeper content developed.
Conclusion/ Further Study
      
Many problems must be confronted in order to achieve the goal of deep, philosophically interesting interactive 3D environments and narratives. In this paper, I have addressed one small component of the problem that seems to be often overlooked in the research: the need to make the means by which agents make decisions both dynamic and based on the nature of the world they inhabit, rather than hard-wired into them as the implementation of abstract methodologies. Nonmonotonic formalisms, especially the default logic of Reiter, are rich in their potential to help create such a system, but it remains to be seen whether they can lead to results that are both compelling and capable of real-time performance. Ultimately, these issues must be resolved not by theory, but in practice, by implementing them in next-generation digital agents.
Bibliography
Antonelli, Aldo Consequence Relations for Defeasible Logic University of California, Irvine, 2003
Boden, Margaret Artificial Intelligence and Natural Man (New York: Basic Books 1977)
Ginsberg, Matt Essentials of Artificial Intelligence (San Francisco: Morgan Kaufmann 1993)
Littlejohn, Randy Agitating for Dramatic Change (gamasutra.com, 2003)
Mateas, Michael Facade: An Experiment in Building a Fully-Realized Interactive Drama 2002
Mateas, Michael Expressive AI: a hybrid art and science practice (Leonardo: Journal of the International Society for Arts, Sciences, and Technology 34(2), 2001)
Reiter, R, A Logic for Default Reasoning 1980, as reprinted in:
Ginsberg, Matt Readings in Nonmonotonic Reasoning (Los Altos: Morgan Kaufmann 1987)
Tozour, Paul The Evolution of Game AI, as appeared in:
Rabin, Steve AI Game Programming Wisdom (Hingham, Massachusetts: Charles River Media, 2002)