Context area exam: Bradley Rhodes

Question 1

Question: You've read about divided attention, theories, experiments to determine how it works and what is recalled on secondary channels. Summarize what you have learned (about a page), and based on these readings, devise a set of design criteria for multi-media user interfaces in which one medium is a primary channel and another a secondary, asynchronous channel. Explain how the theory and the experiments you have read about support these claims.

It is difficult to get a single theory or viewpoint out of the divided attention (DA) readings. Not only are there a variety of theories being proposed, but each paper is experimenting on a slightly different aspect of the theories. To an outsider, the situation seems like the five blind men describing an elephant, only in this case some of the blind men have moved on to other animals in the zoo and are describing them instead. The readings can loosely be divided into papers trying to evaluate theories of divided attention and papers evaluating specific aspects of DA phenomena (e.g. the effects of DA on memory).

Many of the focus of attention theories are based on Broadbent's filter theory of attention. The general form of this theory asserts that focus of attention exists because of a limited cognitive resource that keeps us from processing all available information. To conserve this resource only crude physical characteristics of data are processed in parallel. After this early processing, only selected information is processed. Much of the debate within this paradigm is at what stage different features of our perception are "filtered out." The theories can be broken into the following categories:

Early-filter theories (Broadbent) state that the early stage of processing only looks at physical characteristics of a signal like pitch, color, location, etc. Features such as semantic category and combinations of several different features are only processed at later stages, and only in selected information. There is lots of experimental evidence to suggest that the strong early-filter theory is incorrect. For example, words in an irrelevant (distractor) channel are sometimes heard if they are contextually related to what is being said in a relevant (target) channel. Subjects will also sometimes process their own name in a distractor channel.
Late-filter theories, on the other hand, hold that parallel processing occurs up to and including semantic processing of information, but that only selected information gets to the conscious mind so as to aid in action selection.
Attenuation filter theories (Treisman) are compromise theories that state that semantic information in an unattended channel can still be processed in parallel with an attended channel, but that this processing is somehow attenuated by the fact that it is unattended. [Wood 1995] supports this particular theory. She showed that subjects who remember backwards speech from a distractor channel tend to make more shadowing mistakes about 15 seconds after the backwards speech starts, indicating that the subjects had their attention drawn towards the distractor channel only after a build-up period. Her experiment also contradicts the theory that parallel processing can happen without distraction (as in the late-filter theory) since subjects that heard the backwards speech made more errors.
Predictive theories (Neisser) propose that attention is an active and predictive process, and that the more you know in advance about a to-be-attended channel the easier it is to select and attend to that stimulus without distraction. Barr's experiment suggests a combination of the predictive theory and a filtering (hierarchical) theory.
Behavioral coherence / univocal perceptual-motor control (Allport). Alan Allport proposes that all the filter theories are flawed in that they are based on the idea that filtering occurs to conserve resources, and that instead focus of attention serves the purpose of aiding action selection by keeping us focused on a single task. To perform this task, focus of attention separates channels out to avoid "crosstalk" between perceptual inputs in multiple tasks. This explains certain cross-modality effects that the limited resource theories can't. For example, copy-typists can both copy-type and verbally shadow at the same time with little difficulty. However, dictation-typists can't both dictation-type and read aloud at the same time, even with lots of practice. Allport asserts that this is because reading for letters (not content) and typing are coordinated input and output, as are audio input and output in the shadowing task. In the second case, the modalities for the two tasks have crossed input and output mental representations (audio in for dictation typing, audio out for reading aloud), and thus there is lots of room for crosstalk.

In spite of the fact that there is no agreed-upon theory of attention, the results of decades of experiments are still available and can still be used in designing systems, even without a theoretical framework to explain them. We can be relatively sure that the following factors affect the discriminability of two channels of information. First, channels in different modalities are fairly easily distinguished. Modality might not just include the external form though. Allport also talks about the mental representation of information. E.g. if a reader subvocalizes, that might produce crosstalk with an audio modality.

Within a modality, channels are more easily distinguished by physical characteristic than by semantic content. In audio this includes pitch, intensity, and location. In vision this includes location, color, luminance, etc. Barr reports that these physical characteristics also come with different discrimination capabilities. For example, the location of a sound is a better feature for discrimination than voice, voice is better than forward vs. reverse speech, and forward/reverse speech is better than semantic content.

Some of the papers specifically addressed DA and its effect on memory. These conclusions can be summarized as follows. DA at encoding time hurts memory performance later, but does not greatly affect a reaction time test. DA at recall hurts reaction time, but not memory. Change of task emphasis can affect encoding, but not recall. These results indicate that recall is somewhat automatic, while encoding is more under conscious control. A different paper suggests that direct memory is affected by DA, as are conceptually driven indirect memory tests (priming experiments), but data-driven (surface-level) indirect memory tests are not affected by DA.

There are many dimensions of just-in-time information applications that affect design criteria for multi-media UIs for a particular application, and certainly an essay of this length can only give an incomplete overview. One set of dimensions is the importance and timeliness of the information being presented. These features represent the benefit of the secondary information, from a cost/benefit analysis standpoint. For example, the fact that there is a telephone call is expected to be of high importance. The information is also timely, in that if the user doesn't answer the phone within about 10 seconds the other end will hang up. The content of email might be just as important as a phone call, but the fact that email has arrived is usually less timely because one is usually not expected to answer email right away. The relevance of information is related to both importance and timeliness. For example, information letting you know that your current route has lots of traffic up ahead is both more timely and more important because it is relevant to your current situation.

Importance and timeliness should be traded off against the expected cost of interruption (interruptability) for the user. For example, a user sitting in a traffic jam might not mind getting almost any cell-phone call, but that same user giving a presentation probably only wants urgent calls sent to their cell phone or pager. Currently such mediation is done through a human agent such as a secretary. Similarly, you may not want to be interrupted with traffic information when you're busy dealing with a dangerous intersection. Interruptability can be equated with the performance of time-dependent primary tasks (such as navigating an intersection), or tasks where breaking one's concentration causes a "loss of place," such as writing a particularly difficult passage of a paper.
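The trade-off described above can be sketched as a simple decision rule. This is only an illustration under stated assumptions: the Message class, the 0-to-1 scales, and the choice to multiply importance by timeliness are all hypothetical and do not come from the readings.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """A piece of secondary-channel information (fields are illustrative)."""
    importance: float  # 0.0 (trivial) to 1.0 (critical)
    timeliness: float  # 0.0 (can wait indefinitely) to 1.0 (needed right now)


def should_interrupt(message: Message, interruption_cost: float) -> bool:
    """Interrupt only when the estimated benefit exceeds the cost.

    Benefit is modeled as importance * timeliness. That formula is an
    assumption made for this sketch, not something the essay specifies.
    """
    benefit = message.importance * message.timeliness
    return benefit > interruption_cost


# Hypothetical interruption costs for two primary tasks.
COST_TRAFFIC_JAM = 0.1    # user is idle, so interruption is cheap
COST_PRESENTATION = 0.9   # user is presenting, so interruption is expensive

ordinary_call = Message(importance=0.8, timeliness=0.9)
urgent_call = Message(importance=1.0, timeliness=1.0)
```

With these made-up numbers, the ordinary call gets through in the traffic jam but not during the presentation, while the urgent call gets through in both cases. In a real system the hard part is estimating the three quantities, which is the job the human secretary does today.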

Certainly one design constraint is to limit the interruption of the user's primary task. If the information being conveyed is especially compatible with the user's primary task, such that there is almost no crosstalk, then there will be little interruption. For example, normal highway driving (a primarily visual and fairly automatic task) will not be overly interrupted by an audio alarm. In general, one wants to design the secondary message such that there is as little crosstalk as possible by making it as distinguishable from the primary channel as possible. Different physical characteristics such as modality, pitch, shape, color, and location all aid in distinguishing channels. Barr's experiments in segmenting target from distractor speech show not only that these physical segmentation cues are valuable, but also that multiple cues can be combined to help even further. For example, location (the ear a message is played in) and pitch together are more effective than either individually. Allport also discussed how giving the primary and secondary channels different internal representations for the information aids in limiting crosstalk. The experiments with copy-typists vs. dictation-typists demonstrate this principle.

Another way to limit interruption is to convey small amounts of new information, though that information might leverage lots of pre-processed knowledge. For example, a pilot uses years of training to know what the "flaps not responding" warning light means. They therefore need only a little time to process the single bit of information conveyed: the warning light. Information can also be represented in such a way that the action to be taken with the information is reduced from a cognitive to a sensory-motor task. Hutchins, Wickens, and Norman all describe such situations. For example, gauges in a nuclear reactor should be designed such that normal operating conditions have them all facing up, and the meters should be aligned vertically. In this way, the action of checking to see if things are out of whack is reduced to the sensory-motor task of seeing if the needles all form a single line.

In many cases, at least some of the information to be conveyed is too complicated to convey without interrupting the primary task, and at this point a trade-off is inevitable. One way to address this trade-off is by creating a "ramping interface" where information is conveyed in stages. Each stage of a ramping interface provides a little more information, at the cost of the user spending a little more of their attention to read and understand it. The idea is to get useful information to a user quickly, while at the same time allowing them to bail out on an unwanted suggestion with as little distraction as possible.

In a ramping interface, the user should always be able to get more information by going to the next stage, and each stage should give at least enough information to let the user know whether to go to the next stage or not. The action required to get to that stage should be proportional to the amount of information provided in the current stage. It should only require a simple action, such as moving the mouse to a target, for a user to go to early stages. Going to a later stage might require the user to pick a topic from a menu, trading off simplicity for increased control of what information is displayed.
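The staged structure described above might be modeled roughly as follows. This is a minimal sketch, not an implementation of any existing system: the Stage and RampingInterface names, the integer effort scale, and the example stages are all hypothetical. The check that effort never decreases from one stage to the next is a crude stand-in for the requirement that early stages need only a simple action.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stage:
    """One stage of a ramping interface (fields are illustrative)."""
    summary: str  # information shown to the user at this stage
    effort: int   # hypothetical cost of the action needed to reach this stage


class RampingInterface:
    """Conveys information in stages; the user may advance or bail out."""

    def __init__(self, stages: List[Stage]) -> None:
        if not stages:
            raise ValueError("a ramping interface needs at least one stage")
        # Assumption for this sketch: later stages may cost more effort
        # to reach than earlier ones, but never less.
        for earlier, later in zip(stages, stages[1:]):
            if later.effort < earlier.effort:
                raise ValueError("effort must not decrease between stages")
        self._stages = stages
        self._index = 0

    def current(self) -> Stage:
        """Return the stage the user is currently looking at."""
        return self._stages[self._index]

    def advance(self) -> Optional[Stage]:
        """Move to the next stage, or return None if there is none."""
        if self._index + 1 >= len(self._stages):
            return None
        self._index += 1
        return self._stages[self._index]

    def bail_out(self) -> None:
        """Abandon the suggestion and return to the least intrusive stage."""
        self._index = 0


# Hypothetical stages for a suggestion attached to a document.
suggestion = RampingInterface([
    Stage("small colored marker in the margin", effort=0),
    Stage("one-line title shown on mouse-over", effort=1),
    Stage("menu of related topics to choose from", effort=3),
])
```

Each summary is meant to carry just enough information for the user to decide whether advancing is worthwhile, and bail_out costs nothing beyond ignoring the current stage.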

An example of a bad ramping interface is the telephone. The first stage is silence, where the passive sensor on the phone waits for a call and the "agent" in the phone does nothing until a call is connected. Once a call is made, the next stage is entered, and the phone rings. The ring has the advantage of reaching the called party wherever they are in the house, whatever they are doing. It also has the social advantage that others know the person might need to deal with the phone, and it is hard to ignore, which is important since it alerts the user to a possibly important and certainly timely piece of information. However, the phone does not give the user enough information to determine whether to go to the next stage, answering the call. Only after you answer the phone do you know whether it's an emergency or a telemarketer. This flaw is starting to be addressed with caller ID and privacy enhancing phones that ask call-blocking people to speak their name before ringing the other end.

In the early stages of a ramping interface, the secondary channel should not only be easily distinguished from the primary channel, but different kinds of secondary information should also be easily distinguished from one another. For example, on my screen different kinds of zephyrs appear in different locations. I can tell the difference between zephyrs and other windows by the color of the window, but I can also tell whether the zephyr is a personal communication or a message to an instance (chat-room) by its location. Instance zephyrs appear in the top corner of my screen, while personal ones are more central. Again, the information conveyed in early stages should be conveyed through primarily physical rather than semantic channels, as this kind of information is processed more readily and more in parallel with other channels.

Another important design issue is whether the secondary channel needs to be identified with something in the real world. For example, the margin notes system relates suggestions with specific sections of a webpage. In these cases you need an interface method that is conducive to that sort of linking. Augmented reality is one such method, as are more traditional visual interfaces where an icon is placed near or over the thing being annotated. Audio annotations are much harder to locate and associate with a specific object. On the other hand, other kinds of just-in-time information applications don't need to associate the secondary channel information with anything already in the primary channel. For example, a cell-phone message has nothing to do with things in the receiver's personal space.

A final design constraint that was hinted at earlier concerns the social aspects and needs of an application. Some interfaces are used in solitude and have no social considerations. Others, like telephones, need some way to let others know what the user is reacting to. For example, an important feature of the cell-phone is that it tells other people that you have to excuse yourself to address someone else who isn't physically present. On the other hand, it may be a privacy violation to let everyone in the room know who is calling. One compromise is to simply "ring" for the room at large, while the owner of the phone also hears who is calling, and perhaps what about.