9 Attention and the Assessment of Mental Workload
Research on attention underwent a major revival beginning in Great Britain in the 1950s. This revival, which has continued and grown to the present day, . . . took place in a number of laboratories, and much of the research was motivated by practical interest in aviation and other fields where people are often subjected to sensory and cognitive overload.
H. Pashler (1998a)
INTRODUCTION
Driving a car is a complex task composed of many subtasks, each of which must be performed at the appropriate times and under certain speed and accuracy requirements. For example, you must decide where you want to go and the route you want to take to get there, and then you have to navigate the car toward your intended destination. You must steer the car so that it stays in the desired lane, and use the gas and brake pedals to maintain proper speed. You have to see, read, and comprehend the information signs located along the roadway and modify your driving accordingly. You may find that you need to change the settings of entertainment and air-conditioning systems within the passenger compartment, and operate the turn signal and windshield wipers. All the while, you must monitor the environment continuously for unexpected events, such as obstacles appearing in the roadway or approaching emergency vehicles that require rapid reactions on your part. Although it is not part of the task of driving, you may frequently engage in conversations with other passengers or use your cell phone.
Because of the many perceptual, cognitive, and motor demands that driving imposes on a driver, the task of driving incorporates almost all the aspects of attention that will be of concern in this chapter. It should come as no surprise that a lot of applied research on attention is devoted to studying the performance of drivers of land, air, and water vehicles under different cognitive demands.
Historically, what we call “attention” has been of interest since Aristotle’s time. Research on attention began in earnest in the last half of the nineteenth century and the early twentieth century, as William James’ (1890/1950) quote indicated in Chapter 1. Much of this early work focused on the role of attention in determining the contents of conscious awareness. In part because of attention’s reliance on unseen mental events, and in part because of a lack of theoretical concepts for depicting the mechanisms of attention, the study of attention received less emphasis during the period from 1910 to 1950. However, as Lovie’s (1983) survey of publications shows, research on attention never ceased entirely during this period and several important contributions to our present understanding were made (see also Johnson & Proctor, 2004, Chapter 1).
Our ability to attend to stimuli is limited, and where we direct attention will determine how well we perceive, remember, and act on information. Information or objects that do not receive attention usually fall outside of our awareness and, hence, have little influence on our performance. Thus, an information display important for a task (like the fuel gauge while driving) may not be exploited if the system operator is not attending to it. However, when a single highly practiced response has been given to a stimulus many times in the past, attention is not needed for accurate or fast execution of the response. This means that highly familiar but irrelevant stimuli may interfere with and draw attention away from relevant stimuli that require attention. These and other attentional factors determine an operator’s level of performance for any assigned task.
There are two kinds of attention. Selective attention determines our ability to focus on certain sources of information and ignore others: for example, you may often find yourself at a party or in a classroom where more than one person is talking at once, yet you will be able to listen to what only one speaker is saying. Divided attention determines our ability to do more than one thing at once, such as driving a car while simultaneously carrying on a conversation. No matter what kind of attention is being used to perform a task, to understand the conditions that make people perform better or worse we need to know the amount of mental effort that people are expending to perform the task. We call a task that requires considerable mental effort “attention demanding.” We also need to know what kind of executive control is being used. Executive control refers to the strategies a person adopts in different task environments to control the flow of information and task performance.
The concept of mental effort is closely related to that of mental workload, which is an estimate of the cognitive demands of an operator’s duties. Many techniques for measuring and predicting workload in applied settings have been developed based on the methods and concepts derived from basic research on attention. In this chapter, we describe alternative models of attention, consider different aspects of attention in detail, and examine techniques for assessing mental workload.
MODELS OF ATTENTION
There are several useful models of attention, and each of them has generated research that has enhanced our understanding of attention. Because each model is concerned with explaining some different aspect of attention, it is important that you note exactly what aspect a model is focused on so that you can appreciate the situations that are appropriately characterized by each model.
Figure 9.1 shows a hierarchical classification of attention models. The first distinction between models separates bottleneck from resource models. Bottleneck models specify a particular stage in the information processing sequence where the amount of information to which we can attend is limited. In contrast, resource models view attention as a limited-capacity resource that can be allocated to one or more tasks, rather than as a fixed bottleneck. For bottleneck models, performance gets worse as the amount of information stuck at the bottleneck increases. For resource models, performance gets worse as the amount of resources decreases.
We can make further distinctions within each of these categories. Bottleneck models can be referred to as either “early selection” or “late selection,” depending on where the bottleneck is placed in the information processing sequence (closer to perception or closer to the response). Resource models can be distinguished by the number of resource pools used to perform a task: Is there only a single resource or are multiple resources involved? Both the bottleneck and resource models share the view that all human information processing is limited in capacity.
FIGURE 9.1 Hierarchical classification of attention models: bottleneck models (early selection, late selection) and resource models (single resource, multiple resources).
One last class of models attempts to explain human performance without hypothesizing any capacity limitations. These models, called executive control models, view decrements in performance as a consequence of the need to coordinate and control various aspects of human information processing. We describe the characteristics of each of these models in turn, along with the experimental evidence that supports each one.
BOTTLENECK MODELS
Filter theory. After the resurgence of interest in the topic of attention, the first detailed model of attention, called filter theory, was proposed by Broadbent (1958). The filter theory is an early-selection model in which stimuli enter a central processing channel one at a time to be identified. Extraneous or unwanted messages are filtered out early, before this identification stage. The filter can be adjusted on the basis of relatively gross physical characteristics, such as spatial location or vocal pitch, to allow information from only one source of input to enter the identification stage (see Figure 9.2).
Broadbent (1958) proposed this particular model because it was consistent with what was known about attention at that time. Many attentional studies were conducted in the 1950s, primarily with auditory stimuli. Probably the best known of these studies is Cherry’s (1953) investigation of the “cocktail party” phenomenon, so called because many different conversations occur simultaneously at a cocktail party. Cherry presented listeners with several simultaneous auditory messages. The listeners’ task was to repeat word for word (“shadow”) one of the messages while ignoring the others, much like the situation you would find yourself in at a cocktail party (although you probably would not want to repeat the conversation word for word). Listeners were able to do this as long as the messages were physically distinct in some way. For example, when messages were presented through headphones to the left and right ears, listeners could shadow the message in the left ear while ignoring the right, or vice versa.
Not only were listeners able to selectively attend to one of the messages but also they showed little awareness of the unattended message other than its gross physical characteristics (e.g., whether the message was spoken by a male or female). In one study (Moray, 1959), listeners had no memory of words that had been repeated even up to 35 times in the unattended message. In another study (Treisman, 1964a), less than a third of listeners seemed aware that an unattended message spoken in Czech was not in English. Consistent with the filter theory, these selective-attention experiments suggested that the unattended message is filtered before the stage at which it is identified.
FIGURE 9.2 Filter and attenuation theories of attention (panels: filter theory; filter–attenuation theory; elements shown: input, perceptual channels, attentional filter or attentional attenuation, central processing channel, response).
Another critical experiment was performed by Broadbent (1958) using what is called a split-span technique. He presented listeners with pairs of words simultaneously, one to each ear, at a rapid rate. The listeners’ task was to report back as many words as they could in any order. The listeners tended to report the words by ear; that is, all the words presented to the left ear were reported in order of occurrence, followed by any that could be remembered from the right ear, or vice versa. Because the messages presented to both ears required attention, this finding suggests that the message from one ear was blocked from identification until the items from the other ear had been identified, again consistent with the filter theory.
The filter theory nicely captures the most basic phenomena of attention: It is difficult to attend to more than one message at a time, and it is hard to remember anything about an unattended message. Consequently, filter theory remains one of the most useful theories of attention to this day for the design and evaluation of human–machine systems. For example, Moray (1993, p. 111) recommended, for design purposes, that we use “Broadbent’s original Filter Theory, which is probably both necessary and sufficient to guide the efforts of the designer.”
However, most researchers have concluded that filter theory is not entirely correct. As usually happens when a theory of human behavior is sufficiently specific to be falsifiable, evidence accumulated that is inconsistent with the filter theory. For example, Moray (1959) found that 33% of listeners were aware that their own name had occurred in the unattended message, a finding replicated under more stringent conditions by Wood and Cowan (1995b). Also, Treisman (1960) showed that, when prose passages were switched between the two ears, listeners continued to shadow the same passage after it had been switched to the “wrong ear.” So, the context provided by the earlier part of the message was sufficient to direct the listener’s attention to the wrong ear. These and other studies indicate that the content of unattended messages is identified at least in some circumstances, a fact that the filter theory cannot easily explain.
Attenuation and late-selection theories. Treisman (1964b) attempted to reconcile the filter theory with these conflicting findings. She proposed a filter–attenuation model in which an early filter served only to attenuate the signal of an unattended message rather than to block it entirely (see Figure 9.2). This could explain why the filter sometimes seemed to “leak,” as in the two examples we described in the previous paragraph. That is, an attenuated message would not be identified under normal conditions, but the message could be identified if familiarity (e.g., for your name) or context (e.g., for the prior words in a prose passage) sufficiently lowered the identification threshold of the message. Although the filter–attenuation model is more consistent with the experimental findings than the original filter theory, it is not as easily testable.
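The contrast between an all-or-none filter and an attenuating filter can be made concrete with a small numerical sketch. The code below is only an illustration under stated assumptions: the 0-to-1 strength scale, the threshold values, and the names `Message` and `is_identified` are hypothetical and do not come from Broadbent or Treisman. It shows how an unattended message with a lowered identification threshold (such as one's own name) can be identified under attenuation but never under a complete filter.

```python
from dataclasses import dataclass


@dataclass
class Message:
    strength: float         # hypothetical signal strength before filtering (0..1)
    attended: bool          # True for the shadowed message
    threshold: float = 0.5  # hypothetical identification threshold;
                            # familiarity or context is assumed to lower it


def is_identified(msg: Message, attenuation: float) -> bool:
    """Return True if the message reaches its identification threshold.

    attenuation = 1.0 behaves like an all-or-none filter (unattended input
    is blocked); values below 1.0 only weaken unattended input.
    """
    if msg.attended:
        effective = msg.strength
    else:
        effective = msg.strength * (1.0 - attenuation)
    return effective >= msg.threshold


# Illustrative values only (not experimental data).
ordinary_word = Message(strength=0.8, attended=False)
own_name = Message(strength=0.8, attended=False, threshold=0.2)

attenuated_word = is_identified(ordinary_word, attenuation=0.7)  # stays below threshold
attenuated_name = is_identified(own_name, attenuation=0.7)       # lowered threshold is reached
blocked_name = is_identified(own_name, attenuation=1.0)          # nothing passes a full filter
```

Setting `attenuation` to 1.0 recovers the behavior of the original filter theory, which is why the attenuation account can accommodate the "leak" findings at the cost of an extra free parameter.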
An alternative approach to address the problems of the filter theory was to move the filter to later in the processing sequence, after identification had occurred. Deutsch and Deutsch (1963) and Norman (1968) argued that all messages are identified but decay rapidly if not selected or attended. There is some evidence that supports this late-selection model. For example, Lewis (1970) presented listeners with a list of five words in one ear at a rapid rate. The listeners shadowed the list, while an unattended word was presented in their other ear. Listeners were not able to recall the unattended word, but its meaning affected their response times to pronounce the simultaneously presented, attended word. Response time was slowed when the unattended word was a synonym of the one being shadowed, which is evidence that the meaning of the unattended word interfered with the pronunciation of the attended word.
There has been some debate over what these findings mean. Treisman et al. (1974) argued that Lewis’s findings only occur for short lists or early positions in longer lists, before the filter has been adjusted to exclude the unattended message. One possible resolution to the debate may be that whether selection is early or late varies as a function of the specific task demands. Johnston and Heinz (1978) suggested that as the information processing system shifts from an early-selection to a late-selection mode, more information is gathered from irrelevant sources, requiring a greater amount of effort to focus on a relevant source.
More recently, Lavie (2005) has marshaled considerable evidence for a hybrid early and late-selection theory of attention, called the “load theory.” In load theory, whether selection is early or late will depend on whether the perceptual load is low or high. One situation has higher perceptual load than another if there are more stimuli that need to be processed or if the perceptual discrimination that must be performed is more difficult. When perceptual load is high, selection is shifted to early in the process and irrelevant stimuli are not identified. When perceptual load is low, selection can be delayed until later in the processing sequence. In this situation, both irrelevant and relevant stimuli are identified. Some evidence that high perceptual load leads to early selection is the finding that a participant who is looking at briefly presented displays is less likely to be aware that an unexpected stimulus occurred in a peripheral display location when the perceptual load is high than when it is low (Cartwright-Finch & Lavie, 2007).
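The decision rule in load theory can be sketched in a few lines. This is a toy illustration, not Lavie's formal model: the product used to combine number of stimuli with discrimination difficulty, the capacity value of 1.0, and the function names are all assumptions introduced here for illustration.

```python
def perceptual_load(n_relevant_items: int, discrimination_difficulty: float) -> float:
    """Hypothetical load index: more items or harder discriminations raise load."""
    return n_relevant_items * discrimination_difficulty


def selection_mode(load: float, capacity: float = 1.0) -> str:
    """Return 'early' when load uses up perceptual capacity, else 'late'."""
    return "early" if load >= capacity else "late"


def irrelevant_stimulus_identified(load: float, capacity: float = 1.0) -> bool:
    """Under late selection, irrelevant stimuli are identified along with relevant ones."""
    return selection_mode(load, capacity) == "late"


# Illustrative values only: a six-item display versus a single-item display,
# with the same assumed discrimination difficulty of 0.25 per item.
high_load = perceptual_load(n_relevant_items=6, discrimination_difficulty=0.25)
low_load = perceptual_load(n_relevant_items=1, discrimination_difficulty=0.25)
```

With these assumed values, the crowded display yields early selection and the sparse display yields late selection, which mirrors the qualitative pattern described above.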
RESOURCE MODELS
The difficulty of pinpointing the locus of a single bottleneck led some researchers to take a different approach and develop resource models of attention. Instead of focusing on a specific location in the information-processing sequence where attentional limitations arise, resource models postulate that attentional limitations arise because only a limited supply of resources is available for mental activity. Performance suffers when resource demand exceeds the supply.
Unitary-resource models. Unitary-resource models were proposed by several authors in the early 1970s. The best known is Kahneman’s (1973), which is illustrated in Figure 9.3. According to his model, attention is a limited-capacity resource that can be applied to a variety of processes and tasks. Executing several tasks simultaneously is not difficult unless the available capacity of attentional resources is exceeded. When capacity is exceeded performance will suffer, and the information processing system will need to devise a strategy for allocating the resources to different possible activities. This allocation strategy will depend on momentary intentions and evaluations of the demands being placed on these resources.
FIGURE 9.3 Unitary-resource model of attention (elements shown: miscellaneous determinants, arousal, miscellaneous manifestations of arousal, available capacity, allocation policy, enduring dispositions, momentary intentions, evaluation of demands on capacity, possible activities, responses).
The unitary-resource model suggests that different tasks have different attentional requirements. Inspired by this idea, researchers began designing experiments that would allow them to measure these attentional requirements. Posner and Boies (1971) used a dual-task procedure in which a person is required to perform two tasks at once. They classified one task as primary and the other as secondary, and the person understood that the primary task was supposed to be performed as well as possible. Under the hypothesis that attention is a single pool of processing resources, any available resources should be devoted to the primary task first. Any spare resources can then be devoted to the secondary task. If the attentional resources are depleted by the primary task, performance of the secondary task will suffer. Posner and Boies’s procedure is sometimes called the probe technique, because the secondary task is usually a brief tone or visual stimulus that can be presented at any time during an extended primary task. Thus, the secondary stimulus “probes” the momentary attentional demands of the primary task. By looking at the responses to probes throughout the primary-task sequence, we can obtain a profile of the attentional requirements of the primary task.
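The logic of the probe technique under a single-pool assumption can be sketched as follows. This is a minimal illustration, not a model taken from Posner and Boies: the capacity units, the linear slowing rule, the 200 ms penalty, and the demand values for each phase are hypothetical. The primary task is served first, the probe task receives whatever capacity is left over, and probe reaction time rises with the shortfall.

```python
def allocate(capacity: float, primary_demand: float, secondary_demand: float):
    """Give the primary task what it needs first; the secondary task gets the spare."""
    primary = min(capacity, primary_demand)
    spare = capacity - primary
    secondary = min(spare, secondary_demand)
    return primary, secondary


def probe_rt(base_rt_ms: float, secondary_demand: float, secondary_alloc: float,
             penalty_ms: float = 200.0) -> float:
    """Hypothetical rule: RT grows linearly with the unmet share of probe demand."""
    shortfall = (secondary_demand - secondary_alloc) / secondary_demand
    return base_rt_ms + penalty_ms * shortfall


def probe_profile(primary_demands, capacity=1.0, secondary_demand=0.3, base_rt_ms=300.0):
    """Probe RT for each phase of the primary task (one probe per phase)."""
    profile = []
    for demand in primary_demands:
        _, secondary = allocate(capacity, demand, secondary_demand)
        profile.append(probe_rt(base_rt_ms, secondary_demand, secondary))
    return profile


# Illustrative primary-task demands for three successive phases (made-up values).
profile = probe_profile([0.2, 0.5, 0.9])
```

In this sketch the probe RT stays at baseline while the primary task leaves enough spare capacity and rises only in the demanding final phase, which is the kind of profile the probe technique is designed to reveal.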
For their primary task, Posner and Boies (1971) displayed a letter, followed by another letter 1 s later. Observers were to judge whether the pair was the same or different. The secondary task required observers to indicate when probe tones were presented by pressing a button. Reaction times to the probe were slowed only when the tone occurred late in the primary-task sequence (see Figure 9.4), leading Posner and Boies to conclude that it was the late processes of comparison and response selection that required attention. However, later studies showed small effects for tones presented early, suggesting that even the process of encoding the initial letter apparently requires a small amount of attentional resources (Johnson et al., 1983; Paap & Ogden, 1981).
These studies illustrate that dual-task procedures can provide sensitive measures of the momentary attentional demands on a person. Such procedures can be used to determine the difficulty of different tasks and task components, and therefore to predict when operator performance will suffer. As described later, dual-task procedures have a long history of application in human factors “with the intention of finding out how much additional work the operator can undertake while still performing the primary task to meet system criteria” (Knowles, 1963, p. 156).
FIGURE 9.4 Probe reaction time (RT) at various times during a letter-matching task (horizontal axis: time in seconds, with the warning signal, first letter, and second letter marked; vertical axis: RT in milliseconds).
The ideas that we have been discussing are based on the assumption that the amount of attentional resources that could be applied to a task is the same regardless of when or how that task is performed. However, Kahneman (1973) suggested that available capacity may fluctuate with the level of arousal and the demands of the task. If a task is easy, the available attentional resources may be reduced. Young and Stanton (2002) made this aspect of resource theory, which they call malleable attentional resources, the basis for explaining why performance often suffers in situations of mental underload, or when the task is too easy. They used a dual-task procedure, where the primary task was simulated driving. The secondary task required the drivers to determine if pairs of rotated geometrical figures in a corner of the driving display were the same or different. The driving task was performed under four levels of automation, in which some number of the subtasks were performed by the simulation (e.g., controlling velocity and steering). The number of correct same–different judgments increased with increasing automation, indicating reduced attentional demands of the primary driving task.
In addition to the accuracy of responses to the probe, Young and Stanton also measured how long drivers looked at the geometric figures. As automation increased, the ratio of the number of correct secondary-task responses to the amount of time spent gazing at the probe figures decreased. This finding indicates that drivers had to look longer at the figures to make their responses under conditions of low workload, which suggests that processing of those figures was less efficient. Young and Stanton suggested that available attentional capacity is reduced when the attentional demands of the primary driving task decrease. If this is true, then a potential hazard of automation (like driving with cruise control) is to reduce the alertness and attentional capacity of the driver, thus inadvertently reducing his or her performance.
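The ratio measure described above is simple arithmetic, and a short sketch may help show how accuracy can rise while efficiency falls. The function name and the numbers below are hypothetical and chosen only for illustration; they are not Young and Stanton's data.

```python
def attentional_efficiency(correct_responses: int, gaze_time_s: float) -> float:
    """Correct secondary-task responses per second of gaze at the probe figures."""
    if gaze_time_s <= 0:
        raise ValueError("gaze_time_s must be positive")
    return correct_responses / gaze_time_s


# Made-up values for two hypothetical automation levels (NOT experimental data):
# more correct responses under automation, but disproportionately more gaze time.
manual_driving = attentional_efficiency(correct_responses=20, gaze_time_s=40.0)
full_automation = attentional_efficiency(correct_responses=30, gaze_time_s=100.0)
```

In this example the automated condition produces more correct judgments in absolute terms, yet fewer correct judgments per second of looking, which is the pattern interpreted above as reduced attentional capacity.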
Multiple-resource models. An alternative view that has been prominent, particularly in human factors, is multiple-resource theory (Navon & Gopher, 1979). Multiple-resource models propose that there is no single pool of attentional resource. Rather, several distinct cognitive subsystems each have their own limited pool of resources. Wickens (2002a) presented a four-dimensional (4D) system of resources consisting of distinct stages of processing (encoding, central processing, and responding), information codes (verbal and spatial), and input (visual and auditory) and output (manual and vocal) modalities (see Figure 9.5). He also proposed distinct resources for visual channels (focal and ambient), based on the distinction described in Chapter 5. The model assumes that the more two tasks draw on separate pools of resources, the more efficiently they can be performed together. Changes in the difficulty of one task should not influence performance of the other if the tasks draw on different resources.
Multiple-resource models were developed because the decrements observed when multiple tasks are performed together often depend on the stimulus modalities and the responses required for each task. For example, Wickens (1976) had observers perform a manual tracking task in which a moving cursor was to be kept aligned with a stationary target. At the same time, another task was performed involving either the maintenance of a constant pressure on a stick or the detection of auditory tones, to which vocal responses were made. Although the auditory-detection task was judged to be more difficult, observers performed better on the tracking task with it than with the constant-pressure task. This is presumably because both the tracking task and the constant-pressure task require resources from the same output (manual) modality pool. The general principle captured by multiple-resource models is that performance of multiple tasks will be better if the task dimensions (stages of processing, codes, and modalities) do not overlap. We encounter other situations later (see Chapter 10) that reinforce this idea.
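The overlap principle can be illustrated by describing each task as a set of resource demands and counting the dimensions on which two tasks call for the same pool. This is a rough sketch and not Wickens's computational model: the dictionary representation, the equal weighting of dimensions, and the function name `resource_overlap` are assumptions. Only the resource values stated or directly implied in the tracking example above are filled in.

```python
def resource_overlap(task_a: dict, task_b: dict) -> int:
    """Count the dimensions on which both tasks demand the same resource pool."""
    shared_dimensions = set(task_a) & set(task_b)
    return sum(1 for dim in shared_dimensions if task_a[dim] == task_b[dim])


# Task descriptions limited to what the tracking example specifies;
# the visual input of the tracking task is inferred from the moving cursor.
tracking = {"input": "visual", "output": "manual"}
constant_pressure = {"output": "manual"}
tone_detection = {"input": "auditory", "output": "vocal"}

overlap_with_pressure = resource_overlap(tracking, constant_pressure)
overlap_with_tones = resource_overlap(tracking, tone_detection)
```

Under these assumptions the tracking task shares one pool with the constant-pressure task and none with tone detection, in line with the better tracking performance observed alongside the tone-detection task.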
Wickens’s specific characterization of the multiple-resource model has been extremely influential in human factors because of its “ability to predict operationally meaningful differences in performance in a multi-task setting” (Wickens, 2002b, p. 159). That is, it can predict how much two tasks will interfere with each other by looking at whether the tasks rely on the same or different resources. Even though the multiple-resource view is a useful way to evaluate the design of different task environments, multiple-resource models have not been widely accepted as a general theory of how attention works. This is because patterns of dual-task interference are much more complicated than we would expect from the simple concept of multiple resources (Navon & Miller, 1987).
FIGURE 9.5 Multiple-resource model of attention.
EXECUTIVE CONTROL MODELS
The bottleneck and resource models we just described explain decrements in multiple-task performance as consequences of a limited capacity for processing information. These models do not place much emphasis on executive control processes, which supervise how limited capacity is allocated to different tasks. However, voluntary control of capacity allocation, and how this control is exerted to accomplish specific task goals, is an important factor in human performance (Monsell & Driver, 2000).
One of the most prominent efforts to analyze performance in terms of executive control processes has been that of Meyer and Kieras (1997a,b). They developed a framework for cognitive task analysis that they call executive-process interactive control (EPIC) theory, which was introduced in Box 4.1. The EPIC theory says that decrements in multiple-task performance are due to the strategies that people adopt to perform different tasks in different ways. One way this theory differs from the other models of attention we have discussed is that it assumes there is no limitation in the capacity of central, cognitive processes. EPIC acknowledges that, at a fundamental level, people’s abilities to process information at peripheral perceptual-motor levels are limited (e.g., detailed visual information is limited to foveal vision, and an arm cannot move to the left and right at the same time). At higher cognitive levels, EPIC accounts for decrements in multiple-task performance not through a limited-capacity bottleneck or resource but through flexible scheduling strategies used to accommodate task priorities and the peripheral sensory limitations. Executive cognitive processes control these strategies to coordinate the performance of concurrent tasks.
As an example, Meyer and Kieras (1997a,b) examined a situation in which two tasks are to be performed (see Chapter 13); call them task 1 and task 2. The stimuli for the two tasks are presented in rapid succession, and instructions stress that the response for task 1 should be made before that for task 2. According to EPIC, responses are slower than normal for task 2 because its response is strategically deferred. That is, the processing required for task 2 is delayed until processing of task 1 has progressed past the point where it could be affected by whatever needs to be done for task 2. This deferment strategy ensures that the response for task 2 will not create any conflict with that for task 1 and that it will not precede the task 1 response, as instructed.
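The timing consequence of the deferment strategy can be sketched with simple arithmetic. The code below is only an illustration of the scheduling idea, not an implementation of EPIC, which is a full production-system architecture. The stage durations, the "unlock" time at which task 1 can no longer be affected, and the function name `task2_rt` are all hypothetical.

```python
def task2_rt(soa_ms: int, t1_unlock_ms: int, t2_pre_ms: int, t2_post_ms: int) -> int:
    """Task 2 reaction time, measured from the onset of the task 2 stimulus.

    soa_ms:       delay between the task 1 and task 2 stimuli
    t1_unlock_ms: time (from task 1 onset) after which task 2 may proceed
    t2_pre_ms:    task 2 processing that can run before the deferment point
    t2_post_ms:   task 2 processing that must wait until task 1 is unlocked
    """
    ready = soa_ms + t2_pre_ms        # when task 2 reaches the deferment point
    start = max(ready, t1_unlock_ms)  # deferred if task 1 has not progressed far enough
    return start + t2_post_ms - soa_ms


# Illustrative durations only (made-up values, in milliseconds).
short_soa_rt = task2_rt(soa_ms=50, t1_unlock_ms=400, t2_pre_ms=150, t2_post_ms=200)
long_soa_rt = task2_rt(soa_ms=600, t1_unlock_ms=400, t2_pre_ms=150, t2_post_ms=200)
```

With these assumed durations, task 2 takes 550 ms when its stimulus follows task 1 by only 50 ms but 350 ms (its undelayed time) when the stimuli are 600 ms apart, so the slowing appears only when the two stimuli arrive in rapid succession.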
EPIC computational models have been applied successfully to multiple-task performance not only in the laboratory but also in real-world circumstances including HCI and military aircraft operation. They are a particularly attractive alternative for cognitive task analyses because EPIC incorporates much of what we currently understand about the fixed mechanisms of human information processing and how they interact with task-specific strategies to determine how cognitive operations unfold over time (Kieras & Meyer, 2000).
SUMMARY
The early-selection filter theory explains the important fact that people have little awareness of or memory for stimulus events to which they are not attending. The early-selection attenuation model explains the fact that unattended stimuli of particularly high salience due to context or past experience may nevertheless enter awareness. The late-selection bottleneck model explains why major decrements in performance are often associated with processes that occur after perception and stimulus identification. The unitary-resource model accurately depicts how people can control how attention is divided across tasks, and the multiple-resources model explains why multiple-task performance is often worse when two tasks share the same sensory and motor modalities or processing codes than when they do not. Finally, the executive control processes theory emphasizes that, regardless of whether central-processing limitations exist, an important part of multiple-task performance is strategic coordination of the tasks.
MODES OF ATTENTION
SELECTIVE ATTENTION
Selective attention is a component of many tasks. For example, when reading an instruction manual, an operator needs to attend selectively to the written information in the manual and ignore the irrelevant auditory and visual information in his environment. Questions about selective attention usually concern those characteristics of an environmental stimulus on which attention is focused and what characteristics of unattended stimuli disrupt the focus of attention. For example, when our operator is reading his instruction manual, what are the specific properties of the text that allow his attention to remain fixed on it? Furthermore, what kinds of environmental events could disrupt his reading? Many experiments have been performed to try to determine what attention is held by and how it is drawn away. Most of these experiments have used either auditory or visual tasks.
Auditory tasks. One task used to study selective attention is selective listening, in which a to-be-attended (target) auditory message is presented together with another (distractor) auditory signal. The distractor can interfere with the target message by masking it or by confusing the listener about which signal is the target.
We have already noted that selective listening is relatively easy when the target message is physically distinct from the distractor. Spatial separation of the target and distractor, induced either by presenting the signals from different loudspeakers or to separate ears through headphones, makes it easier to attend to the target message (Spieth et al., 1954; Treisman, 1964a). Similarly, selective listening is easier when the target and distractor are of different intensities (Egan et al., 1954) or from different frequency regions within the auditory spectrum (Egan et al., 1954; Woods et al., 2001). These findings are consistent with filter theory’s emphasis on early selection of information to be attended based on gross physical characteristics.
However, it is not just the different physical characteristics of the target and distracting signals that influence performance. Meaning and syntax affect selective listening performance when both signals are speech messages. Listeners make fewer errors when the target and distractor are of different languages, when the target message is prose rather than random words, and when the target and distractor are distinctly different types of prose, for example, a novel and a technical report (Treisman, 1964a,b). Moreover, listeners may develop expectancies based on the context of each message, which can lead to misperception of words to make them consistent with the context (Marslen-Wilson, 1975).
We have already discussed how, when performing a selective-listening task in which the distractor message is distinguished physically from the target message, say by spatial location, listeners cannot remember much about the distractor (Cherry, 1953; Cherry & Taylor, 1954). They can identify changes in basic acoustic features, such as that the voice switched from male to female in the middle of the message, but not particular words or phrases that occurred in the distractor. Cherry reported that only a third of his listeners noticed when the unattended message was switched to backward speech in the middle of the shadowing period. Wood and Cowan (1995a) confirmed this finding and found evidence that those who noticed the backward speech apparently did so because they diverted attentional resources to the distractor, as evidenced by disruption of shadowing several seconds after the distractor switched to backward speech.
122Although listeners can only remember a little about the items in an auditory distractor message, the listener’s ability to later recognize distracting information is better if the distracting information was presented visually instead of auditorily. Furthermore, recognition is better when the distractors are pictures or visually presented musical scores than when they are visually presented words, indicating that retention of the distractor information decreases as the content of the distractor message becomes more similar to the content of the target message (Allport et al., 1972).
123To this point, we have discussed the factors that facilitate or inhibit a listener’s ability to selectively attend to particular objects (messages). However, attention can be focused on particular features of a message. Scharf et al. (1987) showed that people can focus attention on a narrow band of the auditory spectrum. They required listeners to decide which of two time intervals contained a tone which could vary in frequency. Events that occurred earlier in the experiment caused listeners to expect tones of a certain frequency. When the presented tone was near the expected frequency it was detected well, but if it was significantly different from the expected frequency it was not detected at all. Thus, under at least some conditions, focused attention alters sensitivity to specific auditory frequency bands.
124Visual tasks. Selective attention for visual stimuli has been studied by presenting several visual signals at once and requiring an observer to perform a task that depends only on one of them. Similar to the messages presented in a shadowing task, the visual signal to be attended is called the target and all others are called distractors. As with auditory selective attention, the observer may show little awareness of events to which he or she is not attending (see Box 9.1).
BOX 9.1
Change Blindness
Change blindness is a remarkable phenomenon that has attracted the interest not only of people who study attention but also the popular media. Change blindness refers to a person’s inability to detect gross or striking changes in a visual scene. A popular demonstration of change blindness asks observers to count the number of passes of a ball between players of a basketball game. While the game is in play, a person in a gorilla suit strolls through the players. Although the gorilla is directly in sight, and obviously and hilariously out of place, relatively few observers even notice that a gorilla has joined the game (see Figure B9.1; Durlach, 2004; Simons & Ambinder, 2005; Simons & Chabris, 1999).
The most widely used procedure to study change blindness uses pictures (Rensink et al., 2000). Two pictures that differ only in a single conspicuous element are shown one after the other, over and over again, with a blank screen intervening between each picture for a period of about 1/10 s. Changes between displays in the color of an object, its position, or even whether it is present in one version and absent in the other, are difficult to detect. Some of these pictures might include a jet, whose engine appears and disappears, a government building, with a flagpole that moves from one side of the picture to the other, and a city street scene, with a cab that changes from yellow to green. It may take many presentations of the pictures before the observer is able to identify what is changed in one display compared to the other.
Researchers who study change blindness are interested in why people are unaware of significant changes in displays and what conditions lead to this lack of awareness. We know, for example, that a change can be detected easily when the blank screen between the two displays is omitted. This may occur because the difference between the two pictures generates a “transient cue,” a visual signal (such as apparent motion or an abrupt onset) that directs the observers’ attention to the exact location of the change.
There are many real-world tasks that may be affected by change blindness. For example, operators of complex computer-based systems such as those in crisis response centers and air traffic control centers must monitor multiple, multifaceted displays and perform appropriate control actions when needed (DiVita et al., 2004). Failure of the operator to detect display changes signaling crucial events can have serious consequences.
Change blindness occurs in a variety of situations in which the observer’s attention is distracted or there is a brief break in the visibility of the information. O’Regan et al. (1999) showed that presenting a “mudsplash” (a series of superimposed dots) on the screen at the time the change was made produced change blindness, even when the mudsplash was not in the area of the change. Change blindness also occurs if the change is timed to coincide with a blink (O’Regan et al., 2000) or a saccadic eye movement (Grimes, 1996), which are essentially observer-induced blank periods. Levin and Simons (1997) demonstrated that the majority of people did not detect relatively salient changes between “cuts” in scenes from a video, including a change in the person who was the focus of the video.
One of the most striking demonstrations of change blindness can be shown for a real-world event. Simons and Levin (1998) had an experimenter stop a person on the street and ask for directions. As the person was providing directions, two other people carrying a solid door walked between the person and the experimenter. In a way that was carefully choreographed, the experimenter grabbed one end of the door and walked away while the person who had been carrying the door stayed behind. Only about 50% of the people providing directions noticed that they were now talking to a different person!
FIGURE B9.1 Three frames from the movie, Gorillas in Our Midst. (From Simons & Chabris 1999; Figure provided by Daniel Simons.)
Many experiments on visual selective attention use letters as signals and require the observer to identify the letter that appears in a particular location. If distractors are at least 1° of visual angle away from a target (presented at a known location), they will produce little or no interference with the ability of the observer to identify the target (Eriksen & Eriksen, 1974). If the distractors are very close to the target they will be identified along with the target, which can result in a decrement in task performance.
More generally, the required response to a target can be made more quickly when the distractors would require the same response as the target, but the response to a target is slowed when the distractors require a different response (e.g., Rouder & King, 2003). For example, we might show an observer a letter triple, like “X A X,” and ask him or her to identify the letter in the middle. If the letter in the middle is an A or a B, the observer is to press one key, but if it is an X or a Y he or she is to press another key. The observer will have an easier time with displays like “B A B” or “Y X Y” than he or she will with displays like “X A X” or “B Y B.” However, if the distance between the outer distracting letters and the central target increases, like “X   A   X,” the observer will not experience as much difficulty. Interference between the response to be given to the target and the response to be given to the distractors diminishes as the distance between the target and the distractors increases.
142Results like this suggest that the focus of attention is a spotlight of varying width that can be directed to different locations in the visual field (Eriksen & St. James, 1986; Treisman et al., 1977). Interference among visual stimuli occurs because the spotlight cannot always be made small enough to prevent distracting stimuli from being attended. In the case of stimuli like ‘‘X A X,’’ the ‘‘X’’ distractors, which require a different response from the target ‘‘A,’’ are included in the spotlight of attention and identified. The response to the target is inhibited by the competing response required for the distractors. If the Xs are separated from the A by a sufficient amount, they no longer fall within the spotlight and are not identified. Consequently, the response to X is never ‘‘activated’’ and cannot interfere with the response to A.
These studies suggest that the focus of attention has a lower limit: it can get smaller, but not too small. Another study showed that the focus of attention can be made larger. LaBerge (1983) had people perform different tasks with five-letter words. One task required the observer to determine whether the word was a proper noun, whereas the other task required the observer to determine whether the middle letter was from the set A, B, ..., G or N, O, ..., U. The word task required a larger focus of attention, at the level of the whole word, whereas the letter task required that the observer focus attention on the middle letter. During both tasks, on some “probe” trials no word was presented. Instead, a single letter or digit was presented in one of the five positions corresponding to where the letters of the word were presented, with # signs in the others. For example, rather than “HOUSE,” the observer might instead see “#Z###” or “##7##.” For this stimulus the observer was asked to quickly identify whether it contained a letter or a digit. If the observer was performing the word task (and his or her attention was focused on the whole word), where the letter or digit appeared on a probe trial did not influence how quickly it could be identified. However, if the observer was performing the letter task (and his or her attention was focused only on the middle letter), responses on probe trials were fastest when the letter or digit appeared in the middle position and became progressively slower as it moved farther away.
One way observers can selectively attend to different visual stimuli is by moving their eyes to different places in the visual field (e.g., Nummenmaa et al., 2006). Fixated objects will be seen clearly, whereas those in the visual periphery will not. However, the spotlight metaphor suggests that it should be possible to dissociate the focus of attention from the direction of gaze; that is, an observer should be able to selectively attend to a location in the visual field that is different from his or her fixation point. Such a process is referred to as covert orienting, as opposed to the overt orienting that occurs as a function of eye position.
Posner et al. (1978) showed that observers can use covert orienting to improve their performance in a simple visual task. The observers’ task was to detect the onset of the letter X in a display. The X could appear 0.5° either to the left or right of fixation. Prior to the presentation of the X, a cue was presented at the point of fixation. The cue was either a neutral plus sign or an arrow pointing to the left or right. The cue was intended to give the observer information about where the X was likely to appear. The X occurred on the side indicated by the arrow on 80% of the trials and on the opposite side on the remaining 20%. Reaction time to detect the X was fastest when the X appeared at the cued location and slowest when it occurred at the uncued location (see Figure 9.6). Observers apparently used the cue to shift the focus of attention from the fixation point to the most likely location of the X.
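The pattern of results in this cueing task is often summarized as a benefit (neutral minus validly cued reaction time) and a cost (invalidly cued minus neutral reaction time). The following minimal sketch shows that arithmetic. The function names and the reaction times are hypothetical illustrations, not the data of Posner et al. (1978); only the 80%/20% cue validity is taken from the description above.

```python
def cueing_effects(rt_valid, rt_neutral, rt_invalid):
    """Return (benefit, cost, validity_effect) in ms for one observer.

    benefit: how much faster a validly cued target is than a neutral one.
    cost: how much slower an invalidly cued target is than a neutral one.
    validity_effect: invalid minus valid reaction time.
    """
    benefit = rt_neutral - rt_valid
    cost = rt_invalid - rt_neutral
    validity_effect = rt_invalid - rt_valid
    return benefit, cost, validity_effect


def expected_rt_with_arrow(rt_valid, rt_invalid, p_valid=0.8):
    """Mean reaction time over arrow-cue trials, weighting valid and
    invalid trials by how often each occurs (80% valid by default)."""
    return p_valid * rt_valid + (1.0 - p_valid) * rt_invalid


# Hypothetical mean reaction times in ms (assumed values for illustration).
benefit, cost, validity_effect = cueing_effects(
    rt_valid=250.0, rt_neutral=270.0, rt_invalid=300.0
)
mean_cued_rt = expected_rt_with_arrow(rt_valid=250.0, rt_invalid=300.0)
print(benefit, cost, validity_effect, mean_cued_rt)
```

With these made-up values, the average over arrow-cue trials is lower than the neutral reaction time, which is the sense in which attending to a cue that is usually correct pays off even though invalid trials are slowed.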
148Experiments conducted after that of Posner et al. (1978) have tried to determine whether attention is moved gradually from the point of fixation to the cued position (like the movement of a spotlight over a surface), or whether the shift of attention occurs discretely. Some experiments have shown that the amount of time it takes to move from the fixation point to the position of the target stimulus is the same regardless of how far away the target position is. Therefore we must conclude that attention ‘‘jumps’’ in a discrete way from one point to another (Yantis, 1988).
149The arrow cues of Posner et al. (1978), which point in one direction or another, are said to induce endogenous orienting of attention, that is, a shift of attention that is initiated voluntarily by the individual. Attention can also be drawn involuntarily to a location or object by the rapid onset or perceived motion of a stimulus (Goldsmith & Yeari, 2003), a type of shift that is called exogenous orienting. In other words, even when an observer does not move his eyes, his attention may shift reflexively and involuntarily to the location where he sees a stimulus appear or move unexpectedly.
FIGURE 9.6 Reaction times as a function of target position certainty. (Data from Posner, Nissen, & Ogden, 1978.) (Abscissa: target position certainty, with invalid (20%), neutral (50%), and valid (80%) conditions; ordinate: reaction time, 200 to 320 ms.)
Exogenous orienting of attention can both help and hinder performance of a task. Consider a modification of the Posner et al. task where, rather than using arrow cues, the likely position of the target is cued exogenously by flashing a neutral stimulus at the target location. The abrupt onset (flash) of the neutral stimulus will draw attention to its location. If the time between this exogenous cue and the target is short (less than about 300 ms), responses to a target presented in that location will be made more quickly. However, if the time between the cue and the target is longer than 300 ms, responses to targets presented in uncued locations will actually be faster than those to targets presented in cued locations (e.g., Los, 2004).
This phenomenon is called inhibition of return. We may hypothesize that with longer cue-target delays, attention may shift (either voluntarily or involuntarily) to other locations in the visual field. Once attention has shifted away from an exogenously cued location there is a tendency to avoid returning it to that same location. Although the purpose of this attentional mechanism is not yet well understood, it may make visual search of complex environments more efficient by preventing a person’s attention from revisiting locations that have already been checked (Snyder & Kingstone, 2007).
Switching and controlling attention. The distinction between endogenous and exogenous shifts of attention brings up questions about how attention is controlled. Many tasks, such as driving, require rapid switching of attention between sources of information. It turns out that people differ in their abilities to switch attention from one source of information to another. Kahneman and his colleagues (Gopher & Kahneman, 1971; Kahneman et al., 1973) evaluated the attention-switching ability of fighter-pilot candidates and bus drivers using a dichotic listening task. The subjects were asked to shadow one of two messages, each presented to a different ear. After the subject had selectively attended to information in one ear, a tone sounded in the other ear, indicating that the subject should shift his attention and shadow the message in the other ear. The number of errors made after the attention-switching signal was negatively correlated with the success of cadets in the Israeli Air Force flight school; successful cadets made fewer errors than unsuccessful cadets. The number of errors made was positively correlated with the accident rates of the bus drivers; drivers who made more errors had more accidents than drivers who made fewer errors. Similar results have been found for Royal Netherlands Navy air traffic control applicants (Boer et al., 1997).
While we have concentrated our discussion on simple laboratory tasks that bear little resemblance to the complicated environments that people encounter in real-world settings, these studies indicate that attention shifting is a skill that affects performance of complex navigation tasks outside the laboratory. Furthermore, Parasuraman and Nestor (1991) note that attention-shifting skill deteriorates for older drivers with various types of age-related dementia. Because of these individual differences in attention-shifting ability, it may be appropriate to assess driving competence in part by evaluating attention.
169DIVIDED ATTENTION
170Whereas a selective-attention task requires a person to attend to only one of several possible sources of information, divided-attention tasks require a person to attend to several sources of information simultaneously. In many situations, people perform best when they must monitor only a single source of information, and they perform more and more poorly as the number of sources increases. This decrement in performance is usually measured as a decrease in accuracy of perception, slower response times, or higher thresholds for detection and identification of stimuli.
There are many applied settings in which operators must perform divided-attention tasks by monitoring several sources of input, each potentially carrying a target signal. Consider an environment in which an operator must monitor a large number of gauges, each providing information about some aspect of a complex system’s performance. An operator may be required to detect one or more system malfunctions which would appear as one or more gauges registering abnormal readings. Such environments are common in nuclear power plant control rooms, process control system interfaces, and aircraft cockpits.
How well an operator can monitor several sources of information depends on the task he or she is to perform. Suppose, for example, that when one or more of the gauges in an array registers an abnormal system condition, the operator’s job is to shut down the system and inform his or her supervisor. In this case, the operator’s ability to detect a target is only slightly degraded relative to when he or she must monitor only a single gauge (Duncan, 1980; Ostry et al., 1976; Pohlman & Sorkin, 1976). This is because if more than one target occurs, the probability that the operator will detect at least one of them increases, although the probability that the operator will detect any particular target decreases. The likelihood of detecting a target from a single source diminishes further as the number of simultaneous targets from other sources increases.
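The trade-off described above can be illustrated with a short calculation. This is a minimal sketch that assumes each simultaneous target is detected independently with the same probability, a simplification introduced here for illustration and not a claim made by the studies cited; the function name and the probabilities are hypothetical.

```python
def p_detect_at_least_one(p_single, n_targets):
    """Probability of detecting at least one of n simultaneous targets,
    assuming independent detections that each succeed with p_single."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability between 0 and 1")
    if n_targets < 0:
        raise ValueError("n_targets must be non-negative")
    return 1.0 - (1.0 - p_single) ** n_targets


# Hypothetical values: a lone target is detected with probability 0.90,
# but each of two simultaneous targets is detected with only 0.80.
p_one_target = p_detect_at_least_one(0.90, 1)
p_two_targets = p_detect_at_least_one(0.80, 2)
print(p_one_target, p_two_targets)
```

Under these assumed numbers, the chance of catching at least one abnormal reading rises with two targets even though each individual target is detected less reliably, which is why a "shut down on any abnormal gauge" task suffers little from dividing attention.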
175Problems will arise when two or more targets must be identified separately. For example, the operator may need to shut a water intake valve in response to one abnormal reading, but open a steam-pressure valve in response to another. In this scenario, the operator’s ability to detect, identify, and respond to any particular target will be worse than when he or she is attending to only a single source of information.
176Although some of the operator’s difficulties in responding to multiple, simultaneous targets can be reduced with practice (Ostry et al., 1976), his or her performance will never be as good as when he or she is attending to only a single input source. In applied situations, if simultaneous targets are very likely to occur, and a failure to detect and respond to those targets may lead to system failure, then each source should be monitored by a separate operator.
177For situations in which an operator must divide his or her attention between different tasks or sources of information, he or she may not need to give each task the same priority. For example, with the probe technique, which we described earlier, one of two tasks is designated as primary and the other as secondary. More generally, any combination of relative weightings can be given to the two tasks: for example, an operator might be instructed to pay twice as much attention to the primary task as the secondary task. That is, an operator can ‘‘trade off’’ his performance on one task to improve his performance on another task.
The trade-off in dual-task performance can be described with a performance-operating characteristic (POC) curve (Norman & Bobrow, 1976), sometimes called an attentional operating characteristic (Alvarez et al., 2005), which is similar in certain respects to the ROC curve presented in Chapter 4. Figure 9.7 shows a hypothetical POC curve. For two tasks, A and B, the abscissa represents performance on task A, and the ordinate represents performance on task B. In the POC, performance can be measured in any number of ways (speed, accuracy, etc.), as long as good performance is represented by high numbers on each axis. Baseline performance for each task when performed by itself is shown as a point on each axis. If the two tasks could be performed together as efficiently as when performed alone, performance would fall on the independence point P. This point shows performance when no attentional limitations arise from doing the two tasks together.
FIGURE 9.7 Performance-operating characteristic (POC) curve. (Abscissa: task A performance, low to high; ordinate: task B performance, low to high. Labeled elements: single-task and dual-task performance for A and B, independence point P, efficiency, bias, and cost of concurrence.)
The box formed by drawing lines from point P to the axes defines the POC space. It represents all possible combinations of joint performance that could occur when the tasks are done simultaneously. The actual performance of the two tasks will fall along a curve within the space. Performance efficiency, the distance between the POC curve and the independence point, is an indicator of how efficiently the two tasks can be performed together. The closer the POC curve comes to P, the more efficient is performance. Like an ROC curve, the different points along the POC curve reflect only differences in bias induced by changing task priorities. The point on the positive diagonal reflects unbiased performance (equal attention given to both tasks), whereas the points toward the ordinate or abscissa represent biases toward task B or task A, respectively. Finally, the cost of concurrence is shown by the difference between performance for one task alone and for dual-task performance in which all resources are devoted to that task.
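The two POC summary measures just described can be sketched in a few lines of code. This is one possible operationalization, not a standard procedure: it assumes the curve is represented by a handful of measured (task A, task B) points, takes "distance between the curve and the independence point" to mean the smallest Euclidean distance from any measured point to P, and uses hypothetical scores expressed as percentages of single-task performance.

```python
import math


def distance_to_independence(point, independence):
    """Euclidean distance from one dual-task point to independence point P."""
    return math.hypot(independence[0] - point[0], independence[1] - point[1])


def performance_efficiency(curve_points, independence):
    """Smallest distance from the measured POC points to P.
    Smaller values mean the two tasks are combined more efficiently."""
    return min(distance_to_independence(p, independence) for p in curve_points)


def cost_of_concurrence(single_task_score, dual_task_full_emphasis_score):
    """Single-task performance minus dual-task performance on the same task
    when all emphasis is placed on that task."""
    return single_task_score - dual_task_full_emphasis_score


# Hypothetical data: (task A, task B) scores under three emphasis conditions.
independence_point = (100.0, 100.0)
poc_points = [(90.0, 40.0), (70.0, 60.0), (40.0, 85.0)]

efficiency = performance_efficiency(poc_points, independence_point)
concurrence_cost_a = cost_of_concurrence(100.0, 92.0)
print(efficiency, concurrence_cost_a)
```

Comparing `performance_efficiency` across groups or task designs then amounts to asking whose curve comes closest to P, under the assumptions stated above.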
195A POC curve is obtained by testing people in single- and dual-task conditions and varying the relative emphases placed on the two tasks. Performance on a task in a dual-task scenario might approximate that when it is performed alone or be substantially worse, depending on the conditions imposed in the dual-task scenario. POC analyses can be used to evaluate operator performance and task design in many complex systems in which operators must perform two or more tasks concurrently, such as monitoring radar or piloting aircraft.
196To illustrate the use of POC curves, we will describe a study by Ponds et al. (1988) that evaluated dual-task performance for young, middle-aged, and older adults. One task involved simulated driving, whereas the other required counting a number of dots, which were presented at a location on the simulated windshield that did not occlude the visual information necessary for driving. Performance was normalized for each age group, so that the mean single-task performance for each group was given a score of 100%. This normalization makes it possible to evaluate age differences in dual-task performance independently from any overall differences that might be present across the groups.
POC curves were obtained for each age group by plotting the normalized performance scores obtained for dual-task performance under three different emphases on driving versus counting (see Figure 9.8). For the normalized curves, the independence point is (100, 100). The older adults show a deficit in divided attention, as evidenced by the POC curve for the older adults being further from the independence point than the POC curves for the middle-aged and young adults. This divided-attention deficit for the elderly corroborates Parasuraman and Nestor’s (1991) work on attention shifting in elderly drivers and does not go away with practice (McDowd, 1986).
198AROUSAL AND VIGILANCE
199A person’s attentional ability is influenced by his or her level of arousal. Arousal level may influence the amount of attentional resources available to perform a task, as well as the policy by which attention is allocated to different tasks. This relation between attention and arousal underlies a widely-cited law of performance, the Yerkes–Dodson law (Yerkes & Dodson, 1908). According to this law, performance is an inverted U-shaped function of arousal level, with best performance occurring at a higher arousal for simple tasks than for complex tasks (see Figure 9.9).
200It is not surprising that performance is poor at low arousal levels. Extremely low arousal may result in a person being unprepared to perform the task or failing to monitor performance and, as a result, failing to pay attention to changing task demands. Because the number of features to consider in difficult tasks typically is greater and the coordination of attention more crucial than in easy tasks, difficult tasks show a greater performance decrement at lower arousal levels than do simple tasks.
It is more surprising that performance tends to deteriorate at high arousal levels. Several factors contribute to this deterioration, but it is primarily due to a decrease in attentional control. At high arousal levels, a person’s attention becomes more focused (either appropriately or inappropriately), and the range of cues she uses to guide her attention becomes more restricted (Easterbrook, 1959). Also, a person’s ability to discriminate between relevant and irrelevant cues decreases. Thus, at high arousal levels, fewer and often less appropriate features of the situation control the allocation of attention. This theory suggests that performance will not decline at high levels of arousal if attention remains directed toward the task at hand (Näätänen, 1973).
FIGURE 9.8 Normalized performance-operating characteristic (POC) curves for older, middle-aged, and young adults in a divided-attention task. (Abscissa: dot-counting task, 0 to 100; ordinate: driving task, 0 to 100. Separate curves for young, middle-aged, and older adults.)
FIGURE 9.9 The Yerkes–Dodson law. (Abscissa: arousal level, low to high; ordinate: quality of performance, low to high. Separate curves for a simple task and a complex task.)
228 The value of the Yerkes–Dodson law has been disputed (e.g., Hancock & Ganey, 2003; Hanoch & Vitouch, 2004), based in part on evidence that arousal depends on many different factors and consists of many different physiological responses. Therefore, it is not possible to attribute a benefit or decrement in performance to one general arousal level. However, as noted by Mendl (1999, p. 225), ‘‘The law can be used descriptively as a shorthand way of summarising the observed relationship between a diverse range of apparently threatening or challenging stimuli and various measures of cognitive performance, without the implication that all relationships are mediated by a single stress or arousal mechanism.’’ We devote the remainder of this section to two important effects of arousal on attention: perceptual narrowing and the vigilance decrement.
Perceptual narrowing refers to the restriction of attention that occurs under high arousal (Kahneman, 1973). Weltman and Egstrom (1966) used a dual task to examine perceptual narrowing in the performance of novice scuba divers. The primary task required the divers to add a centrally presented row of digits or to monitor a dial to detect a larger than normal deflection of the pointer, and the secondary task required them to detect a light presented in the periphery of the visual field. The level of arousal was manipulated by observing the divers in normal surroundings (low stress), in a tank (intermediate stress), and in the ocean (high stress). Performance on either primary task was unaffected by stress level, but as stress increased it took the divers longer to detect the peripheral light. This finding suggests that the divers’ attentional focus narrowed under increased stress. We see similar effects on people performing simulated driving tasks under high arousal levels induced by engaging in competition (Janelle et al., 1999) or consuming an amphetamine (Silber et al., 2005).
230In contrast to perceptual narrowing, which occurs at high levels of arousal or stress, the vigilance decrement occurs under conditions in which arousal seems, at first, to be very low. Before defining the vigilance decrement, we must define what we mean by vigilance. Many tasks involved in operating automated human–machine systems involve sustained attention, or vigilance. Consider the gauge-checker, discussed above, who must monitor many gauges simultaneously for evidence of system failure. If system failures are very infrequent, leaving the operator with almost nothing to do for very long periods of time, we say that he or she is performing a vigilance task. The defining characteristic of a vigilance task is that it requires detection of relatively infrequent signals that occur at unpredictable times.
Research on vigilance began in World War II, spurred by the problem that radar operators were failing to detect a significant number of submarine targets. As systems have become more automated, there are many more situations in which an operator’s role is primarily one of passively monitoring displays for critical signals, so vigilance research is still important. Vigilance in part determines the reliability of human performance in such operations as airport security screening, industrial quality control, air traffic control, jet and space flight, and the operation of agricultural machinery (Warm, 1984).
232The vigilance decrement was first demonstrated in an experiment by Mackworth (1950). He devised an apparatus in which observers had to monitor movements of a pointer along the circumference of a blank-faced clock. Every second, the pointer would move 0.3 in. to a new position. Occasionally, it would take a jump of 0.6 in. Observers were required to execute a keypress response when a ‘‘target’’ movement of 0.6 in. was detected. The monitoring session lasted 2 h. Mackworth found that the hit rate for detecting the target movement decreased over time, a finding that has since been replicated with many tasks. Figure 9.10 shows the vigilance decrement for three tasks over a two-hour period. The maximal decrement in accuracy occurs within the first 30 min. Not only does accuracy decrease, but other studies show that reaction times for hits (as well as for false alarms) become slower as the time spent at the task increases (Parasuraman & Davies, 1976).
Why does hit rate decrease? The decrease could reflect either a decrease in sensitivity to the signals or a shift to a more conservative criterion (requiring more evidence) for responding. To determine which of these is responsible for the decreased hit rate, we can perform a signal detection analysis (see Chapter 4). When such analyses compare detection performance early versus late in the task, they show for some situations an increase in the criterion (β), with sensitivity (d′) remaining relatively constant (Broadbent & Gregory, 1965; Murrell, 1975). The more frequently signals occur, the smaller the change in criterion. This suggests that the criterion can be maintained at a more optimal, lower level by using artificial signals to increase the frequency of events.
FIGURE 9.10 The vigilance decrement for three tasks. [Three panels: rotating pointer task, listening task, and synthetic radar task. x-axis: task duration (min), 30 to 120; y-axis: correct detections (%), 0 to 100.]
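The signal detection analysis described above can be sketched in a few lines. This is a minimal illustration under the equal-variance Gaussian model, not the procedure used in the cited studies. The function name, the log-linear correction for extreme rates, and all response counts are assumptions made for the example.

```python
import math
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, beta) for one block of a vigilance session."""
    # Log-linear correction (an assumption of this sketch): adding 0.5 to
    # each count keeps rates of exactly 0 or 1 from giving infinite z scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(fa_rate)

    d_prime = z_hit - z_fa  # sensitivity
    # beta is the likelihood ratio of signal to noise at the criterion.
    beta = math.exp((z_fa ** 2 - z_hit ** 2) / 2)
    return d_prime, beta


# Hypothetical counts: (hits, misses, false alarms, correct rejections).
early = sdt_measures(18, 2, 40, 360)
late = sdt_measures(12, 8, 8, 392)

print("early: d' = %.2f, beta = %.2f" % early)
print("late:  d' = %.2f, beta = %.2f" % late)
```

With these made-up counts, both hits and false alarms drop late in the session, so the analysis attributes most of the change to a stricter criterion rather than to lower sensitivity.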
252Initial applications of signal detection theory suggested that a sensitivity decrement was rare in vigilance tasks, but subsequent research has shown that it is quite common (See et al., 1995). Parasuraman and Davies (1976) and Parasuraman (1979) proposed that the sensitivity decrement was restricted primarily to tasks that require discrimination based on a standard held in memory, particularly if the event rate is high. An example of such a task would be trying to detect whether a light that comes on periodically is brighter than its usual intensity. The sensitivity decrement for high event-rate tasks requiring comparison to information in memory may be a consequence of the large amount of cognitive resources necessary to maintain a representation of the standard in memory and to compare each event to it, thus reducing the number of resources available for detection (Caggiano & Parasuraman, 2004). For such situations, sensitivity may be improved by having a physical standard present for comparison (a photograph, mockup, or other object that reduces memory load).
253There are many factors that will influence the size of the vigilance decrement in sensitivity (e.g., Parasuraman & Mouloua, 1987). The vigilance decrement is different for discriminations based on sensory information (such as brightness detection) and those based on cognitive information (such as trying to detect a specific digit in a stream of digits; See et al., 1995). If the discrimination requires information from memory, there will usually be a larger vigilance decrement for cognitive discriminations than for sensory discriminations. However, the size of this difference will depend on event rate. At high event rates, there will be little difference in the size of the sensitivity decrement between sensory and cognitive discriminations. If the discrimination does not require information from memory, the vigilance decrement will be larger for sensory discriminations than for cognitive discriminations.
256 Performance in vigilance tasks will also be affected by other characteristics of the signal, as well as by the motivation of the observer. Stronger signals are easier to detect and the vigilance decrement is not as pronounced (Baker, 1963; Wiener, 1964). Auditory signals are easier to detect than visual signals, and the vigilance decrement can be reduced by frequently alternating between auditory and visual modalities (e.g., every 5 min; Galinsky et al., 1990). The vigilance decrement can also be reduced by providing rest periods of 5–10 min or by financial incentives (Davies & Tune, 1969).
It might seem at first that arousal in vigilance tasks is affected by mental underload, with the vigilance decrement being a consequence of low levels of arousal. However, there is now considerable evidence suggesting that performance of a vigilance task is quite effortful and that the vigilance decrement reflects a depletion of attentional resources rather than a decrease in arousal. For example, Grier et al. (2003) had subjects perform two types of vigilance tasks, both of which produced a vigilance decrement. However, the subjects’ assessments of mental workload (described in the next section) and stress showed elevated levels. Thus, contrary to what may seem to be the case, requiring someone to sustain attention for the detection of infrequently occurring events is actually very mentally demanding.
258The primary applied message of the research on vigilance is that fairly substantial vigilance decrements can occur in a variety of situations. We can minimize these decrements by carefully selecting the stimulus types, the required discriminations, and the rate at which critical events occur. Also, we must keep in mind that vigilance tasks can be mentally demanding, and so it is important to provide observers with appropriate rest periods and performance incentives. Using appropriate workload-assessment techniques, which we will discuss in the next section, we may be able to modify the design of the vigilance task to reduce the mental demands on the operator.
259MENTAL WORKLOAD ASSESSMENT
260Models of attention have been profitably applied to solve human factors problems. One area in which this application is evident is the measurement of mental workload (Tsang & Vidulich, 2006). Workload refers to the total amount of work that a person or group of persons is to perform over a given period of time. Mental workload is the amount of mental work or effort necessary to perform a task in a given period of time. As task demands increase or the time allowed to perform a task decreases, mental workload increases. Young and Stanton (2006, p. 507) defined mental workload as follows:
261The mental workload of a task represents the level of attentional resources required to meet both objective and subjective performance criteria, which may be mediated by task demands, external support and past experience.
262In work settings, similar to the effect of arousal, performance may suffer if the mental workload is too high or too low. At the upper extreme, it is clear that performance will be poor if there are too many task demands. However, as we noted earlier, an undemanding task may also lead to a deterioration in performance by lowering an operator’s level of alertness. Figure 9.11 illustrates the resulting inverted U-shaped function between mental workload and performance.
263The purpose of mental workload assessment is to maintain the workload at a level that will allow acceptable performance of the operator’s tasks. The workload imposed on an operator varies as a function of several factors. Most important are the tasks that the operator must perform. Workload will increase as required accuracy levels increase, as time demands become stricter, as the number of tasks to be performed increases, and so on. Workload will also be affected by aspects of the environment in which the tasks must be performed. For example, extreme heat or noise will increase the workload. Also, because the cognitive capacities and skills of individuals vary, the workload demands imposed by a given task may be excessive for some people but not for others.
FIGURE 9.11 The hypothetical relation between workload and performance. [Plot: performance, with acceptable and unacceptable levels marked, against workload, which is divided into Regions 1, 2, and 3.]
The mental workload concept comes directly from the unitary-resource model of attention, in which the operator is believed to have a limited capacity for processing information (Kantowitz, 1987). This model lends itself nicely to the concept of spare capacity, or the amount of attentional resources available for use in additional tasks. However, most current workload techniques are more closely linked to the multiple-resources model, for which different task components are assumed to draw on resources from distinct pools of limited capacity. The primary benefit of the multiple-resources view is that it allows the human factors specialist to evaluate the extent to which specific processes are being overloaded.
There are many workload-assessment techniques that differ in several ways (see Gawron, 2000, for summaries of many of the techniques). A useful taxonomy distinguishes between empirical and analytical techniques (Lysaght et al., 1989; see Figure 9.12). Empirical techniques are those that are used to measure and assess workload directly in an operational system or simulated environment, whereas analytical techniques are those used to predict workload demands early in the system-development process. We will discuss each of these techniques but pay most attention to the empirical techniques. Unfortunately, because many designers do not concern themselves with ergonomic issues until their systems are near completion, we have many more empirical than analytical techniques for assessing workload.
274EMPIRICAL TECHNIQUES
There are four major empirical techniques. The first two focus on performance measures, either of the primary task of interest or of a secondary task. The last two are psychophysiological measures and subjective scales. A given situation may require using one or more of these techniques, and may preclude the use of other techniques. Table 9.1 outlines several criteria that can be applied to determine which workload-assessment technique is most appropriate for a given situation.
A technique should be sensitive to changes in the workload imposed by the primary task, particularly once overload levels are reached. It should also be diagnostic to the extent that the assessment can isolate particular processing resources being overloaded. On the basis of multiple-resources theory, this requires discriminating between capacities from the three dimensions of processing stages (perceptual–cognitive versus motor), codes (spatial–manual versus verbal–vocal), and modalities (auditory versus visual). If the technique applied to a particular situation is unable to detect any change in mental workload or determine how mental capacity is being overloaded, it is, obviously, not very useful.
FIGURE 9.12 A taxonomy of mental workload techniques. [Tree with three levels: technique, category, subcategory.
Analytical techniques: comparison; expert opinion; math models (manual control models, information theory models, queuing theory models); task analysis methods; simulation models.
Empirical techniques: primary task (system response, operator response); subjective methods (rating scales, questionnaire/interview); secondary task (subsidiary task, probe task, dual task); physiological (classical, specialized).]
Imagine now a system that requires the operator to make a considerable number of (large or small) movements, as in the control of a robotic arm. You might decide to assess mental workload during the performance of different tasks with a technique that requires the operator to wear several physiological measuring devices, one in the form of a somewhat uncomfortable helmet, another strapped around her upper arm, and a third attached to the end of her finger. Each device has a few wires around which the operator must coordinate her movements, and some of them restrict her mobility.
TABLE 9.1
Criteria for Selection of Workload-Assessment Techniques
Criterion: Explanation
Sensitivity: Capability of a technique to discriminate significant variations in the workload imposed by a task or group of tasks.
Diagnosticity: Capability of a technique to discriminate the amount of workload imposed on different operator capacities or resources (e.g., perceptual versus central processing versus motor resources).
Intrusiveness: The tendency for a technique to cause degradations in ongoing primary-task performance.
Implementation requirements: Factors related to the ease of implementing a particular technique. Examples include instrumentation requirements and any operator training that might be required.
Operator acceptance: Degree of willingness on the part of operators to follow instructions and actually use a particular technique.
318
These methods for assessing workload violate the remaining three criteria outlined in Table 9.1. We call such assessment techniques intrusive. Intrusive techniques will interfere with the operator’s ability to perform her primary task, and any workload estimates obtained this way will be difficult to interpret. Any observed decrements in performance may be due to the measurement technique and not the task. The implementation of this kind of technique is also a problem, because such sophisticated measuring devices may be difficult to obtain or maintain. We should choose the techniques that pose the fewest implementation problems. Finally, imagine how dissatisfied the operator is going to be when the equipment she is wearing interferes with her ability to perform duties, especially if she does not understand what purpose the equipment serves or why she is being monitored. If the measurement technique is not accepted by the operators who are being evaluated, it will be very difficult to obtain meaningful workload measures. Not only will the operators be reluctant to perform at their best, but they may also actively sabotage the study (and the equipment).
320It should be apparent from these considerations that selection of a workload measure appropriate for a particular problem is a crucial part of workload evaluation.
Primary-task measures. Primary-task measures evaluate the mental workload requirements of a task by directly examining performance of the operator or of the overall system. The assumption is that as task difficulty increases, additional processing resources will be required. Performance of the primary task deteriorates when the workload requirements exceed the capacity of the available resources. Some commonly used primary-task measures are glance duration and frequency (higher workload is associated with longer and/or more frequent glances) and number of control movements per unit time (e.g., number of brake actuations per minute).
It is important to use more than one primary-task measure of workload. Because different components of a task will require mental resources in different ways, a single performance measure may show no effect of workload where other measures would. For example, in a study evaluating the impact of traffic-situation displays on pilot workload, Kreifeldt et al. (1976) obtained 16 measures of flight performance, including final airspeed error and final heading error, in a flight simulator. Some measures showed that the traffic-situation displays lowered workload demands but other measures showed nothing. For example, the airspeed measure showed no improvement in flight performance with the display, while the heading error measure did. If only airspeed were measured, then the display designer might conclude that the traffic-situation display did not reduce pilot workload. By using as many measures as we can, we get a more accurate picture of workload.
Primary-task measures are good for discriminating overload from nonoverload conditions because performance will suffer when the operator is overloaded. However, they are not good for measuring differences in mental workload in conditions when performance shows no impairment. An alternative way of examining primary-task performance that sidesteps this problem is to examine the changes in strategies that operators employ as task demands are varied (Eggemeier, 1988). For example, our gauge checker may, under low levels of workload, respond to system abnormalities by rote and without consulting an operating manual. However, under high levels of workload, he or she may rely on printed instructions such as those in an operating manual to recover the system. Any obvious strategy changes may be indicators of increased workload.
324Primary-task measures are also not diagnostic of those mental resources that are being overloaded. Furthermore, although they are usually nonintrusive, primary-task measures may require sophisticated instrumentation that renders them difficult to implement in many operational settings.
Secondary-task measures. Secondary-task measures are based on the logic of dual-task performance we described earlier in the chapter. The operator is required to perform a secondary task in addition to the primary task of interest. Workload is assessed by the degree to which performance on either the primary or secondary task, whichever is not to receive priority, deteriorates in the dual-task situation relative to when each task is performed alone. Thus, dual-task interference provides an index of the demands placed on the operator’s attentional resources by the two tasks.
328Secondary-task measures are more sensitive than primary-task measures. In nonoverload situations, where the primary task can be performed efficiently, the secondary-task measures can assess differences in spare capacity. Secondary-task measures are also diagnostic in that specific sources of workload can be determined through the use of secondary tasks of different modalities. Possible drawbacks of secondary-task measures are that they can be intrusive and may introduce artificiality by altering the task environment. Also, the operator may need to practice the dual task considerably before his performance level stabilizes.
Operator workload can be assessed either by manipulating primary-task difficulty and observing variations in secondary-task performance, or by manipulating secondary-task difficulty and observing variations in primary-task performance. In the loading task paradigm, we tell operators to maintain performance on the secondary task even if primary-task performance suffers (Ogden et al., 1979). In this paradigm, performance deteriorates more rapidly on difficult than on easy primary tasks. For example, Dougherty et al. (1964) examined the workload requirements of two displays for helicopter pilots: a (then) standard helicopter display and a pictorial display. The primary task was flying at a prescribed altitude, heading, course, and air speed. The secondary, or loading, task was reading displayed digits. Primary-task performance did not differ for the two display conditions when flying was performed alone or when the digits for the secondary task were presented at a slow rate. However, at fast rates of digit presentation, the pictorial display produced better flying performance than the standard display. Thus, the mental workload requirements were apparently less with the pictorial display.
330In the subsidiary task paradigm, we tell operators to maintain performance on the primary task at the expense of the secondary task. Differences in the difficulty of the primary task will then show up as decrements in performance of the secondary task. This paradigm is illustrated in a study by Bell (1978) that examined the effects of noise and heat stress. For the primary task, people had to keep a stylus on a moving target. The secondary task involved an auditory stream of numbers. The people pressed a telegraph key once if a number was less than the previous number and twice if the number was greater than the previous number. Secondary-task performance was degraded by both high noise levels and high temperature, although primary-task performance was unaffected.
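The dual-task logic above compares the lower-priority task with its own single-task baseline. The sketch below shows one simple way to express that comparison as a percentage decrement. The function name and every score are hypothetical, and a percentage drop is only one of several possible decrement measures.

```python
def dual_task_decrement(single_task_score, dual_task_score):
    """Percentage drop from the single-task baseline (higher score = better)."""
    if single_task_score <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (single_task_score - dual_task_score) / single_task_score


# Hypothetical secondary-task accuracy (% correct) in a subsidiary task
# paradigm: the primary task has priority, so the workload imposed by each
# condition should show up in the secondary task.
baseline = 95.0  # secondary task performed alone
dual_task_scores = {"low noise, low heat": 90.0, "high noise, high heat": 70.0}

decrements = {
    condition: dual_task_decrement(baseline, score)
    for condition, score in dual_task_scores.items()
}

for condition, drop in decrements.items():
    print(f"{condition}: {drop:.1f}% below single-task baseline")
```

A larger decrement in a condition is read as less spare capacity left over by the primary task in that condition.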
The human factors specialist must decide which of several types of secondary tasks to use for measuring workload. The task should be one that draws on the processing resources required by the primary task. If it does not, the workload measure will be insensitive to the workload associated with that task. Moreover, several distinct secondary tasks can be selected to provide a profile of the various resource requirements of the primary task. Some commonly used secondary tasks are simple reaction time, which involves perceptual and response-execution resources; choice reaction time, which also imposes central-processing and response-selection demands; monitoring for the occurrence of a stimulus, which emphasizes perceptual processes; and mental arithmetic, which requires central-processing resources.
Verwey (2000) used two different secondary tasks to diagnose the workload imposed on drivers in different road situations (e.g., standing still at a traffic light, driving straight ahead, driving around a curve, etc.). While driving a designated route, drivers also performed one of two secondary tasks: saying ‘‘yes’’ when they detected a visual stimulus (a two-digit number) on a dashboard display or adding 12 to an auditorily presented number and speaking the answer. The visual-detection task measures visual workload whereas the addition task measures mental workload. Visual detection performance varied greatly across different road situations, suggesting that different roads produced low, intermediate, and high levels of workload. Auditory addition performance also varied across road situations, but to a lesser extent. So although a large part of the effect of different road situations is on visual workload, there is also some influence on mental workload.
333
334 One problem with the secondary-assessment procedure as we have described it so far is its artificiality. No one is forced to add 12 to random numbers while they are driving. To minimize interfering effects of artificiality, an embedded secondary task can be used (Shingledecker, 1980). This is a task that is part of the normal operator duties but is of lower priority than the primary task. As one example, workload can be measured for pilots using radio communication activities as an embedded secondary task. Intrusiveness is minimized in this way, as is the need for special instrumentation, but the information about workload demands that can be obtained may be restricted.
Psychophysiological measures. There are many popular psychophysiological indices of cognition, including measurement of EEGs, ERPs, and functional neuroimaging. Some of these psychophysiological indices can be used to measure workload (Baldwin, 2003). Such measures avoid the intrusiveness of a secondary task, but they introduce a new problem of requiring sophisticated instrumentation. Moreover, the possibility exists that the equipment and procedures necessary to perform the measurements may be intrusive in a different way and interfere with primary-task performance, as we discussed with the operator of the robotic arm. The major benefit of psychophysiological measures is that they have the potential to provide online measurement of the dynamic changes in workload as an operator is engaged in a task.
336Many kinds of psychophysiological measures have been used to measure workload, but they all generally fall into two classes: those that measure general arousal and those that measure brain activity. General arousal level is presumed to increase as mental workload increases, and indices of arousal thus provide single measures of workload. One such technique is pupillometry, or the measurement of pupil diameter. Pupil diameter provides an indicator of the amount of attentional resources that are expended to perform a task (Beatty, 1982; Kahneman, 1973). The greater the workload demands, the larger the pupil size. For example, when air traffic controllers use a static storm forecast tool, they have larger pupil diameters than when they use a dynamic forecast tool. This suggests that the dynamic tool reduces workload (Ahlstrom & Friedman-Berg, 2006). The changes in pupil diameter are small but reliable, and require a pupillometer to allow sufficiently sensitive measurements. While useful as a general measure of workload, pupil diameter cannot distinguish between the different resources that are being overloaded in the performance of a task.
A second psychophysiological measure of mental workload is heart rate. Increased heart rates are correlated with increased workloads (Wilson & O’Donnell, 1988). However, because heart rate is determined primarily by physical workload and arousal level, changes in heart rate do not always indicate changes in mental workload. A better measure is heart rate variability, the extent to which heart rate changes over time (Meshkati, 1988). One component of this variability that has been isolated, fluctuations in heart rate that occur with a period of about 10 s, decreases as mental effort increases (Boucsein & Backs, 2000). Heart rate variability is only one of several measures of activation of the autonomic nervous system, and sophisticated models are being developed that allow interpretation of a variety of cardiovascular effects of mental workload (Van Roon et al., 2004).
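A period of about 10 s corresponds to a frequency of about 0.1 Hz, so one way to quantify this component is the power of the heart rate signal in a band around 0.1 Hz. The sketch below is illustrative only. It assumes an evenly sampled heart rate series at 2 Hz, a 0.07 to 0.14 Hz band, and synthetic sinusoidal data; none of these choices comes from the cited studies.

```python
import math


def band_power(samples, sample_rate_hz, low_hz, high_hz):
    """Variance contributed by DFT components between low_hz and high_hz."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    power = 0.0
    # Loop over positive frequencies, skipping the DC and Nyquist bins.
    for k in range(1, (n - 1) // 2 + 1):
        freq_hz = k * sample_rate_hz / n
        if low_hz <= freq_hz <= high_hz:
            angle = 2 * math.pi * k / n
            re = sum(x * math.cos(angle * i) for i, x in enumerate(centered))
            im = sum(x * math.sin(angle * i) for i, x in enumerate(centered))
            power += 2 * (re * re + im * im) / (n * n)
    return power


def synthetic_heart_rate(amplitude_bpm, seconds=120, sample_rate_hz=2.0):
    """Made-up heart rate (beats/min) oscillating with a 10 s period."""
    n = int(seconds * sample_rate_hz)
    return [
        70 + amplitude_bpm * math.sin(2 * math.pi * 0.1 * i / sample_rate_hz)
        for i in range(n)
    ]


# Hypothetical recordings: a larger 0.1 Hz oscillation under low mental effort.
low_effort = band_power(synthetic_heart_rate(3.0), 2.0, 0.07, 0.14)
high_effort = band_power(synthetic_heart_rate(1.0), 2.0, 0.07, 0.14)
print(f"0.1 Hz band power, low effort:  {low_effort:.2f}")
print(f"0.1 Hz band power, high effort: {high_effort:.2f}")
```

Real recordings would first need the beat-to-beat intervals converted to an evenly sampled series, a step this sketch leaves out.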
The second category of measures estimates the brain activity associated with specific processes. The most reliable of these measures involves ERPs, introduced in Chapter 4. Presentation of a stimulus causes a short-lived or transient electrical response from the brain, which arises as a series of voltage oscillations that originate in the cortex. These transient responses can be measured by electrodes attached to the scalp; many trials must be averaged to determine the waveform of the ERP for a particular situation. Components of the evoked response are either positive (P) or negative (N). They can also be identified in terms of their minimal latencies from the onset of the stimulus event (see Figure 9.13). The P300 (a positive component that occurs approximately 300 ms after the event onset) shows amplitude and latency effects that can be interpreted as reflecting workload. The latency of the P300 peak is regarded as an index of stimulus-evaluation difficulty (Dien et al., 2004; Donchin, 1981), although it may be influenced by response-selection difficulty as well (Leuthold & Sommer, 1998). The amplitude of the P300 decreases as a stimulus is repeated but then increases again when an unexpected stimulus occurs (Duncan-Johnson & Donchin, 1977). Thus, the P300 seems to reflect the amount of cognitive processing performed on a stimulus.
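The averaging step and the amplitude and latency measures can be illustrated with a short sketch. It uses simulated single-trial traces (a positive bump near 300 ms buried in random noise). The 4 ms sampling interval, the 250 to 400 ms search window, the trial count, and the function names are all assumptions chosen for the example, not values from the studies cited.

```python
import math
import random


def average_erp(trials):
    """Point-by-point mean of equally long single-trial voltage traces."""
    n_trials = len(trials)
    return [sum(samples) / n_trials for samples in zip(*trials)]


def peak_in_window(waveform, sample_interval_ms, start_ms, end_ms):
    """Return (latency_ms, amplitude) of the largest value in a time window."""
    first = int(start_ms / sample_interval_ms)
    last = int(end_ms / sample_interval_ms)
    window = waveform[first:last + 1]
    amplitude = max(window)
    latency_ms = (first + window.index(amplitude)) * sample_interval_ms
    return latency_ms, amplitude


# Simulated data: a 10 microvolt bump centered at 300 ms plus noise.
rng = random.Random(0)
SAMPLE_INTERVAL_MS = 4
times_ms = range(0, 801, SAMPLE_INTERVAL_MS)


def simulated_trial():
    return [
        10 * math.exp(-((t - 300) ** 2) / (2 * 50 ** 2)) + rng.gauss(0, 5)
        for t in times_ms
    ]


erp = average_erp([simulated_trial() for _ in range(200)])
latency_ms, amplitude_uv = peak_in_window(erp, SAMPLE_INTERVAL_MS, 250, 400)
print(f"peak of {amplitude_uv:.1f} microvolts at {latency_ms} ms")
```

Averaging works because the noise differs from trial to trial and tends to cancel, whereas the stimulus-locked response is present on every trial.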
FIGURE 9.13 Amplitude of transient evoked response (P300) for tasks of different workloads. [Waveforms for three conditions: count only, 4 display elements, and 8 display elements. x-axis: time (ms), 0 to 1200; y-axis: amplitude of transient evoked response, with a 20 μV scale bar.]
345The P300 is sensitive to the workload demands of real-world tasks. Kramer et al. (1987) had student pilots fly a series of missions on a flight simulator. The flight was the primary task in a dual- task paradigm. The difficulty of the primary task was manipulated by varying wind conditions, turbulence, and the probability of a system failure. For the secondary task, the pilot pressed a button whenever one of the two tones occurred. The P300 latency to the tones increased and the amplitude decreased with increasing difficulty of the mission, indicating that the tones were receiving less processing as the workload of the primary task increased.
346Because the P300 is sensitive to stimulus-evaluation processes, it can be used to assess workload associated with the detection of rare or novel stimulus events (Spencer et al., 1999). Other components of ERPs are more closely linked to early sensory and response-initiation processes and can be used to evaluate demands on these resources. For example, Handy et al. (2001) showed that high perceptual load can reduce the extent to which other visual stimuli are processed. High perceptual load not only reduced an observer’s ability to detect a peripheral visual stimulus, but also reduced the magnitude of the P100 ERP response to the stimulus, suggesting reduced processing of the stimulus in the primary visual cortex.
In sum, the P300 and other ERP measures are useful when we must assess workload in a way that does not disrupt performance of the primary task. However, recording ERPs requires sophisticated instrumentation and control procedures that may make these measures difficult to obtain.
348Subjective measures. Subjective assessment techniques evaluate workload by obtaining the operators’ judgments about their tasks. Typically we ask operators to rate overall mental workload or several components of workload. The strength of these techniques is that they are relatively easy to implement and tend to be easily accepted by operators. Given these virtues, it is not too surprising that subjective workload measures tend to be used extensively in the field. Indeed, Brookhuis and De Waard (2002, p. 1026) note, ‘‘In some areas such as traffic and transportation research, subjective measures and scales are rather common. It is hard to imagine research in the field without subjective measurement.’’
349Despite their usefulness, there are some limitations to subjective assessment techniques (Boff & Lincoln, 1988): (1) they may not be sensitive to aspects of the task environment that affect primary-task performance and, hence, it may be best to couple their use with primary-task measures; (2) operators may confuse perceived difficulty with perceived expenditure of effort; and (3) many factors that determine workload are inaccessible to conscious evaluation.
There are many subjective mental workload instruments, or standardized scales, in wide use. We will describe four of the most popular. The first, the modified Cooper–Harper scale, is appropriate when only an overall measure of workload is desired. The other three, the Subjective Workload Assessment Technique (SWAT), NASA Task Load Index (NASA-TLX), and Workload Profile (WP), provide estimates of several distinct aspects of workload.
354Cooper and Harper (1969) developed a rating scale to measure the mental workload involved in piloting aircraft with various handling characteristics. The scale has since been modified by Wierwille and Casali (1983) to be applicable to a variety of settings. Figure 9.14 shows how the scale involves traversal of a decision tree, yielding a rating between 1 (low workload) and 10 (high workload). The modified Cooper–Harper scale yields a single measure that is sensitive to differences in workload and consistent across tasks (Skipper et al., 1986).
355The SWAT was designed initially for use with a variety of tasks and systems (Reid et al., 1981). The procedure requires operators to judge which tasks have higher workload than others using a card-sorting procedure. Each card depicts a task that differs in three subcategories of workload (time load, mental effort load, and stress load), with three classifications for each (see Table 9.2). Time load refers to the extent to which a task must be performed within a limited amount of time and the extent to which multiple tasks must be performed at the same time. Mental effort load involves inherent attentional demands of tasks, such as attending to multiple sources of information and performing calculations. Stress load encompasses operator variables such as fatigue, level of training, and emotional state, that contribute to an operator’s anxiety level.
356Operators are asked to order all 27 possible combinations of the three descriptions according to their amount of workload. We then apply a process called conjoint scaling to the data to derive a scale of mental workload. Once the scale has been derived, we can estimate workload for various situations from simple ratings of the individual dimensions. The SWAT procedure is sensitive to workload increases induced by increases in task difficulty, as well as to those caused by sleep deprivation or increased time-on-task (Hankey & Dingus, 1990). However, the procedure is not very sensitive to low mental workloads and the card-sorting pretask procedure is time consuming. Consequently, Luximon and Goonetilleke (2001) developed a pairwise-comparison version where operators choose, between two task descriptions, the one that has the higher workload. This version takes less time and yields a scale of high sensitivity.
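The 27 cards arise from crossing three levels on each of the three dimensions. The sketch below enumerates them and turns one ordering into a 0 to 100 lookup scale. The rank-based scale is a deliberately simple stand-in for conjoint scaling, which it does not implement. The ordering itself is hypothetical, generated by a sorting rule rather than by an operator.

```python
from itertools import product

LEVELS = (1, 2, 3)  # 1 = lowest load, 3 = highest load on a dimension

# Each card is a (time load, mental effort load, stress load) combination.
cards = list(product(LEVELS, repeat=3))
print(len(cards), "cards")


def rank_scale(ordered_cards):
    """Map an ordering (lowest to highest workload) onto 0-100 by rank.

    A rank-based placeholder for illustration only, not conjoint scaling.
    """
    n = len(ordered_cards)
    return {card: 100 * i / (n - 1) for i, card in enumerate(ordered_cards)}


# Hypothetical operator ordering: by summed level, ties broken by time load,
# then mental effort load, then stress load.
ordering = sorted(cards, key=lambda card: (sum(card), card))
scale = rank_scale(ordering)

# Once a scale exists, a later rating on the three dimensions is a lookup.
print("workload for (2, 3, 1):", round(scale[(2, 3, 1)], 1))
```

The lookup step mirrors the point made above: after the scale is derived, workload for a new situation comes from simple ratings on the three dimensions.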
Perhaps the most widely used subjective technique is the NASA-TLX (Hart & Staveland, 1988). This index consists of six scales on which operators rate workload demands (see Table 9.3). The scales evaluate mental demand, physical demand, temporal demand, performance, effort, and frustration level. These scales were selected from a larger set on the basis of research that showed each to make a relatively unique contribution to the subjective impression of workload. An overall measure of workload can be obtained by assigning a weight to each scale according to its importance for the specific task, then calculating the mean of the weighted values of each scale.
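The overall score described above is a weighted mean of the six ratings. The sketch below assumes ratings on a 0 to 100 range and weights that are tallies from pairwise comparisons of the scales (summing to 15), which is a common way of administering the index. The function name and all ratings and weights are hypothetical.

```python
SCALES = (
    "mental demand",
    "physical demand",
    "temporal demand",
    "performance",
    "effort",
    "frustration",
)


def tlx_overall(ratings, weights):
    """Weighted mean of the six scale ratings."""
    missing = [s for s in SCALES if s not in ratings or s not in weights]
    if missing:
        raise ValueError(f"missing scales: {missing}")
    total_weight = sum(weights[s] for s in SCALES)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(ratings[s] * weights[s] for s in SCALES)
    return weighted_sum / total_weight


# Hypothetical ratings (0-100) and weights for a vigilance task.
ratings = {
    "mental demand": 80, "physical demand": 10, "temporal demand": 40,
    "performance": 50, "effort": 70, "frustration": 75,
}
weights = {
    "mental demand": 5, "physical demand": 0, "temporal demand": 2,
    "performance": 2, "effort": 3, "frustration": 3,
}

overall = tlx_overall(ratings, weights)
print(f"overall workload: {overall:.1f}")
```

Because physical demand has a weight of zero in this example, its rating does not affect the overall score, which is the intended effect of task-specific weighting.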
358One example of how the NASA-TLX can be used comes from studies of vigilance. Whereas previously it was thought that vigilance was relatively undemanding, observers rate the mental workload as high on the NASA-TLX, with mental demand and frustration being primary factors (Becker et al., 1991). As noted, such results suggest that the vigilance decrement does not simply reflect a decrease in arousal and that vigilance performance requires considerable effort. Another example involves the use of NASA-TLX and SWAT measures by Airbus Industries (De Keyser & Javaux, 2000). Their goal was to demonstrate that their large aircraft could be flown safely by small, two-person crews. The NASA-TLX and SWAT measures showed that the workload experienced by the two crew members was acceptable.
FIGURE 9.14 The modified Cooper–Harper scale.
Operator decisions:
Question 1: Even though errors may be large or frequent, can instructed task be accomplished most of the time? If no: major deficiencies, system redesign is mandatory (rating 10). If yes, go to Question 2.
Question 2: Are errors small and inconsequential? If no: major deficiencies, system redesign is strongly recommended (ratings 7 to 9). If yes, go to Question 3.
Question 3: Is mental workload level acceptable? If no: mental workload is high and should be reduced (ratings 4 to 6). If yes: ratings 1 to 3.
Rating | Difficulty | Description
1 | Very easy, highly desirable | Operator mental effort is minimal and desired performance is easily attainable
2 | Easy, desirable | Operator mental effort is low and desired performance is attainable
3 | Fair, mild difficulty | Acceptable operator mental effort is required to attain adequate system performance
4 | Minor but annoying difficulty | Moderately high operator mental effort is required to attain adequate system performance
5 | Moderately objectionable difficulty | High operator mental effort is required to attain adequate system performance
6 | Very objectionable but tolerable difficulty | Maximum operator mental effort is required to attain adequate system performance
7 | Major difficulty | Maximum operator mental effort is required to bring errors to moderate level
8 | Major difficulty | Maximum operator mental effort is required to avoid large or numerous errors
9 | Major difficulty | Intense operator mental effort is required to accomplish task, but frequent or numerous errors persist
10 | Impossible | Instructed task cannot be accomplished reliably

TABLE 9.2
Three-Point Rating Scales for the Time, Mental Effort, and Stress Load Dimensions of the Subjective Workload-Assessment Technique
Time Load
1. Often have spare time. Interruptions or overlap among activities occur infrequently or not at all.
2. Occasionally have spare time. Interruptions or overlap among activities occur frequently.
3. Almost never have spare time. Interruptions or overlap among activities are very frequent, or occur all the time.
Mental Effort Load
1. Very little conscious mental effort or concentration required. Activity is almost automatic, requiring little or no attention.
2. Moderate conscious mental effort or concentration required. Complexity of activity is moderately high due to uncertainty, unpredictability, or unfamiliarity. Considerable attention required.
3. Extensive mental effort or concentration is necessary. Very complex activity requiring total attention.
Stress Load
1. Little confusion, risk, frustration, or anxiety exists and can be easily accommodated.
2. Moderate stress due to confusion, frustration, or anxiety noticeably adds to workload. Significant compensation is required to maintain adequate performance.
3. High to very intense stress due to confusion, frustration, or anxiety. High to extreme determination and self-control required.

You may have noticed that although the NASA-TLX and SWAT instruments measure different aspects of mental workload, they do not map very closely onto the multiple-resource model of attention. Consequently, Tsang and Velasquez (1996) developed a subjective technique called the Workload Profile (WP). The workload dimensions used for this technique are defined by the dimensions hypothesized in Wickens’s (1984) multiple-resource model: processing stage (perceptual/central or response selection/execution), processing code (spatial or verbal), input modality (visual or auditory), and output modality (manual output or speech). For all tasks, the person rates each value for each dimension, assigning a number from 0 to 1 to indicate no attention demand (0) up to maximal attentional demand (1). So, for example, visual modality would be given a rating of 0 if there were no visual displays and 1 (or close to it) if there was heavy visual demand. Rubio et al. (2004) evaluated WP, NASA-TLX, and SWAT on several dimensions for participants who performed single and dual tasks. Their analyses showed that WP had higher sensitivity and better diagnosticity than the two more widely used measures.
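The WP rating scheme just described can be represented as one 0 to 1 rating per resource value. The sketch below is illustrative only: the ratings are invented, the name `wp_profile` is hypothetical, and summing the eight ratings into a single total is an assumption made here for convenience rather than something stated in the text.

```python
WP_DIMENSIONS = (
    "perceptual/central",            # processing stage
    "response selection/execution",  # processing stage
    "spatial",                       # processing code
    "verbal",                        # processing code
    "visual",                        # input modality
    "auditory",                      # input modality
    "manual",                        # output modality
    "speech",                        # output modality
)


def wp_profile(ratings):
    """Validate one task's WP ratings and summarize them.

    ratings: resource value -> attentional demand between 0 and 1.
    Returns the total (assumed summary: a plain sum) and the most
    heavily loaded resource value, which is the diagnostic information.
    """
    for name in WP_DIMENSIONS:
        if name not in ratings:
            raise ValueError(f"missing rating for {name!r}")
        if not 0 <= ratings[name] <= 1:
            raise ValueError(f"rating for {name!r} must lie between 0 and 1")
    total = sum(ratings[name] for name in WP_DIMENSIONS)
    peak = max(WP_DIMENSIONS, key=lambda name: ratings[name])
    return {"total": total, "peak": peak}


# Invented example: a visually demanding tracking task with no speech or audio.
example = dict(zip(WP_DIMENSIONS, (0.6, 0.4, 0.8, 0.1, 0.9, 0.0, 0.5, 0.0)))
summary = wp_profile(example)
print(round(summary["total"], 2), summary["peak"])
```

Keeping the eight ratings separate, rather than reporting only the total, is what allows the profile to point to the specific resource that a task overloads.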
TABLE 9.3
NASA-TLX Rating Scale Definitions
Title | End Points | Description
Mental demand | Low/high | How much mental and perceptual activity was required (e.g., thinking, deciding, calculating, remembering, looking, searching, etc.)? Was the task easy or demanding, simple or complex, exacting or forgiving?
Physical demand | Low/high | How much physical activity was required (e.g., pushing, pulling, turning, controlling, activating, etc.)? Was the task easy or demanding, slow or brisk, slack or strenuous, restful or laborious?
Temporal demand | Low/high | How much time pressure did you feel due to the rate or pace at which the tasks or task elements occurred? Was the pace slow and leisurely or rapid and frantic?
Performance | Low/high | How successful do you think you were in accomplishing the goals of the task set by the experimenter (or yourself)? How satisfied were you with your performance in accomplishing these goals?
Effort | Low/high | How hard did you have to work (mentally and physically) to accomplish your level of performance?
Frustration level | Low/high | How insecure, discouraged, irritated, stressed, and annoyed versus secure, gratified, content, relaxed, and complacent did you feel during the task?
Finally, at least two issues limit the use and interpretation of subjective measurements. First, the workload ratings they yield depend on the range of conditions to which the observers are exposed. Colle and Reid (1998) found that operators who experienced only a few levels of task difficulty rated their workloads as much higher than did operators who experienced a much broader range of difficulty levels. The authors recommend that operators be given experience with all possible difficulty levels for a task before ratings for particular task conditions are obtained. Second, subjective estimates of mental workload can diverge from psychophysiological or performance measures, sometimes by enough that different conclusions would be reached about which situations produce high versus low levels of workload.
439ANALYTICAL TECHNIQUES
440In contrast to empirical techniques, analytic techniques do not require the interaction of an operator with an operational system or simulator. Hence, they are used to estimate workload at early stages of system development. There are many analytic measurement techniques that rely on different estimators of workload. Consequently, it is best to use a battery of techniques to assess the workload demands of any specific system. In the following sections, we discuss five categories of analytic techniques (Lysaght et al., 1989): comparison, expert opinion, mathematical models, task analysis methods, and simulation models.
Comparison. The comparison technique uses workload data from a predecessor system to estimate the workload for a system under development. One systematic use of the comparison technique was reported by Shaffer et al. (1986). They estimated the mission workload for a single-crewmember helicopter, based on data from an empirical workload analysis of missions conducted with a two-crewmember helicopter. This technique is useful only if workload data from a predecessor system exist, which is often not the case.
442Expert opinion. One of the easiest and most extensively employed analytic techniques is expert opinion. Users and designers of systems similar to the one being developed are provided a description of the proposed system and asked to predict workload, among other things. The opinions can be obtained informally or formally (and more formal methods are better). For example, SWAT has been modified for prospective evaluations from experts. The major modification is that the ratings are based on a description of the system and particular scenarios, rather than on actual operation of the system. In the evaluation of pilot workload for military aircraft, prospective ratings using SWAT (and other methods) correlate highly with workload estimates made on the basis of performance (Eggleston & Quinn, 1984; Vidulich et al., 1991).
443Mathematical models. There have been many attempts to develop mathematical models of mental workload. Models based on information theory were popular in the 1960s. One model by Senders (1964) assumed that an operator with limited attentional capacity samples information from a number of displays. The channel capacity for each display and processing rate of the operator determined how often a display must be examined for the information in it to be communicated accurately. The amount of time an operator devotes to any particular display could thus be used as a measure of visual workload.
444In the 1970s, models based on manual control theory and queuing theory became popular. Manual control models apply to situations where continuous tasks, such as the tracking of a target, must be performed. These rely on minimization of error via various analytical and theoretical methods. Queuing models view the operator as a server that processes a variety of tasks. The number of times the server is called upon provides a measure of workload. Although development of these mathematical models has continued, their use for workload estimation has diminished in recent years as computerized task analyses and simulations have been developed.
Task analysis. As noted in earlier chapters, task analysis decomposes the overall system goal into segments and operator tasks, and ultimately into elemental task requirements. The analysis provides a time-based breakdown of demands on the operator. Consequently, most task-analytic measures of mental workload focus on estimation of time stress, which is the amount of mental resources required per unit time relative to those that are available. One exception is the McCracken–Aldrich technique (Aldrich & Szabo, 1986; McCracken & Aldrich, 1984), which distinguishes five workload dimensions: visual, auditory, kinesthetic, cognitive, and psychomotor. For each task element, ratings are made on a scale from 1 (low workload) to 7 (high workload) for each dimension. Estimates of the workload on each dimension are made for half-second intervals by summing the ratings of all task elements that are active during the interval. If the sum for a dimension exceeds 7, an overload exists for that dimension.
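The interval-by-interval summation can be sketched as follows. This is a minimal illustration of the bookkeeping described above, not the published tool: the task names, times, and ratings are invented, and `workload_timeline` is a hypothetical helper.

```python
import math

MA_DIMENSIONS = ("visual", "auditory", "kinesthetic", "cognitive", "psychomotor")
OVERLOAD_THRESHOLD = 7   # a summed rating above 7 signals overload
INTERVAL = 0.5           # seconds per analysis interval


def workload_timeline(tasks, duration):
    """Sum ratings of active task elements in each half-second interval.

    tasks: list of dicts with 'start' and 'end' times in seconds and
           'ratings', a mapping from dimension name to a 1-7 rating
           (dimensions a task element does not load are simply omitted).
    Returns a list of (interval start, totals per dimension, overloaded dimensions).
    """
    timeline = []
    for i in range(math.ceil(duration / INTERVAL)):
        t0 = i * INTERVAL
        t1 = t0 + INTERVAL
        totals = dict.fromkeys(MA_DIMENSIONS, 0)
        for task in tasks:
            # A task element is active if it overlaps the interval at all.
            if task["start"] < t1 and task["end"] > t0:
                for dimension, rating in task["ratings"].items():
                    totals[dimension] += rating
        overloaded = [d for d in MA_DIMENSIONS if totals[d] > OVERLOAD_THRESHOLD]
        timeline.append((t0, totals, overloaded))
    return timeline


# Invented scenario: a checklist is read while a display is being monitored.
example_tasks = [
    {"start": 0.0, "end": 2.0, "ratings": {"visual": 5, "cognitive": 3}},
    {"start": 1.0, "end": 1.5, "ratings": {"visual": 4, "cognitive": 2}},
]
for start, totals, overloaded in workload_timeline(example_tasks, 2.0):
    print(start, totals["visual"], overloaded)
```

In this invented scenario the visual total reaches 9 only in the interval beginning at 1.0 s, so the analysis flags a visual overload there and nowhere else.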
448Simulation models. A simulation model is probabilistic and, hence, will not yield the same result each time it is run. There are several simulation models that can be used to provide workload estimates. Most are variants of the Siegel and Wolf (1969) stochastic model discussed in Chapter 3. In that model, workload is indicated by a variable called ‘‘stress’’ that is affected by both the time to perform tasks and the quantity of tasks that must be performed. Stress is the sum of the average task execution times divided by the total time available. Several recent extensions of this technique have been developed that allow for greater flexibility in the prediction of workload (Lysaght et al., 1989).
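The stress variable defined above is a simple ratio, shown here as a small sketch. The task times are invented and `time_stress` is a hypothetical name; the full model also simulates task execution stochastically, which this fragment does not attempt.

```python
def time_stress(mean_task_times, time_available):
    """Stress = sum of average task execution times / total time available."""
    if time_available <= 0:
        raise ValueError("time_available must be positive")
    return sum(mean_task_times) / time_available


# Invented example: three remaining tasks averaging 4, 6, and 5 s, with 12 s left.
print(time_stress([4, 6, 5], 12))  # 1.25
```

A value above 1 means that, on average, the remaining tasks need more time than is available, whereas a value below 1 indicates slack.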
449SUMMARY
450Attention research exemplifies the ideal of a close relationship between basic and applied concerns in human factors. The resurgence of interest in attention arose from applied problems, but it has led to much basic, theoretical work on the nature of attentional control. This basic work, in turn, has led to better measures of attentional requirements in applied settings.
451Often operators must perform tasks that require selectively attending to specific sources of information, distributing attention across multiple sources of information, or maintaining attention on a single display for long periods. We can apply what we know about how attention works to the design of systems for effective performance under different situations. For example, we know that presentation of information in different modalities avoids decrements in performance due to competition for perceptual resources and improves memory for unattended information. More generally, assessment of mental workload can help determine the tasks that can be performed simultaneously with little or no decrement. Because mental workload varies as a function of the perceptual, cognitive, and motoric requirements imposed on an operator, the structure of a task and the environment in which it is performed can significantly affect workload and performance.
452RECOMMENDED READINGS
Gopher, D. & Donchin, E. 1986. Workload: An examination of the concept. In K.R. Boff, L. Kaufman, & J.P. Thomas (Eds.), Handbook of Perception and Human Performance, Vol. II: Cognitive Processes and Performance (pp. 41–49). New York: Wiley.
Hancock, P.A. & Meshkati, N. (Eds.) 1988. Human Mental Workload. Amsterdam: North-Holland.
Johnson, A. & Proctor, R.W. 2004. Attention: Theory and Practice. Thousand Oaks, CA: Sage.
Kahneman, D. 1973. Attention and Effort. Englewood Cliffs, NJ: Prentice-Hall.
Parasuraman, R. (Ed.) 1998. The Attentive Brain. Cambridge, MA: MIT Press.
Pashler, H.E. 1998. The Psychology of Attention. Cambridge, MA: MIT Press.
Posner, M.I. (Ed.) 2004. Cognitive Neuroscience of Attention. New York: Guilford Press.
Styles, E.A. 2006. The Psychology of Attention (2nd ed.). Hove, UK: Psychology Press.