Many daily activities require people to track several objects at the same time. For instance, drivers need to constantly keep track of the movements of other cars in order to avoid collisions. Also, when an athlete is playing basketball, he/she needs to keep an eye on the changing positions of his or her teammates and opponents in order to make effective decisions related to whether to attack or defend. Tracking ability is therefore a must-have survival skill for safer, better lives.

Pylyshyn and Storm (1988) pioneered the investigation into the dynamic tracking ability of our visual system under the lens of the multiple object tracking (MOT) paradigm. In a typical MOT task, participants are presented with a field of identical objects, some of which are cued as targets to be tracked. All objects then move independently and randomly for a period of time, after which participants are asked to identify all the targets (see Figure 1).

Figure 1 

A typical MOT trial. At the start of the trial, four identical objects flash in order to indicate that they are targets to be tracked. All objects then begin moving around the display. At end of the trial, the participant is required to click on the four targets.

Based on the MOT paradigm, Pylyshyn (1989, 2001, 2007) further proposed the visual index theory to explain people’s visual tracking mechanism. This theory argues that MOT is executed by an early or low-level visual system, and that this system can offer four to five indexes that may be assigned to visual objects. Here, an index is considered to be a reference token, or a point to connect a person’s visual system with the real physical world. As such, an index only conveys the location information of objects, but does not encode or represent any feature information of objects (i.e., in that sense it is feature-blind; Pylyshyn, 1989, 1994, 2007). Once designated to a target, the index will always stick to the target no matter how the target moves, and if the number of targets falls within the maximum capacity of the early visual system, then tracking will be successful. According to Pylyshyn’s theory, index maintenance is automatically carried out by the early visual system without any attentional effort, and the whole tracking process is achieved without any involvement of high-level cognition, although it may require a specific memory subsystem for its own operation.

In Pylyshyn and Storm’s (1988) original MOT task, participants are asked to discriminate between tracked targets and untracked distractors and tested for their ability to retrieve target locations. Since the moving objects in MOT studies are typically identical, tracking is achieved primarily by updating objects’ spatiotemporal information and overlooks any contribution from feature or identity information. However, in reality, tracked objects usually have distinct identities and people do not only need to track the positions of multiple objects (i.e., “where” information), but also the content of a particular target (i.e., “what” information). As a target is continuously moving, observers need to continuously bind the target’s identity with its new location. On the basis of preceding MOT studies, researchers developed a multiple identity tracking (MIT, Oksama & Hyönä, 2004, 2008) paradigm in order to afford identity information to objects and to explore the cognitive processing mechanisms of dynamic tracking in the real world. MIT is commonly represented by a tracking task in which each target has a distinct identity. According to the way in which responses are made, there are two forms of MIT task: one requires observers to report the targets’ locations, but to ignore their identities, and thus investigates whether people are able to use feature information to aid tracking; whereas the other requires observers to report both the locations and identities of targets, and thus investigates whether people are able to track the identities of distinct objects.

A host of MIT research has shown that identity information can be processed and influence tracking capability, suggesting the presence of a content-addressable representation during tracking (knowing which target is where; Horowitz, Klieger, Fencsik, Yang, Alvarez, & Wolfe, 2007; Howe & Holcombe, 2012; Makovski & Jiang, 2009a; Makovski & Jiang, 2009b). The content-addressable representation, however, has a lower capacity than the location-addressable representation (knowing where the targets are; Botterill, Allen, & McGeorge, 2011; Cohen, Pinto, Howe, & Horowitz, 2011). Furthermore, the capacity of observers to successfully identify and track targets depends on feature types (Liu, Chen, Liu, & Fu, 2012; Liu, Chen, Xuan, & Fu, 2009).

In early research, simple physical features such as color, shape, size and line direction, were often used as identity features. However, objects in the real world not only contain physical properties, but also hold social properties. Apparently, processing the social information of an object is more complex than processing physical features alone. As complex visual stimuli, faces contain a large amount of social information, including gender, identity, emotion, personality and ethnicity. In recent years, researchers have started to explore how faces are tracked (Allen & Gabbert, 2013; Liu & Chen, 2012; Oksama & Hyönä, 2008; Ren, Chen, Liu, & Fu, 2009). Oksama and Hyönä (2008) firstly employed faces as tracking stimuli in an MIT experiment that investigated the effects of facial familiarity on identity tracking. They found that pseudo-faces, created from famous faces (e.g., Albert Einstein and Bill Clinton) by deconstructing them and then rearranging and recombining the parts, were harder to track than famous faces. In another study, Ren and colleagues (2009) focused on whether identity processing of unfamiliar faces is mandatory without deliberate intentions. They showed that target facial identities are processed during tracking even when such encoding is irrelevant to the task. This mandatory identity processing was shown to interfere with tracking capability, as it competed with this process for limited attentional resources (Ren, Chen, Liu, & Fu, 2009). Researchers further found that attentional tracking can be biased by contextual information about the target face’s social roles (Allen & Gabbert, 2013) and the attractiveness of target faces (Liu & Chen, 2012).

Facial expressions also play an important role in facial studies, as these reflect an individual’s mental state and convey abundant social information related to the situated environment. Effective recognition of facial expressions helps individuals succeed in social interactions (Van Kleef, 2009). Taylor and Therrien (2005) further argue that facial expressions influence the attentional bias of observers, negative facial expressions (especially fear) can quickly capture our attention (Carlson & Reinke, 2010; Carlson, Reinke, & Habib, 2009; Eastwood, Smilek, & Merikle, 2001, 2003; Eimer & Kiss, 2007; Fox, 2002; Globisch, Hamm, Esteves, & Öhman, 1999; Pourtois, Grandjean, Sander, & Vuilleumier, 2004).

The need to track multiple moving faces, each of which convey facial expressions, is quite a frequent occurrence in everyday life. For instance, a kindergarten teacher needs to be able to simultaneously care for numerous children, each demonstrating different emotions through expression. It is surprising then that no research has ever explored the effects of facial expressions on attentional tracking in which sustained attention is distributed among multiple moving faces. Previous studies related to the influence of facial expressions on attention have only adopted a static paradigm, and, while they have demonstrated that responses to faces with negative facial expressions are rapid and transient, it remains unknown whether such an attentional bias only exists for a brief period. Therefore, the present study aimed to explore whether there is a similar effect of facial expressions on attentional tracking. In particular, recognition of fearful emotions conveys useful information for individuals’ survival, as efficient detection helps individuals avoid potential dangers in the environment (Masterson & Crawford, 1982; Mineka & Öhman, 2002). Considering the ecological significance of fearful emotions, the present research employed fearful faces as tracking stimuli.

In addition to the above, researchers have not yet reached a consensus on whether facial expression processing acts independently from attentional modulation. In other words, is facial expression processing automatic or controlled? Automatic processing would imply that facial expression processing does not require attentional resources and is not influenced by cognitive control (Anderson, Christoff, Panitz, De Rosa, & Gabrieli, 2003; Esteves, Dimberg, & Öhman, 1994; Öhman, 2002; Vuilleumier, Armony, Driver, & Dolan, 2001), while in contrast, the controlled view suggests that the processing of facial expressions is influenced by the extent of available attentional resources (Eimer, Holmes, & McGlone, 2003; Holmes, Vuilleumier, & Eimer, 2003; Pessoa, McKenna, Gutierrez, & Ungerleider, 2002; Pessoa, Padmala, & Morland, 2005). Another aim of the present study was therefore to examine whether attentional resources mediate any effect of fearful expressions on multiple face tracking.

We manipulated the attentional load of tracking by changing the relative proximity of objects to each other. Previous research has found that proximity is the root cause of all performance constraints in visual tracking (Bae & Flombaum, 2012; Franconeri, Jonathan, & Scimeca, 2010; Franconeri, Lin, Enns, Pylyshyn, & Fisher, 2008; Iordanescu, Grabowecky, & Suzuki, 2009; Shim, Alvarez, & Jiang, 2008), as decreased distance between objects requires more precise object representation, and the limited spatial resolution of attention does not meet this requirement. Hence, more attentional resources are required to distinguish between targets and distractors. Increases in object speed, trial duration, target load and number of distractors, all increase the frequency with which targets and distractors are in close proximity, and thus indirectly impair tracking performance. In this study, we adopted the planets and moons tracking (PMT) paradigm, proposed by Tombu and Seiffert (2011), in which object speed and proximity can be independently manipulated. The motion pattern during PMT is similar to the motion of planets and moons as each target-distractor group rotates around their group center in addition to the screen center.

The present research hypothesized that if the processing of fearful facial expressions is automatic, then target faces with fearful expressions would improve tracking performance, while distractor faces with fearful expressions would harm tracking performance. In contrast, if the processing of fearful facial expressions is modulated by attentional load, then this effect would be expected to be present only under the low-attentional load condition.


Ethics Statement

The study was approved by the Research Ethics Board of Zhejiang University and all participants provided written informed consent before taking part in the experiment.


A total of 19 undergraduate students (13 males) with normal or corrected-to-normal vision took part in this study in exchange for course credits or monetary payment. The age of the participants ranged from 18 to 27 years (M = 22.01, SD = 2.89).

Stimuli and Apparatus

Facial images were selected from photographs of 39 different individuals (23 men and 16 women) taken from the MacBrain Face Stimulus Set ( Fearful and neutral expressions were selected for each face, resulting in a total of 78 facial pictures. Each picture was processed to exclude all features except the eyes, eyebrows, nose and mouth. All images were scaled to the same size: 2.5° in length and 2.2° in width, and were further manipulated to the same mean luminance and root-mean-square contrast.

Participants were tested individually in a room with normal interior lighting and sat approximately 57 cm away from a 19-inch CRT monitor with a pixel resolution of 1,600 × 1,200 and a refresh rate of 85 Hz. The background color of the display was black. The experimental procedure was generated in psychopy (Peirce, 2007, 2008).


On all trials, participants were asked to track four target faces among four distractor faces. The eight faces presented in each trial were of the same sex. They were randomly chosen for each trial from the facial database. Each trial comprised of a target cue phase, a tracking phase and a response phase. The procedure in each trial is illustrated in Figure 2.

Figure 2 

Illustration of the trial procedure used in our study. Figure 2A shows an example of tracking fearful target faces among fearful distractor faces in the low-attentional load condition, while Figure 2B shows an example of tracking fearful target faces among neutral distractor faces in the high-attentional load condition. During the target cue phase, four groups of faces were arranged evenly along a large imaginary circle. Each group consisted of one target face and one distractor face, which were in turn distributed evenly on a small imaginary circle. The target faces to be tracked were flashed three times in order to highlight them to the participant. The target cues then disappeared and all faces began rotating. Two face groups rotated clockwise around their group center, and the other two rotated counter-clockwise. Each group simultaneously rotated around the screen center. During the response phase, all faces were occluded by grey rectangles. Participants were then shown each target face in turn and asked to click on its location.

During the target cue phase, four groups of faces were located equidistantly along a large imaginary circle (radius = 7.5°) centered on a white fixation cross (1.0° × 1.0°) at the screen center. Each group consisted of two faces, one of which flashed on and off three times over a two-second period in order to identify it as a target. The other faces remained constantly visible during this time. Each group therefore consisted of a target face and a distractor face that were distributed equidistantly on a small imaginary circle whose perimeter passed through the center of each face. Two conditions of distance were created based on the different radii of the small imaginary circle, and thus two attentional loads were generated. The distance between target and distractor faces varied from 3.6° (high-attentional load) to 4.4° (low-attentional load) of visual angle. The selection of these two distances was based on the critical distance of 4° of visual angle that has been used in previous studies (Bae & Flombaum, 2012; Franconeri, Alvarez, & Enns, 2007). The distance was fixed within trials, but varied across trials.

At the start of the tracking phase, target cues disappeared and all faces moved around the screen for six seconds. In half of the trials, face groups rotated clockwise around the fixation cross and rotated counter-clockwise in the other half. Both rotated at a speed of 60°/s. On each trial, two face groups rotated clockwise around their group center, while the other two rotated counter-clockwise. The angular speed at which the faces rotated around the center of their group was 90°/s and 73°/s for the near and far distance conditions respectively. This manipulation ensured that all faces traveled an equal linear distance across all the experimental distance conditions.

At the end of the tracking period, all eight faces stopped moving and were occluded by grey rectangles, whose size was equal to the faces (response phase). Thereafter, a series of four questions were presented in the upper section of the screen, one at a time, in order to test for each target face. Questions were of the form “Where is ____” with the blank to be filled by the image of a specific target face. Participants were instructed to click on each specific target face in turn. No feedback was given and the spacebar was used to initiate the next trial.

Participants were instructed to keep their eyes focused on the fixation cross throughout the whole experiment, but eye movements were not monitored as they have been shown not to affect tracking performance. Pylyshyn and Storm (1988) monitored fixation and eliminated trials on which participants made eye movements. Under these conditions, they obtained qualitatively similar results to other studies that either required participants to maintain fixation but did not monitor eye movements (e.g., Allen, Mcgeorge, Pearson, & Milne, 2006; Scholl & Pylyshyn, 1999; Sears & Pylyshyn, 2000), or else employed no special instructions concerning fixation (e.g., Intriligator & Cavanagh, 2001; Scholl, Pylyshyn, & Feldman, 2001; Yantis, 1992).


We employed a 2 (expression of target faces: fearful vs. neutral) × 2 (expression of distractor faces: fearful vs. neutral) × 2 (attentional load: low vs. high) within-subjects design. Target faces, which were either all fearful or all neutral, were tracked among distractor faces, which were also either all fearful or all neutral. Attentional load was manipulated by varying the distance between target and distractor faces. In the low-attentional load condition, target and distractor faces were relatively far away from each other and so little attentional resources were needed to distinguish between the two. In contrast, in the high-attentional load condition, target and distractor faces were relatively near to each other, and so distractor faces would sometimes become confused with target faces and therefore demand a much greater allocation of attentional resources in order to distinguish between target and distractor faces.

Each of the eight conditions included 20 trials, and all 160 trials were presented in one block in random order. The experiment began with eight practice trials followed by 160 test trials, and took about 60 minutes to complete.

Data analysis

Since we were primarily concerned with whether or not participants could successfully track the target faces, only accuracy was assessed, as response latencies were not very informative about performance.

Location and identity tracking accuracies were included in the present analysis. Location tracking accuracy refers to participants’ ability to differentiate between tracked targets and untracked distractors, whereas identity tracking accuracy refers to their ability to know which target is where. Imagine that a participant was tracking four faces, whose names were Tom, Jack, Tony and Alex. If the participant was required to click on Tom, and instead clicked on Jack, this would be considered a hit with respect to location tracking accuracy, because it indicated that the participant knew it was a target, but as a miss with respect to identity tracking accuracy, because the participant did not know which target it was. According to the above definition, identity tracking accuracy can never exceed location tracking accuracy (Allen & Gabbert, 2013; Cohen et al., 2011; Horowitz et al., 2007; Pinto, Howe, Cohen, & Horowitz, 2010). Accuracy was computed by measuring the percentage of faces that were correctly located or identified.


Location and identity tracking accuracies are plotted in Figure 3. An alpha level of .05 was used for all statistical analyses in this study. A four-way within-subjects repeated measures ANOVA was performed, with the expression of target faces, the expression of distractor faces, the attentional load and the task (location versus identity) as factors.

Figure 3 

Mean tracking accuracy in low- and high-attentional load conditions as a function of target and distractor facial expressions. (A) Results for location tracking. (B) Results for identity tracking. Error bars indicated ±1 standard errors.

Results indicated the presence of a significant main effect of task so that location accuracy was significantly greater than identity accuracy, F (1, 18) = 94.473, p < .001, ŋp2 = 0.840. There was also a significant main effect of attentional load so that accuracy was higher when the attentional load was lower, F (1, 18) = 124.576, p < .001, ŋp2 = 0.874. This finding is consistent with those of previous studies (Bae & Flombaum, 2012; Franconeri, Jonathan, & Scimeca, 2010; Franconeri, Lin, Enns, Pylyshyn, & Fisher, 2008; Iordanescu, Grabowecky, & Suzuki, 2009; Makovski & Jiang, 2009b; Shim, Alvarez, & Jiang, 2008). The main effect of target facial expressions was also significant so that accuracy was higher for fearful, rather than neutral target faces, F (1, 18) = 17.152, p = .001, ŋp2 = 0.488. However, no significant main effect was found for distractor facial expressions, F (1, 18) = 0.039, p = .845, ŋp2 = 0.002. Most importantly, there was a significant interaction between attention al load and expression of target faces, F (1, 18) = 9.837, p = .006, ŋp2 = 0.353. A simple effect test confirmed that the accuracy difference between fearful and neutral target faces was only significant in the low-attentional load condition, F (1, 18) = 21.938, p < .001, ŋp2 = 0.549, and was eliminated in the high-attentional load condition, F (1, 18) = 1.212, p = .285, ŋp2 = 0.063. None of the other two-way, three-way or four-way interactions was significant, ps > .05.


The present research manipulated the expressions of both target and distractor faces and attentional load in order to explore the effect of fearful expressions on multiple face tracking. Results indicated that target facial expressions influenced tracking performance within the low-attentional load condition so that both location and identity tracking performance improved when the target wore a fearful expression, as opposed to a neutral expression. No difference in tracking performance was found between fearful and neutral target faces in the high-attentional load condition. In contrast, the emotion displayed by distractor faces did not influence tracking performance.

Prior studies have disputed whether or not the processing of emotional faces is mediated by attention. Some researchers suggest that facial expressions are processed automatically and can occur on an unconscious level (Anderson et al., 2003; Dolan & Vuilleumier, 2003; Esteves et al., 1994; Öhman, 2002; Vuilleumier et al., 2001). The most direct evidence supporting this view of automaticity comes from a functional magnetic resonance imaging (fMRI) study in which neural responses to emotional faces across different attention conditions were compared (Vuilleumier et al., 2001). The results of this study revealed that the presence of a significant amygdala response to fearful faces relative to neutral faces, regardless of whether faces were presented inside or outside of the focus of attention. Thus, Vuilleumier et al. (2001) proposed that the processing of fearful faces acts independently from attentional modulation. This postulation is further supported by a similar paradigm created by Anderson et al. (2003), in which participants were asked to pay attention to either houses or faces presented in a single overlapping display. Other researchers propose that the processing of facial expressions is a controlled procedure mediated by available attentional resources (Eimer, Holmes, & McGlone, 2003; Holmes, Vuilleumier, & Eimer, 2003; Pessoa, McKenna, Gutierrez, & Ungerleider, 2002; Pessoa, Padmala, & Morland, 2005). Using fMRI, Pessoa and colleagues (2002) found that all brain regions responding differentially to emotional faces relative to neutral faces, including the amygdala, were activated only when there were sufficient attentional resources to process those faces. When a competing task exhausted all attentional resources, differential responses to emotional faces were shown to vanish. A similar finding was also shown in an event-related potential (ERP) study (Holmes et al., 2003). Furthermore, Pessoa et al. (2005) manipulated attentional load by varying the difficulty of the competing task and found that the presentation of fearful faces without attention resulted in a stronger amygdala response than neutral faces only in the low-attentional demand condition, but not in medium- or high-attentional demand conditions. The authors therefore argue that the processing of task-irrelevant facial expressions depends on the extent to which the processing of task-relevant information leaves redundant resource capacity.

The present findings support the controlled processing view. In the current study, because participants were not required to process facial expressions, fearful faces were not related to the tracking task. However, fearful target faces were found to have a non-ignorable impact on tracking performance, despite the fact that beneficial effects only occurred when the attentional load was low. Specifically, under the low-attentional load condition, target and distractor faces were relatively far away from each other and so fewer attentional resources were needed in order to distinguish one from the other. The remaining attentional resources were therefore able to prioritize the processing of fearful faces and thus strengthened the binding between identity and location, leading to a better tracking performance. Under the high-attentional load condition, target and distractor faces were relatively near to each other and so, most probably, distractor faces may have been mistaken for target faces. In order to maintain successful target tracking, all attentional resources were used to distinguish between target and distractor faces, thus leaving insufficient resources to process facial expressions. This explanation satisfies the finding that there was an absence of an expression effect on tracking under the high-attentional load condition.

Consistent with previous findings, the present research also shows that the targets are able to achieve content-addressable representations during the tracking process (Horowitz et al., 2007; Huang, Zhang, & Zhang, 2014; Liu et al, 2012; Liu et al., 2009; Makovski & Jiang, 2009a; Makovski & Jiang, 2009b; Oksama & Hyönä, 2004, 2008). However, these findings act to discredit the visual index theory proposed by Pylyshyn (1989, 2001, 2007), as this theory argues that visual tracking is controlled by low-level visual mechanisms. Under this conceptualization, the feature information of objects is therefore ignored and inaccessible from higher-level cognition, as visual indexes alone are thought to help realize tracking. However, if target facial expressions are not processed, tracking performance should not be influenced. This was not the case in this study. In contrast to visual index theory, more and more studies have found that high-level cognitive processes can penetrate and influence low-level visual perception (Allen & Gabbert, 2013; de-Wit, Lefevre, Kentridge, Rees, & Saygin, 2011; Liu & Chen, 2012; Oksama & Hyönä, 2008; Ren et al., 2009). The ecological meaning of stimuli, which require high-level representation and include factors such as identity, familiarity, attractiveness and the social label attached to faces, have an effect on tracking performance. The present results further support this argument by suggesting that facial expressions also influence tracking performance. Fearful facial expressions capture individuals’ attention more efficiently and help individuals successfully maintain their targets. Meanwhile, it is also possible that the observed advantage in the tracking of fearful faces is attributable to the low-level visual properties (e.g., luminance, contrast, orientation and spatial frequency) afforded to fearful faces, as opposed to their emotional content per se (Frischen, Eastwood, & Smilek, 2008; Gray, Adams, Hedger, Newton, & Garner, 2013; Purcell, Stewart, & Skov, 1996; Yang & Blake, 2012). Although we carefully scaled all our stimuli to the same size, luminance and contrast, we still cannot rule out the contribution of low-level image differences (at the level of orientation and spatial frequency) between fearful and neutral faces. To clarify this issue, we conducted a control experiment (see Appendix for details of method and results) in which inverted faces were employed as tracking stimuli, while all other parameters remained the same. Under these conditions, fearful faces did not enjoy any tracking advantage over neutral faces, thus indicating that facial expressions influence visual tracking at a high computational level.

In order to achieve successful identity tracking, the visual system must recognize and maintain the representations of multiple face identities. As tracked targets move, these representations should be continuously bound to the corresponding locations (Pinto et al., 2010). Previous research has shown visual working memory is responsible for retaining identity information (Makovski & Jiang, 2009b) and updating identity-location bindings (Oksama & Hyönä, 2008), indicating that visual working memory is closely associated with visual tracking (Allen, Mcgeorge, Pearson, & Milne, 2006; Drew & Vogel, 2008; Fougnie & Marois, 2006, 2009; Howe, Horowitz, Morocz, Wolfe, & Livingstone, 2009; Trick, Mutreja, & Hunt, 2012). Two possibilities may account for the effect of fearful expressions on multiple face tracking in visual working memory. First, the identities of fearful faces may be more effectively retained than those of neutral faces by visual working memory. Indeed, there is a visual working memory maintenance advantage for faces with negative expressions (i.e. angry and fearful faces) relative to those with neutral expressions (Bankó, Gál, & Vidnyánszky, 2009; Jackson, Wolf, Johnston, Raymond, & Linden, 2008; Jackson, Wu, Linden, & Raymond, 2009; Langeslag, Morgan, Jackson, Linden, & Van Strien, 2009; Sessa, Luria, Gotler, Jolicœur, & Dell’Acqua, 2011), and fearful faces may be maintained using a higher resolution as compared with neutral faces (Sessa et al., 2011). Second, fearful faces may be more easily bound to their locations than neutral faces. Although no direct evidence has been provided to show that fearful faces are able to improve identity-location binding, several studies have reported that the spatial locations of negative emotional stimuli (e.g., words and pictures) are better remembered than neutral ones, which does indicate a better binding between the two (D’Argembeau & Van der Linden, 2004; Mather & Nesmith, 2008). Taken together, fearful target faces may facilitate identity tracking through the enhancement of maintenance of target identities and identity-location binding in visual working memory. The present finding that location tracking improved in addition to identity tracking is also consistent with the common resource model of MIT (Cohen et al., 2011; Pinto et al., 2010). Cohen et al. (2011) suggested that tracking the objects’ locations cannot be accomplished independently from tracking their identities; rather, they draw on a common resource that can be flexibly distributed between identity tracking and location tracking. Fearful expressions make identity-location binding easier, and therefore result in freeing up more mental resources that can be devoted to track the objects’ locations.

The present research demonstrates that facial expressions of distractors are not able to influence tracking performance. This may be related to a degree of distractor inhibition during the tracking process. Pylyshyn and colleagues (Flombaum, Scholl, & Pylyshyn, 2008; Pylyshyn, 2006; Pylyshyn, Haladjian, King, & Reilly, 2008) employed a probe detection task to explore the distribution of attention on targets, distractors and background during the tracking process and found that the detection rate was higher when the probe point emerged on either the targets or background, than when it emerged on the distractors. This suggests that distractors are inhibited during tracking. ERP data (Doran & Hoffman, 2010) has also demonstrated the existence of such a mechanism. Using faces with different identities as tracking stimuli, Ren et al. (2009) found that the identities of distractor faces are not processed. Likewise, Liu and Chen (2012) also found that the degree of attractiveness of distractor faces does not influence multiple face tracking performance. Collectively, these findings indicated that targets are processed more deeply than distractors during tracking, and that there is an inhibition effect of attention on distractors that inhibits the processing of distractor facial expressions.

In summary, the current research shows that facial expressions with abundant task-relevant social information can influence visual tracking performance. The advantage of tracking fearful target faces is only present when sufficient attentional resources are available, which suggests that the effect of fearful expressions on multiple face tracking is mediated by the availability of attentional resources. The finding that visual tracking is sensitive to the expression of target faces contrasts with visual index theory, which posits that tracking is achieved in early vision and completely encapsulated from high-level cognitive processes. Fearful faces may be maintained more efficiently in visual working memory and bound more easily to their locations than neutral ones. However, given that the present research only chose fearful faces as tracking stimuli, future studies should adopt other facial expressions (e.g., anger and happiness) in order to further investigate this issue.

Competing Interests

The authors declare that they have no competing interests.