1. Introduction

Important advances have been made in recent years in the study of familiar people recognition through different perceptual (face and voice) and verbal (personal name) channels. These advances, which concern both the neural mechanisms underlying people recognition and the clinical phenomena resulting from disruption of these mechanisms, will be taken into account in the present review. In particular, attention will be focused on the perceptual channels and the brain mechanisms allowing recognition through faces and voices, because these social stimuli transmit critical information about the identity, gender, age and emotional status of known and unknown people, and thus convey the most important social information in the visual and auditory domains, respectively. Clinical and functional neuroimaging investigations have shown: (a) that many similarities exist between the cognitive and neural processing mechanisms engaged by perceiving faces or voices; (b) that highly specific brain mechanisms underlie the identification of familiar people through faces and voices; and (c) that these mechanisms mainly rely upon right hemisphere structures. Other investigations, however, have also suggested that faces play a more important role than voices in familiar people recognition. This critical role of the face as the most important channel used to recognize familiar people is reflected both in clinical data and in the theoretical models advanced to explain these data.

Bodamer (1947) was the first to describe a specific form of visual agnosia selectively affecting face recognition, and to propose the term ‘prosopagnosia’ (Greek: “prosopon” = “face”, “agnosia” = “not knowing”) to denote a modality-specific form of familiar people recognition disorder. This term has usually been applied to individuals who have lost the ability to recognize faces following acquired brain damage, although in the past few years an impairment in face processing analogous to acquired prosopagnosia, but occurring in the absence of brain damage, has also been recognized (see Susilo & Duchaine, 2013 for review). This disorder has been termed ‘congenital prosopagnosia’. From the neuroanatomical point of view, the seminal paper of Kanwisher, McDermott & Chun (1997) argued for the existence in the Fusiform Gyrus of a processing module, termed the Fusiform Face Area (FFA). More recent studies (e.g. Gauthier, Tarr, Moylan, Skudlarski, Gore & Anderson, 2000; Haxby, Hoffman & Gobbini, 2000; Rossion, Hanseeuw & Dricot, 2012) have, however, suggested that prosopagnosia may be due to a lesion of a larger network that includes, in addition to the FFA, the Occipital Face Area (OFA; Gauthier et al., 2000), the Superior Temporal Sulcus (STS; Puce, Allison, Bentin, Gore & McCarthy, 1998) and their interconnections (e.g. Fox, Iaria & Barton, 2008). The role of these different regions is controversial. On the one hand, a hierarchy has been proposed between the occipital cortices, which show sensitivity to the physical properties of a face (Pitcher, Walsh & Duchaine, 2011), and the FFA, which could be involved in perceiving identity (Fox, Moon, Iaria & Barton, 2009). On the other hand, Rossion, Caldara, Seghier, Schuller, Lazeyras & Mayer (2003) and Rossion (2008) have proposed a reverse hierarchical model of face perception, which assumes that re-entrant interactions may also exist between the FFA and lower-order visual areas (OFA).

Voice recognition disorders have also been reported, starting with the paper of Van Lancker & Canter (1982), who labelled this disturbance ‘phonagnosia’. This disorder, however, has been described in only a very small number of patients, usually with bilateral or right temporal lesions, and has often been investigated with rather poor methodology. For these reasons, prosopagnosia has for many years represented the most important, and almost exclusive, domain of research on defective recognition of familiar people.

In keeping with the clinical data, the first cognitive model that tried to analyse the functional architecture of the processes allowing the recognition of familiar people was Bruce & Young’s (1986) face recognition model. This model identified the sequential stages through which the processing of visual information proceeds up to the level of a three-dimensional structural description, which includes a complete perceptual specification of the face and is matched with stored modality-specific “face recognition units” (‘FRUs’), allowing access to the person-specific semantic system.

2. Prosopagnosia and multimodal defects of familiar people recognition

If we follow the historical evolution of models of familiar people recognition, starting from the construct of prosopagnosia, we must underline: (a) that prosopagnosia is a recognition disorder restricted to the visual modality (a consequence being that individuals who are not recognized through their face can be recognized through their voice, through non-physiognomic features such as moustaches or hairstyle, or by hearing their names); (b) that a distinction has been proposed by De Renzi (1986) and De Renzi, Faglioni, Grossi & Nichelli (1991) between ‘apperceptive’ and ‘associative’ forms of prosopagnosia. The apperceptive form consists of a defect not only in the recognition of familiar faces, but also in the discrimination of unfamiliar faces and of non-person-specific information (such as age, gender and emotional expression). It has been ascribed to a high-level visual defect. The associative form consists of a specific defect in the recognition of familiar faces, in the absence of problems in the discrimination of unfamiliar faces and of other visual disorders. It has been attributed to a memory or associative disturbance. A misleading consequence of the ‘dominance’ of faces in familiar people recognition is that it has somehow led to neglect of the descriptions by Ellis, Young & Critchley (1989) and Hanley, Young & Pearson (1989) of patients with anterior temporal lobe (ATL) lesions, mainly right-sided, who showed a multimodal defect in famous people identification.

In these patients a more or less severe inability to recognize familiar people is observed through face and voice and (to a lesser extent) through personal name. These recognition disorders are considered to be due to disruption of supramodal entities, the person-identity nodes (PINs), where information coming from the Face Recognition Units (FRUs), the Voice Recognition Units (VRUs) and the Name Recognition Units (NRUs) converges and is identified as belonging to the same person. From the anatomical point of view, the observations of Ellis et al. (1989) and of Hanley et al. (1989) suggested that the convergence of person-specific information in the PINs takes place in the rostral parts of the temporal lobes. Furthermore, since familiar people recognition disorders tended to be greater from face and voice than from personal name, and the lesion prevailed on the right side, they suggested either that the right ATL plays a greater role in familiar people recognition or that different patterns of people recognition disorders can be observed in patients with right and left ATL lesions. Leaving aside for the moment this hypothesis, which will be discussed later, I would stress the diagnostic bias provoked by the dominance of prosopagnosia over other people recognition disorders. As a matter of fact, subjects with multimodal people recognition disorders have often been considered as prosopagnosic patients.

To underline the frequency with which this diagnostic confusion has been made, I will quote: (1) two classical anatomo-clinical observations made by Bouduresque, Poncet, Cherif & Balzamo (1979) and by De Renzi (1986); (2) a more recent group study (Josephs, Whitwell, Vemuri, Senjem, Boeve, Knopman et al., 2008) on the anatomical bases of prosopagnosia in patients affected by semantic dementia (SD) and a recent investigation by Fernaeus, Ostberg, Wahlund & Bogdanovic (2012) on ‘paradoxical prosopagnosia in semantic dementia’.

(1) Bouduresque et al. (1979) reported as prosopagnosia (‘Agnosie des visages’) a patient who, after Herpes Simplex Encephalitis (HSE), complained of marked difficulty in recognizing familiar faces, in the absence of intellectual, memory, linguistic and visual disorders (including on a test of unfamiliar face discrimination). The authors noticed that, although the patient insisted that she could identify family members from their voices, her performance on a systematic study of familiar voice recognition was very poor. The subjective claim of good recognition of familiar voices must, therefore, be substantiated by objective data. The lesion mainly affected the anterior parts of the right temporal lobe. The clinical status was very similar in a patient reported by De Renzi (1986), who had also been affected by HSE: apart from an important defect of semantic memory, his intellectual, linguistic and visual capacities (including on a test of unfamiliar face discrimination) were normal. The patient had, by contrast, selectively lost face familiarity feelings and the capacity to recognize friends or family members. He could not, however, recognize familiar people by voice either, and was therefore unable to compensate for his face recognition disorder through other modalities. For this reason, De Renzi wondered whether this patient was really a case of prosopagnosia. In this case too, the lesion affected the anterior parts of the temporal lobes.

(2) Starting from the description of cases of ‘progressive prosopagnosia’ in patients affected by Semantic Dementia (SD) (e.g. Evans, Heggs, Antoun & Hodges, 1995; Joubert, Felician, Barbeau, Sontheimer, Guedj, Ceccaldi et al., 2003), Josephs et al. (2008) studied the anatomical correlates of ‘prosopagnosia’ in SD. Using voxel-based morphometry, these authors identified the patterns of cortical atrophy in SD patients with and without prosopagnosia. They showed that in SD the atrophy mainly affects the anterior parts of the temporal lobes, but that in patients with ‘prosopagnosia’ the atrophy prevails in the antero-mesial parts of the right temporal lobe, whereas in those without prosopagnosia it prevails in the anterior parts of the left temporal lobe. As in the cases of Bouduresque et al. (1979) and of De Renzi (1986), Josephs et al. (2008) were also aware of having used the term ‘prosopagnosia’ in an inappropriate manner, because they acknowledged that, in contrast with the definition of ‘prosopagnosia’ (a disorder of face recognition specific to the visual modality), in SD the difficulty in recognizing a known person is found through both face and voice, or even through name. Less interesting from our vantage point is the paper by Fernaeus et al. (2012), because it mainly took into account SD patients with left temporal atrophy and found no face recognition disorders in these patients, confirming that defects in famous people recognition are mainly observed in patients with right temporal lobe atrophy.

Some attempts have been made to bring the recognition disorders of patients with right anterior temporal lobe (ATL) lesions back within the framework of prosopagnosia. For instance, Nakachi, Muramatsu, Kato, Akiyama, Saito, Yoshino et al. (2007) and Williams, Savage & Halmagyi (2006) described patients affected by right ATL atrophy who showed selective disorders in the recognition of familiar faces, and considered them as instances of prosopagnosia. In particular, Williams et al. (2006) claimed that their patient BD had difficulty in the holistic, configural processing of faces, which is considered the basic face processing defect of patients with prosopagnosia (e.g. Levine & Calvanio, 1989; Delvenne, Seron, Coyette & Rossion, 2004). More recently, however, Busigny, Robaye & Rossion (2009) analysed, in a very well controlled and detailed study, a similar patient (MD), also affected by right anterior temporal atrophy, showing that: (a) no defect could be detected in the configural processing of faces; (b) a multimodal deficit could be found in the recognition of familiar people.

3. A prototypical case of multi-modal defects in familiar people recognition

My attention to the problem of multimodal defects of familiar people recognition was prompted by the observation of a patient (CO) with a slowly progressive multimodal defect of familiar people recognition (Gainotti, Barbier & Marra, 2003). CO was a 49-year-old man, married with two children and working in a bank, who for one year had had difficulty recognizing first the friends of his children, then their parents and, finally, his office colleagues. He claimed to be able to recognize familiar people by voice, but his wife noticed that he was unable to identify people who called him by phone. On neuropsychological examination he obtained above-average results on tasks of episodic memory (RAVLT), language (BADA and Snodgrass), executive functions (Stroop), and visual and visuo-spatial perception (VOSP). The results obtained by CO on tasks of face/people recognition can be summarized as follows: (a) very good discrimination of unknown faces (Face Matching test and Age evaluation test); (b) loss of familiarity feelings for known faces (Familiarity check test); (c) important defects in the identification of photographs of famous people; (d) a ratio between scores obtained on ‘apperceptive’ and ‘associative’ tests clearly indicative of an ‘associative form of prosopagnosia’. Naming of celebrities was very impaired from photos, but almost normal from verbal definitions. This pattern suggested a diagnosis of associative prosopagnosia, but on formal testing he showed difficulties of voice recognition quite similar to those found in face recognition, and this shifted the diagnosis from ‘associative prosopagnosia’ to ‘multimodal familiar people recognition disorders’.

From the neuroanatomical point of view, CO showed on both MRI and SPECT a clear atrophy of the antero-inferior portions of the temporal lobes, more severe on the right side. These anatomical data were consistent not only with the lesion location of the patients reported by Bouduresque et al. (1979), De Renzi (1986), Evans et al. (1995), Joubert et al. (2003), Josephs et al. (2008) and Busigny et al. (2009), but also with data obtained by Snowden, Thompson & Neary (2004), who studied the knowledge of famous faces and names in SD patients showing a right or left prevalence of temporal lobe atrophy. Taken together, these data prompted us to undertake a systematic review of the literature dealing with poor recognition of celebrities in patients with right and left anterior temporal lesions.

4. A review of the patterns of familiar people recognition disorders observed in patients with right and left ATL lesions

This review (Gainotti, 2007) was conducted with reference, on the one hand, to the cognitive model of face recognition proposed by Bruce & Young (1986) and, on the other hand, to the Interactive Activation and Competition (IAC) model of people recognition proposed by Burton, Bruce & Johnston (1990) and developed by Bredart, Valentine, Calder & Gassi (1995), Valentine, Brennen & Bredart (1996) and Burton, Bruce & Hancock (1999), because these models make different predictions with respect to the locus of generation of familiarity feelings and to the module where personal semantic knowledge is stored. In the Bruce & Young (1986) model, (a) familiarity feelings are generated in the Recognition Units and (b) person-specific semantic information is stored in the PINs, considered as specific semantic archives, whereas in the IAC model familiarity feelings are linked to the PINs, considered as a modality-free gateway allowing access to a unitary semantic system, where person-specific semantic information is stored in an abstract and amodal format.
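The contrast between the two models can be made concrete with a deliberately simplified sketch. It is not an implementation of either published model (the IAC model in particular uses graded activation, which is not reproduced here); it only encodes the two distinctions stated above, namely where familiarity is generated and where person-specific semantics is stored. The strictly serial flow (recognition unit, then PIN, then semantics), the all-or-none lesions, and all class, function and component names are assumptions introduced for illustration.

```python
from dataclasses import dataclass

# Recognition unit feeding the PINs for each input channel.
RECOGNITION_UNIT = {"face": "FRU", "voice": "VRU", "name": "NRU"}


@dataclass(frozen=True)
class PersonRecognitionModel:
    """Toy description of a model (hypothetical, all-or-none, serial)."""
    label: str
    familiarity_locus: str  # "recognition_units" or "PIN"
    semantic_store: str     # "PIN" or "semantic_system"

    def familiarity_spared(self, modality, lesioned):
        """True if a familiarity feeling can still arise from this modality."""
        if RECOGNITION_UNIT[modality] in lesioned:
            return False  # input cannot get past its own recognition unit
        if self.familiarity_locus == "PIN":
            return "PIN" not in lesioned
        return True  # familiarity is generated in the recognition unit itself

    def semantics_spared(self, modality, lesioned):
        """True if person-specific semantics can be reached from this modality."""
        if RECOGNITION_UNIT[modality] in lesioned or "PIN" in lesioned:
            return False  # the route to the semantic store is interrupted
        return self.semantic_store not in lesioned


BRUCE_YOUNG = PersonRecognitionModel("Bruce & Young (1986)", "recognition_units", "PIN")
IAC = PersonRecognitionModel("IAC", "PIN", "semantic_system")


def predict(model, lesioned):
    """Predicted sparing of familiarity and semantics for every modality."""
    return {
        modality: {
            "familiarity": model.familiarity_spared(modality, lesioned),
            "semantics": model.semantics_spared(modality, lesioned),
        }
        for modality in RECOGNITION_UNIT
    }


if __name__ == "__main__":
    # Compare the two models after a hypothetical lesion confined to the PINs.
    for model in (BRUCE_YOUNG, IAC):
        print(model.label, predict(model, {"PIN"}))
```

Under these assumptions, a lesion confined to the PINs spares familiarity feelings in the Bruce & Young reading but abolishes them in every modality in the IAC reading, while in the IAC reading intact PINs and recognition units leave semantic access spared for each modality.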

In this review I took into account 6 group studies and 35 single-case studies of patients, affected by right (N=19) or left (N=16) ATL lesions, who showed: (a) a selective impairment in the recognition, identification or naming of celebrities; (b) a lesion involving the anterior parts of the right or left temporal lobes. Whenever possible, I analyzed in the single-case studies: (A) the familiarity judgements concerning face and name; (B) the loss of (or the difficulty in accessing) person-specific semantic information (identification); (C) the capacity to name the identified person. The results of the review showed: (A) that familiarity feelings are much more affected by right than by left temporal lesions, and that in right ATL lesions the loss of familiarity feelings is specific to the visual modality, concerning the face much more than the name, whereas the asymmetry between name and face is much less clear in patients with left ATL lesions.

These results suggest (in agreement with Bruce & Young’s model) that familiarity feelings are not generated at the level of the PINs, but at the level of the FRUs (which might be more strongly represented in the right hemisphere because of the dominance of this hemisphere in face processing). (B) With respect to person-specific semantic information, the results of the review also showed that in patients affected by right ATL lesions the difficulty in retrieving person-specific semantic information is greater from faces than from names. This difficulty also concerned subjects with right ATL lesions who, showing normal familiarity feelings, should not have disturbances at the level of the PINs. This fact is at variance with models (such as the IAC) that view PINs as simple gateways allowing access to a unitary semantic system where personal information is stored in an abstract and amodal format, because, if the PINs are intact, access to semantic information should be spared for each modality.

(C) Finally, the capacity to name familiar people was more impaired in patients with left ATL lesions.

5. Can right ATL lesions cause associative prosopagnosia in addition to multimodal people recognition disorders?

The survey of the literature concerning people recognition disorders in patients with right and left ATL lesions, showing that right ATL lesions usually cause a multimodal people recognition disorder affecting voices (and to a lesser extent names) in addition to faces, raised some problems for the construct of associative prosopagnosia. According to Barton et al. (Pancaroglu, Busigny, Johnston, Sekunova, Duchaine & Barton, 2011; Davies-Thompson, Pancaroglu & Barton, 2014), apperceptive prosopagnosia could be due to disruption of the right FFA, whereas associative prosopagnosia could result from anterior temporal lesions, and might be due either to a disconnection mechanism (Fox et al., 2008) or to a loss of facial memories. I have recently questioned these claims (Gainotti, 2013), surveying all cases of patients reported as associative prosopagnosia to see whether their defect was circumscribed to the visual modality or also affected other channels of people recognition. Two groups of patients could be identified: (a) the first consisted of 7 patients, belonging to a relatively ‘old’ neuropsychological literature, who had been considered as instances of associative prosopagnosia by De Renzi (1986) in his seminal review of this subject; (b) the second consisted of 16 additional patients, reported in more recent years and investigated in more detail, who satisfied the criteria of associative prosopagnosia (defects in the recognition of familiar people, with scores in the normal range on face matching tests or high-level visuo-perceptual tests) and who had been classified under this heading by the respective authors.

The review showed that in most reported patients the study had been limited to the visual modality but that, when the other modalities of people recognition had also been taken into account, the defect was often multimodal, affecting voice (and to a lesser extent name) in addition to face. To understand whether patients with right ATL lesions can have either an associative form of prosopagnosia or a multimodal people recognition disorder, it is therefore very important to verify with formal tests whether these patients are able to recognize others by voice, because we have seen that they are often unaware of their voice recognition disorders. Furthermore, it is necessary to check whether the lesion location differs between patients with pure forms of associative prosopagnosia and those with multimodal people recognition disorders.

6. A review of voice recognition in brain-damaged patients and normal subjects

Voice recognition disorders have been studied much less than face recognition disorders, despite their clinical and theoretical importance. In a further review (Gainotti, 2011), I therefore compared the recognition of familiar faces and voices, taking into account: (a) results obtained in individual patients with right anterior temporal lesions, (b) results of group studies of unselected right and left brain-damaged patients and (c) results of experimental investigations conducted on face and voice recognition in normal subjects. The results of the review showed that: (1) normal subjects have greater difficulty evaluating familiarity and drawing person-specific semantic information from the voice than from the face of celebrities (see also Brédart & Barsics, 2012); (2) voice recognition disorders, just like face recognition disorders, are mainly due to right temporal lesions; (3) familiar voice recognition disorders can be dissociated from unfamiliar voice discrimination impairments; (4) although face and voice recognition disorders tend to co-occur, they can also dissociate; (5) in these patients there could be a prevalent involvement of the right FFA when face recognition disorders are in the foreground, and of the right STG when voice recognition disorders are prominent. This last claim, however, must be considered with caution and should be confirmed by further studies. Indeed, the only patient classified as ‘Progressive associative phonagnosia’ was reported by Hailstone, Crutch, Vestergaard, Patterson & Warren (2010): a case of the behavioral variant of Fronto-Temporal Dementia (bvFTD) who showed a lesion spanning from the temporal pole to the superior temporal sulcus. Furthermore, Hailstone, Ridgway, Bartlett, Goll, Buckley, Crutch et al. (2011), studying voice processing in SD and in Alzheimer’s disease together with the correlative neuroanatomical data (obtained with voxel-based morphometry), showed that in both disease groups recognition measures for voice, face and name processing were associated with grey matter volume in the right temporal pole and anterior fusiform gyrus. The anatomical correlate of voice recognition disorders is, therefore, different in the single case of ‘Progressive associative phonagnosia’ reported by Hailstone et al. (2010) and in the group study performed by Hailstone et al. (2011). This observation raises a problem analogous to the one met in the previous section of this review with respect to face recognition disorders, namely whether there exists in the right ATL a structure whose lesion causes a pure form of ‘Progressive associative phonagnosia’, and which can be distinguished from lesions causing a multimodal people recognition disorder.

7. Theoretical models subsuming the constructs of ‘associative prosopagnosia’ and ‘associative phonagnosia’

The constructs of ‘associative prosopagnosia’ and of ‘associative phonagnosia’ are related to modular models, which assume that face and voice are independently processed up to the level of their ‘structural descriptions’ and that the voice and face processing systems cannot communicate before the level of the corresponding PINs. Recent data by von Kriegstein et al. (von Kriegstein, Kleinschmidt, Sterzer & Giraud, 2005; von Kriegstein & Giraud, 2006; Blank, Anwander & von Kriegstein, 2011) have, however, questioned the modular nature of these channels of person recognition, showing that cross-communication between channels probably exists before the level of the PINs. These authors measured, by means of fMRI, brain activity during tasks in which subjects focused on either the speaker’s voice or the verbal content of sentences. They showed that familiar persons’ voices activated the FFA when the task required subjects to focus on the speaker’s identity. These and other data (e.g. Schweinberger, Robertson & Kaufmann, 2007; Schweinberger, Kloth & Robertson, 2011; Föcker, Hölig, Best & Röder, 2011) suggest that the assessment of person familiarity can result in direct information sharing between the voice and face sensory channels from the early processing stages, before access to the person identity nodes. Schweinberger, Herholz & Stief (1997) and O’Mahony & Newell (2012) have, however, shown that an interaction similar to that found between faces and voices is not observed between faces and names. These results suggest that the link between face and voice is closer than that between faces/voices on the one hand and names on the other. The reason could be that the right hemisphere channels which process perceptual data are more closely integrated with each other than the right and left hemisphere channels which process perceptual and verbal data, respectively.

8. General implications of results of our reviews and main models advanced to explain the critical role of the ATLs in people recognition

If I try to summarize the results reported in this review, I can say that: (1) the right ATL plays a greater role in the generation of face familiarity feelings and in reaching person-specific semantic information from faces and voices; (2) PINs are not simple gateways allowing access to a unitary semantic system, but play a critical role in storing and retrieving person-specific semantic information; (3) the left ATL is critically involved in personal name retrieval. All these results are at variance with the IAC model and suggest that people representations based on a convergence of perceptual information may be subsumed by the right hemisphere, whereas representations based on language-mediated information might be mainly subsumed by the left hemisphere. Two main models have been advanced to explain the critical role of the ATLs (and in particular the right ATL) in people recognition. The first model (e.g. Zahn, Moll, Krueger, Huey, Garrido & Grafman, 2007; Zahn, Moll, Iyengar, Huey, Tierney, Krueger et al., 2009) assumes that the right and left ATLs may play a leading role in social cognition. Disorders of face and voice recognition found in patients with right ATL lesions and defects in naming familiar people observed in patients with left ATL lesions should be considered (according to this model) as part of this social cognition defect. The second model (Gainotti, 2007 and 2013; Snowden et al., 2004; Snowden, Thompson & Neary, 2012) assumes, on the other hand, that: (a) the loss of familiarity feelings and the inability to access person-specific semantic information from face and voice reflect the leading role of the right ATL in the construction of representations based on perceptual material, whereas (b) disorders in retrieving familiar names may be due to the crucial role of the left hemisphere in representations mainly based on verbally coded information. Two kinds of data support the second model. The first is the observation that the loss of person-specific information is greater in patients with right than in those with left ATL lesions, irrespective of the perceptual or verbal modality used to access this information (Gainotti, 2007). The second is the observation of Snowden et al. (2004 and 2012) that semantic dementia patients with predominantly right temporal lobe atrophy perform worse on the picture than on the word version of the semantic memory ‘Pyramids and Palm Trees’ test, whereas the opposite result is obtained in patients with predominantly left temporal lobe atrophy. These data suggest that the different format of right and left ATL representations is not limited to familiar people, but also extends to other conceptual domains. So, even if further investigations are certainly needed to choose between these alternative models, the hypothesis that people representations based on a convergence of perceptual information may be subsumed by the right hemisphere, whereas representations based on language-mediated information might be mainly subsumed by the left hemisphere, seems to be more strongly supported by empirical data.