In social interaction, age is one of the most important factors (Gladwell, 2006). Indeed, the age of our interlocutor strongly determines the way we interact with him or her. In almost all societies, people speak differently to younger and to older persons. The vocabulary and syntax of messages are likely to change radically according to the relative age of the interlocutor (e.g., Kemper, Ferrell, Harden, Finter-Urczyk, & Billington, 1998; Ryan, Giles, Bartolucci, & Henwood, 1986). For instance, some studies have shown that young people tend to speak more slowly and loudly to older people (e.g., Hummert, Shaner, Garstka, & Henry, 2006). Therefore, in everyday life, we commonly estimate other people's age in order to respond in an appropriate manner. This process takes only a few seconds and relies simply on looking at the person and listening to his or her voice (Gladwell, 2006). Beyond everyday life, the ability to estimate age is useful in more specific contexts, such as testimony in police investigations or the sale of age-restricted products. Human capacity for age estimation, and the variables that influence it, have therefore been the object of numerous studies.

Most studies have focused exclusively on age estimation from either faces or voices. Faces can provide a wide variety of information, such as ethnic group, gender, emotional state and age (Bruce & Young, 1986). In the same way, independently of the content of speech, voices can provide a considerable amount of information on the speaker’s identity, gender, height, weight or age (Hughes & Gallup, 2008). Voices are often considered the auditory counterparts of faces (Belin, Fecteau, & Bédard, 2004). Physical changes that occur with ageing affect both face and voice and, therefore, apparent age. From childhood to adulthood, the shape of the head and nose, the jaw bones and the forehead all change, and the face becomes more elongated (see Rhodes, 2009 for a review). Growth ends at approximately 20 years of age; later, wrinkles and creases appear, skin color changes, and hair turns grey (Burt & Perrett, 1995). As for the voice, ageing brings changes in the phonatory and articulatory systems, which affect the oral cavity, pulmonary function and laryngeal function (Ramig & Ringel, 1983; Ryan & Burk, 1974). A number of hormonal changes also influence the sound of the voice. For instance, at puberty, the transition from a child-sounding voice to a mature, adult voice is due to a hormonal surge (Hughes & Rhodes, 2010). Given the myriad changes that age produces in the face and voice, numerous cues are indicative of age and can be used to estimate it. Moreover, although multiple cues to a person’s age are available in the situations described above, in other situations the only information at our disposal is a photograph of a face or a recording of a voice. For instance, in numerous testimony situations, the perpetrator was hooded or the assault occurred at night, so the witness cannot base age estimation on the face but only on the voice. Similarly, during a phone call, the voice is the only indicator of age. Conversely, in the era of the Internet and social networks, it is common to estimate the age of unknown persons from pictures.

Given that numerous cues in faces and voices are indicative of age, the main question investigated in age estimation research is whether people are able to detect these cues and thus accurately estimate age from a face or a voice. The first section of this review addresses the accuracy of age estimation from faces and voices.

Accuracy of age estimation

Comparing the accuracy of age estimation from faces and from voices is complicated by differences in the methods and dependent variables used across studies. In age estimation from voices, some researchers have asked participants to categorize voices into age ranges (e.g., Cerrato, Falcone, & Paolini, 2000; Neiman & Applegate, 1990; Ptacek & Sander, 1966; Shipp & Hollien, 1969). This task is also used with faces, but mainly in studies with children (Anastasi & Rhodes, 2005) or as a distractor task in studies on face recognition (Anastasi & Rhodes, 2006). Other categorization methods used with faces are sorting tasks and discrimination tasks. In the former, participants were asked to rank faces from the youngest to the oldest (Pittenger & Shaw, 1975). The latter required participants to identify the older or the younger of two faces (George, Hole, & Scaife, 2000). In all these cases, the categorization of faces or voices into age ranges is highly accurate. For instance, Ptacek and Sander (1966) reported that listeners were able to sort voices into two categories (under 35 or over 65 years) with 78% correct responses from prolonged vowels and 99% from speech. In age estimation from faces, Anastasi and Rhodes (2006) showed that young participants sorted photographs into three age ranges (18–25 years, 35–45 years, 55–75 years) with 83.1% correct responses.

The task most commonly used to examine age estimation from both faces and voices is to ask participants to assign a precise age to a voice (e.g., Amilon, Van de Weijer, & Schötz, 2007; Hartman, 1979; Huntley, Hollien, & Shipp, 1987; Krauss, Freyberg, & Morsella, 2002; Moyse, Beaufort, & Brédart, 2014; Schötz, 2005) or to a face (e.g., Moyse & Brédart, 2012; Sörqvist & Eriksson, 2007; Vestlund, Langeborg, Sörqvist, & Eriksson, 2009; Voelkle, Ebner, Lindenberger, & Riediger, 2012; Willner & Rowe, 2001). However, the dependent variables differed across studies and led to different conclusions. The correlation between chronological age and perceived age has been the main measure used in age estimation from voices (e.g., Braun & Cerrato, 1999; Cerrato et al., 2000; Hartman, 1979; Huntley et al., 1987; Krauss et al., 2002; Neiman & Applegate, 1990; Ryan & Burk, 1974; Shipp & Hollien, 1969). Such a measure is particularly well suited to designs that include stimuli sampled across the lifespan. Its weakness, however, is that it does not provide information about overall accuracy: a systematic error, such as a constant overestimation of every speaker's age, can still yield a high correlation and lead to the erroneous conclusion of good performance (Braun, 1996; Braun & Cerrato, 1999). Across studies, the correlation between perceived age and chronological age was high, ranging from 0.68 (Braun, 1996) to 0.88 (Shipp & Hollien, 1969) (see Cerrato et al., 2000 for a synthesis).

Currently, two complementary dependent measures are mainly used in age estimation from faces, namely the signed value and the absolute value of estimation errors (e.g., Dehon & Brédart, 2001; Moyse & Brédart, 2012; Vestlund et al., 2009; Voelkle et al., 2012). The first corresponds to the mean difference between perceived age and chronological age for a given set of stimuli. The second corresponds to the mean of the absolute values of these differences. Signed values provide information about the direction of age estimation errors (under- or overestimation), whereas absolute values provide information about their amplitude. When signed and absolute values are considered, the age of faces is estimated more accurately than the age of voices: overall, estimation errors are around five years for faces and around ten years for voices. For example, Voelkle et al. (2012) reported that young adults estimated the age of faces aged 19 to 80 years with an absolute error of 5.91 years. In a study on age estimation from voices (Moyse et al., 2014), participants estimated the age of voices belonging to two age groups (20–30 years and 65–75 years) with an absolute error of 10.8 years. Amilon et al. (2007) compared age estimation from faces and from voices within the same study. Consistent with previous studies, they showed that young adults were more accurate at estimating age from face information (average absolute value of 5.7 years) than from voice information (average absolute value of 9.7 years). They also asked participants to estimate age from videos in which face and voice information were available simultaneously. In this condition, the average absolute value was 5.1 years, similar to the photograph condition; voice information thus did not improve performance relative to photographs alone. It therefore appears that, when multiple cues are available, age estimation is mainly based on visual appearance.
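The two measures can be sketched in a few lines of Python. The function names and the four age pairs are hypothetical, chosen so that overestimations of young faces and underestimations of older faces cancel out: the signed value is then zero, which could be mistaken for perfect accuracy, whereas the absolute value reveals a five-year error on every stimulus.

```python
from statistics import mean

def signed_value(perceived, chronological):
    """Mean of (perceived - chronological) ages.
    Positive values indicate overestimation, negative values underestimation."""
    return mean(p - c for p, c in zip(perceived, chronological))

def absolute_value(perceived, chronological):
    """Mean of |perceived - chronological| ages: the size of the errors,
    regardless of their direction."""
    return mean(abs(p - c) for p, c in zip(perceived, chronological))

# Hypothetical stimuli: two young faces overestimated by 5 years,
# two older faces underestimated by 5 years.
chronological = [25, 30, 70, 75]
perceived = [30, 35, 65, 70]

print(signed_value(perceived, chronological))    # 0
print(absolute_value(perceived, chronological))  # 5
```

Reporting both values together is what lets a study distinguish a biased but tight pattern of estimates from an unbiased but scattered one.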

Given that people seem able to estimate age fairly accurately from a face or a voice, research has focused on the cues used to estimate age. The face literature has shown that both local features (e.g., mouth, nose, eyes) and global features (e.g., skin texture, head shape) can affect age estimation (Burt & Perrett, 1995; George & Hole, 1995, 1998). Various perceptual features of voices (e.g., pitch, quality, articulation and speech rate) have also been shown to be strong predictors of perceived age (Harnsberger, Shrivastav, Brown, Rothman, & Hollien, 2008; Hartman, 1979). These features are, of course, specific to either faces or voices, so we will not deal with them in depth. Instead, the next section focuses on factors that affect age estimation from both faces and voices, namely group characteristics (ethnic group, age and gender).

Group characteristics

Ethnic group

The so-called “own-race” bias has been well documented in memory for faces: recognition performance is higher for faces of one’s “own race” than for faces of “another race” (see Meissner & Brigham, 2001 for a review). However, few studies have investigated whether this bias also occurs in memory for voices, and the influence of ethnic group on voice recognition is less clear. Studies on voice recognition have generally failed to demonstrate an own-race bias (see Yarmey, 1995 for a review). The first empirical evidence for an own-race bias in voice recognition came from Perrachione, Chiao, and Wong (2010): White participants were better at identifying White-sounding voices, and Black participants were better at identifying Black-sounding voices. In contrast to this limited evidence, a closely related bias has been clearly demonstrated with voices: the “other-accent” effect. Stevenage, Clarke, and Mc Neill (2012) reported that English and Scottish participants recognized own-accent voices better than other-accent voices. Moreover, accents differ not only between countries but also between regions.

Little attention has been paid to the influence of “race” on age estimation, and the results are mixed. Dehon and Brédart (2001) used an experimental design in which the race of the faces and the race of the participants were crossed. Their results revealed an own-race bias in Caucasian participants, whose age estimates were more accurate for own-race faces than for other-race faces. African participants, however, performed equally well with Caucasian and African faces, suggesting no own-race bias in this group. The authors interpreted this pattern of results as support for the contact hypothesis, the most popular explanation of own-group biases. According to this hypothesis, people become experts at discriminating between own-group stimuli through increased contact with them (Meissner & Brigham, 2001). In Dehon and Brédart’s (2001) study, the African participants had lived for at least five years in Belgium, a country where Caucasian faces predominate. They had therefore had the opportunity to develop expertise with Caucasian faces.

Regarding age estimation from voices, Braun and Cerrato (1999) asked German and Italian listeners to estimate the age of German and Italian voices. Language did not affect the listeners’ age estimation performance. By contrast, Nagao and Kewley-Port (2005) reported that English and Japanese listeners were more accurate at estimating the age of speakers of their own language than of speakers of the other language, a result that could be consistent with an own-race bias. Although these two studies reached opposite conclusions, the choice of languages could explain this divergence. In the first study, the voices came from two European countries whose languages are both Indo-European, so the effect tested may have been an own-language bias rather than an own-race bias. In contrast, the second study actually compared two groups of different ethnicities. From this point of view, the results of these studies do not really contradict each other but reveal the absence of an own-language bias and the presence of an own-race bias in age estimation from voices. However, given the dearth of data, the occurrence of an own-race bias in age estimation from faces and voices remains to be explored further.

Age of stimuli

Several observations suggest that the age of the stimuli may have similar effects for faces and voices. First, there is a tendency to place stimuli in the middle of the age range, with the age of young stimuli being overestimated and the age of older stimuli being underestimated (e.g., for faces: Vestlund et al., 2009; for voices: Cerrato et al., 2000; Ryan & Capadano, 1978; Schötz, 2005; Shipp & Hollien, 1969). This response pattern may reflect a central tendency effect (Hollingworth, 1910).
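To make the central tendency pattern concrete, here is a toy model written in Python. It is an illustrative assumption, not a model fitted in any of the studies cited: each estimate is pulled toward a hypothetical anchor in the middle of the stimulus range by a fixed proportion (`ANCHOR` and `SHRINKAGE` are invented parameters).

```python
# Toy model of the central tendency effect (illustrative assumption only):
# each estimate is pulled toward an anchor in the middle of the stimulus range.
ANCHOR = 50.0     # hypothetical midpoint of the stimulus age range
SHRINKAGE = 0.75  # hypothetical: 1.0 = no pull, 0.0 = always answer ANCHOR

def estimate(chronological_age: float) -> float:
    """Perceived age under the toy central-tendency model."""
    return ANCHOR + SHRINKAGE * (chronological_age - ANCHOR)

for age in (20, 50, 80):
    signed_error = estimate(age) - age
    print(f"age {age}: perceived {estimate(age):.1f}, signed error {signed_error:+.1f}")
```

Under this model the signed error equals (SHRINKAGE - 1) × (age - ANCHOR), so it is positive for stimuli younger than the anchor (overestimation) and negative for older ones (underestimation), which is the pattern described above.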

Second, age estimation performance is influenced by the age of the stimuli. Hughes and Rhodes (2010) reported that the absolute error of age estimation increased with the age of the voices. In their study, the deviation from the actual age of child stimuli (2–9 years) was 1.56 years on average, whereas for young adult (23–34 years) and older adult (56 years and over) stimuli it was 9.90 and 12.83 years, respectively. In age estimation from faces, Sörqvist and Eriksson (2007) reported the same pattern, with an absolute error of 2.83 years for young faces (15–24 years) and of 5.25 years for older faces (56–65 years). Overall, studies on age estimation indicate that accuracy declines as the age of the stimuli increases. In the Amilon et al. (2007) study, the correlation between chronological age and absolute estimation error was significantly positive: 0.72 when photographs were used as stimuli and 0.68 when voices were used (these coefficients were calculated from the data presented in Tables 2 and 3 of Amilon et al., 2007). As explained in the introduction, the appearance of faces and voices changes markedly until adulthood. A five-year difference is therefore more noticeable in a child stimulus than in a young adult or an older adult stimulus. Moreover, ageing is a stochastic, non-uniform process that depends on both intrinsic and extrinsic factors (Nkengne, Bertin, Stamatas, Giron, Rossi, Issachar, & Fertil, 2008). Extrinsic factors include, among others, sun exposure, smoking, alcohol intake, workload and physical condition, and they affect how both the face and the voice age. For example, Rexbye, Petersen, Johansen, Klitkou, Jeune, and Christensen (2006) reported that sun exposure, smoking and a low body mass index have a negative influence on facial ageing. Similarly, age estimates from voices were found to be significantly higher for smokers than for non-smokers of the same age, because smoking causes histological changes in the vocal apparatus that affect age perception (Braun & Rietveld, 1995). Because these environmental factors vary strongly between individuals, they could explain why estimation errors and their variability increase with the age of the stimuli.

Age of participants

The age of participants also influences age estimation from both faces and voices. Moyse et al. (2014) asked young (20–30 years) and older (65–75 years) participants to estimate the age of voices belonging to two age groups (20–30 years and 65–75 years). Absolute errors were higher in older than in young participants (11.37 vs. 10.14 years). Similarly, with an absolute error of 6.83 years, older participants estimated age from faces less accurately than middle-aged (6.30 years) or young participants (5.91 years) (Voelkle et al., 2012). This age-related decline in the accuracy of age estimation has been reported in numerous studies on faces (e.g., George & Hole, 1995) and voices (e.g., Huntley et al., 1987; Moyse et al., 2014).

The interaction between the age of stimuli and the age of participants: the own-age bias

An own-age bias, whereby participants estimate the age of stimuli of their own age more accurately than that of stimuli of other ages, has been demonstrated in several studies. However, unlike the previous observations, which are consistent across studies, the occurrence of an own-age bias in age estimation from faces and voices is still a matter of debate. Regarding age estimation from faces, several studies reported an own-age bias in both young and older adults (George & Hole, 1995; Moyse & Brédart, 2012; Voelkle et al., 2012). Moreover, George et al. (2000) showed that this bias is also present in children: child participants discriminated age better with child faces than with adult faces. However, another study found an own-age bias only in young adults, not in older adults (Anastasi & Rhodes, 2006), and Burt and Perrett (1995) found no difference in age estimation between two age groups of participants (mean ages 24.5 and 50.8 years), suggesting the absence of an own-age bias.

Regarding age estimation from voices, Huntley et al. (1987) found that older adults had difficulty estimating the age of young adults’ voices. However, the dependent variable was the estimated age itself, so no information about the direction or amplitude of the errors was reported. Hughes and Rhodes (2010) reported greater variability between participant age groups in estimating the age of older adults’ voices. However, their sample of older voices included speakers aged 56 years and over, which means that middle-aged voices, rather than truly “older” voices, were included. Both studies point to an own-age bias in older adults. A drawback of these two studies, however, is that the age distributions of the participants and of the voices were not exactly matched. To verify these results, a more recent study specifically examined the own-age bias (Moyse et al., 2014), asking young (20–30 years) and older (65–75 years) participants to estimate the age of young (20–30 years) and older (65–75 years) voices. Although young and older participants did not differ in estimating the age of older voices, the absolute error of older participants (8.42 years) was significantly higher than that of young participants (5.13 years) in estimating the age of young voices. Older participants thus showed preserved accuracy for voices of their own age relative to voices of the other age group, suggesting an own-age bias in older adults only.

Gender of stimuli

Gender differences are of particular interest for voices. As explained in the introduction, hormonal changes have an important impact on voice development across the lifespan. Because these hormonal changes differ between males and females, they affect the sound of female and male voices differently. At puberty, the mean fundamental frequency of both male and female voices decreases, but to a lesser extent for female voices (Abitbol, Abitbol, & Abitbol, 1999). With ageing, the pitch of male voices increases. In contrast, the pitch of female voices first decreases with ageing and then increases after 80 years of age (Etienne, 1998). Moreover, menopause affects women’s voices by altering the vocal folds and laryngeal function (Amir & Biron-Shental, 2004). Older male voices have a higher and more variable volume, whereas older female voices have a lower pitch and a lower voice quality (Hummert, Mazloff, & Henry, 1999). Age cues therefore differ between male and female voices, and the strategies used to estimate the age of male and female voices could differ as well (Schötz, 2005).

Results on the effect of speaker gender on age estimation are equivocal. Some studies found no difference in age estimation between male and female voices (Krauss et al., 2002; Mulac & Gilles, 1996). Other studies showed an advantage in age estimation for female voices (Harnsberger, Brown, Shrivastav, & Rothman, 2010; Hughes & Rhodes, 2010; Neiman & Applegate, 1990; Schötz, 2005). Harnsberger et al. (2010) showed that this advantage was present only for young voices (18–30 years), as no gender differences were found for middle-aged (40–55 years) and older (62–92 years) voices. In contrast, Hughes and Rhodes (2010) found that the amplitude of errors was higher for male voices than for female voices, but only for voices aged 34 to 55 years and 46 to 55 years. Another study showed that the advantage for female voices occurred regardless of voice age (Schötz, 2005). Although these results are not entirely consistent, they show that, at least at some ages, speaker gender has an effect, with the age of female voices being estimated more accurately than the age of male voices.

In contrast, few studies to date have examined the effect of face gender on age estimation, and those that found a difference reported an advantage for male faces over female faces. Voelkle et al. (2012) reported larger estimation errors and a tendency to underestimate the age of female faces. Similarly, in Dehon and Brédart’s (2001) study, the age of male faces (absolute error of 6.85 years) was estimated significantly more accurately than the age of female faces (absolute error of 7.09 years). Wrinkles have been shown to influence age perception, but their weight differs between male and female faces: Aznar-Casanova, Torro-Alves, and Fukusima (2010) reported that wrinkles increase the perceived age of male faces more than that of female faces. Moreover, female faces retain babyish features longer (Enlow, Pfister, Richardson, & Kuroda, 1982). In addition, women may pay more attention than men to their physical appearance (Voelkle et al., 2012). Taken together, these data could explain why female faces look younger and why their age is therefore underestimated.

Gender of participants

Regarding the influence of raters’ gender, most studies have reported no significant difference between male and female listeners in age estimation from voices (e.g., Braun, 1996; Braun & Cerrato, 1999; Hughes & Rhodes, 2010), although Hartman (1979) found a rater-gender difference in age estimation of male voices: females were more accurate than males, but only in estimating the age of speakers aged 50 and over. Consistent with most research on voices, no difference between male and female participants was reported in age estimation from faces (Dehon & Brédart, 2001; Voelkle et al., 2012). Moreover, as for voices, when a gender difference was reported, females estimated age more accurately than males (Nkengne et al., 2008; Vestlund et al., 2009). Specifically, Vestlund et al. (2009) found that females were less biased and more accurate than males, but only with faces aged 56 to 65 years. Nkengne et al. (2008) also provided some support for better performance in women. However, only female faces were used in that study, so the result may reflect an own-gender bias. Unfortunately, the own-gender bias has not yet been examined in age estimation from voices. Few studies have investigated an own-gender bias in age estimation from faces, and those that did failed to find such a bias (Dehon & Brédart, 2001; Voelkle et al., 2012).

Conclusion

Overall, the literature on age estimation suggests that we are able to categorize a face or a voice into an age range with high accuracy (e.g., Anastasi & Rhodes, 2006; Ptacek & Sander, 1966). However, when the task is to give an exact age, we are more accurate from faces than from voices (e.g., Amilon et al., 2007; Moyse et al., 2014; Voelkle et al., 2012). Group characteristics have been shown to influence age estimation from faces and voices. As far as stimulus ethnicity is concerned, research has reported an “own-race” effect for age estimation from faces (Dehon & Brédart, 2001) but no clear indication of such an effect for voices (Braun & Cerrato, 1999; Nagao & Kewley-Port, 2005). The age of the stimuli and the age of the participants affected age estimation performance for both faces and voices: younger stimuli were estimated more accurately than older stimuli, and younger participants were more accurate than older participants. However, these group characteristics do not always have the same impact on voices and faces. For example, gender seems to affect age estimation from voices and from faces in opposite directions: the age of female voices was estimated more accurately than the age of male voices (Hughes & Rhodes, 2010), whereas the age of male faces was estimated more accurately than the age of female faces (Dehon & Brédart, 2001).

Although voices are often considered the auditory counterparts of faces, comparing the two is not always straightforward. In age estimation, methods and dependent variables differed between studies using faces and studies using voices as stimuli. In addition, in studies of age estimation from faces, participants were not under time pressure: faces were presented until participants responded. By contrast, a voice cannot be listened to indefinitely, and age estimates were given after one or sometimes two presentations of the stimulus. This difference in presentation duration between voices and faces could explain the superiority of faces in age estimation. Indeed, studies on age estimation from voices have shown that stimulus duration affects performance, with longer stimuli leading to better age estimation (Schötz, 2005). Future research should therefore compare age estimation from faces and voices under the same time pressure in both tasks. As a first approximation, presentation times could be borrowed from research on person recognition. Such a comparison would help determine whether face processing is easier than voice processing in age estimation, as has been demonstrated in person recognition (Barsics, this issue; Brédart & Barsics, 2012).