Penultimate version; published in Psychologica Belgica

A Validation Study of the Age-of-Acquisition Norms collected by Ghyselinck, De Moor, & Brysbaert

Ghyselinck, De Moor & Brysbaert (2000) collected age-of-acquisition (AoA) norms for 2,816 Dutch four- and five-letter nouns based on student ratings. To assess the validity of these ratings, we presented a sample of the words to children from kindergarten and the last year of primary school. Overall, the validity data are in agreement with the rating data, so that the Ghyselinck et al. measures can be used for further research on the effects of AoA. In addition, the rated AoA norms correlate with young children's actual spoken language use, as assessed on the basis of the CHILDES database.

In recent years, it has become clear that the age at which words are acquired is a significant variable in word and picture processing (e.g., Gerhand & Barry, 1998; Morrison and Ellis, 1995). The effect of AoA has largely been ignored because it is heavily confounded with frequency of occurrence, and because of the limitations of the existing AoA measures.
As for the latter, it has been objected that AoA measures are only available for a limited number of words, and that they are mainly based on ratings from a small number of students.
As usual, the situation is worse for non-English languages than for English.
In Dutch, the only source of AoA estimates has been the teacher ratings collected by Kohnstamm, Schaerlaekens, de Vries, Akkerhuis, & Froonincksx (1981). These authors asked a representative sample of teachers from the last year of kindergarten and the first year of primary school which words should be known by 6-year-olds. Although the teacher ratings form an interesting source of information on AoA and have been used successfully in previous studies (Brysbaert, 1996; Brysbaert, Lange, & Van Wijnendaele, in press; Brysbaert, Van Wijnendaele, & De Deyne, submitted), they are limited both in number and in scope (because they only provide estimates for the age of 6 years). Therefore, Ghyselinck, De Moor, and Brysbaert (submitted) recently collected AoA measures for 2,816 Dutch four- and five-letter nouns, and showed that these measures were reliable, at least for the Dutch-speaking region of Belgium (due to their very high correlation with ratings previously compiled at the University of Leuven).
The present study was designed to assess the validity of the Ghyselinck et al. AoA measures, as it might be argued that retrospective student estimates, despite their high reliability, are limited in use because they are biased by other variables. For instance, students' ratings may be influenced by word frequency, because people tend to underestimate the AoA of frequent words and to overestimate the AoA of rare words (Brysbaert, 1996). In English, several such validation studies of AoA ratings have been run, generally with reassuring results, but to our knowledge, no such study has been done in Dutch.
Gilhooly and Gilhooly (1980) were among the first to examine the validity of AoA measures obtained from student ratings. In a first study, they selected 40 words from the Crichton Vocabulary Scale and 13 words from the Mill Hill Vocabulary Scale. For each word an objective AoA measure was available since the words in the scales were ordered according to the frequency with which they were known by children between 5 and 11 years (Crichton Vocabulary Scale) and by children between 5 and 16 years (Mill Hill Vocabulary Scale). Seventy undergraduate students were asked to indicate on a 9-point rating scale when they thought they first had learnt these words. Using Pearson correlations, Gilhooly and Gilhooly found that rated AoA was the best predictor of the Mill Hill rank position (r = 0.93), followed by Thorndike-Lorge frequency (r = -0.77). Multiple regression analysis indicated that rated AoA was the only variable that made a significant independent contribution in predicting the Mill Hill rank position. In their second validation study, Gilhooly and Gilhooly presented 48 words for which Gilhooly and Hay (1977) had collected AoA ratings to children of different ages and tested the children on the meaning of these words. A word that was not correctly responded to by at least 50% of an age group was further tested in a younger group. For each word, the average age at which 50% of the children knew the word was determined and taken as an objective measure of AoA. The results showed that rated AoA was the best predictor (r = 0.84) of the objective AoA estimates, followed by Thorndike-Lorge frequency (r = -0.68). Multiple regression analysis again indicated that rated age was the only variable that made a significant independent contribution to the prediction of objective AoA. On the basis of these results, Gilhooly and Gilhooly (1980) concluded that adult ratings are valid indices of AoA.
Other evidence supporting the validity of rated AoAs has been obtained by Carroll & White (1973a), Lyons, Teer, and Rubenstein (1978), Jorm (1991), Walley and Metsala (1992), and Morrison, Chappell, and Ellis (1997). Morrison et al.'s (1997) study is the largest attempt ever made to collect objective AoA measures. In this study, 280 British children were asked to name 297 pictured objects.
Two AoA scores were calculated. First, a logistic regression analysis was used to estimate the age at which 50% of the children would be able to name a picture correctly. Second, the AoA of a word was defined as the age at which 75% of the children named the item correctly. These objective norms were compared with rated AoA norms (based on 20 participants), and both yielded a correlation of .75. On the basis of this finding, Morrison et al. (1997) concluded that "objective measures should be used when available, but where not, our data suggest that adult ratings provide a reliable and valid measure of real word learning age" (p. 528).
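To make the first of these two scores concrete, the following minimal sketch fits a one-predictor logistic regression to naming outcomes and solves for the age at which the predicted probability of a correct response reaches a criterion. The data, the function names, and the plain Newton-Raphson fitting routine are illustrative assumptions; they are not the data or the software used by Morrison et al. (1997).

```python
import math

def fit_logistic(ages, correct, iterations=50):
    """Fit P(correct) = 1 / (1 + exp(-(b0 + b1 * age))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(ages, correct):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            # Gradient of the log-likelihood.
            g0 += y - p
            g1 += (y - p) * x
            # Entries of the (negated) Hessian, X'WX.
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        # Solve the 2x2 system H * delta = g.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1

def age_at_criterion(b0, b1, criterion=0.5):
    """Age at which the fitted curve predicts `criterion` correct."""
    return (math.log(criterion / (1.0 - criterion)) - b0) / b1

# Hypothetical item: number of children (out of 10 per age band) naming it.
counts = {3: 1, 4: 2, 5: 4, 6: 5, 7: 6, 8: 8, 9: 9}
ages, correct = [], []
for age, n_correct in counts.items():
    for i in range(10):
        ages.append(age)
        correct.append(1 if i < n_correct else 0)

b0, b1 = fit_logistic(ages, correct)
age50 = age_at_criterion(b0, b1, 0.5)
print(round(age50, 2))
```

The `criterion` argument is only a convenience of this sketch. How Morrison et al. derived their 75% score is not described beyond the sentence above, so the fitted curve should not be read as their procedure for that second measure.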
The English findings suggest that adult ratings are fairly good estimates of when children acquire words. However, to our knowledge no such study has been done in Dutch yet. Therefore, if the AoA measures of Ghyselinck et al. are to be used in psycholinguistic experiments, a representative sample of them should be checked against the real performance of children. Such a validation study has its own limitations (the sampling of children, the way in which the knowledge is tested, the scoring, cohort differences, etc.), but provides invaluable information about the correspondence of the student ratings with the actual performance they are supposed to measure.
Two validation studies were run. In the first study, 80 children of the last year of kindergarten were tested on the meaning of 230 supposedly early- and 24 late-acquired words. In the second study, 172 children of the last year of primary school were tested on the meaning of 410 words that were rated as either early acquired or late acquired. These two age groups were used to examine (1) whether words rated as early acquired are indeed known by 6-year-olds, and (2) to what extent words rated as late acquired are known by 12-year-olds. Following Morrison et al. (1997), we expected early-acquired words to be known by at least 75% of the 6-year-old children, and we expected significantly lower percentages of children (both 6-year-olds and 12-year-olds) to know the late-acquired words (preferably not more than 25%).

Word Sample
Since our primary concern is to have a set of words that enables us to use fully factorial designs in future research on the effects of AoA and frequency, the sample of words tested in the two validation studies was selected from an orthogonal combination of rated AoA and frequency. For all 2,816 words from the Ghyselinck et al. study, rated AoA was plotted against frequency (see Figure 1). Next, four quadrants were formed in such a way that each quadrant contained at least 30 words. We sought to separate the quadrants as far as possible. Due to the scarcity of early-acquired low-frequency words and late-acquired high-frequency words, the borders of the quadrants had to be set at 7 years (upper border early-acquired words) and 10 years (lower border late-acquired words) with respect to AoA. The borders for frequency were chosen at 1.5 (upper border low-frequency words) and 2.5 (lower border high-frequency words).
Experiment 1 (with 6-year-olds) mainly included words from the lower left quadrant (early-acquired, low-frequency words) and matched words (for AoA) from the upper left quadrant (early-acquired, high-frequency words). The stimulus set of the second study (with 12-year-olds) contained stimuli from all four quadrants.
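The quadrant borders described above can be written down as a small classification rule. The sketch below is illustrative only: the function name is hypothetical, the example values are invented, and treating the borders as inclusive is an assumption, since the text does not say on which side a word exactly on a border falls.

```python
def quadrant(rated_aoa, log_freq,
             early_max=7.0, late_min=10.0,
             low_max=1.5, high_min=2.5):
    """Return (aoa_class, freq_class), or None for words in the gap
    between quadrants. Borders are assumed to be inclusive."""
    if rated_aoa <= early_max:
        aoa_class = "early"
    elif rated_aoa >= late_min:
        aoa_class = "late"
    else:
        return None
    if log_freq <= low_max:
        freq_class = "low"
    elif log_freq >= high_min:
        freq_class = "high"
    else:
        return None
    return aoa_class, freq_class

# Invented example values (rated AoA in years, logarithmic frequency).
print(quadrant(5.2, 1.1))   # early-acquired, low-frequency
print(quadrant(10.4, 2.8))  # late-acquired, high-frequency
print(quadrant(8.5, 2.0))   # falls between the quadrants
```

Words that fall between the borders on either dimension are excluded, which is what keeps the quadrants separated.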

Experiment 1

Method
Word Sample. We selected 230 early-rated words and 24 late-rated words from the Ghyselinck et al. database (see Appendix 1). The early-acquired words had an estimated AoA below 7.5 years; the late-acquired words had an AoA between 10 and 11 years. These words were distributed over four lists, which were matched on AoA and frequency. Three lists contained 58 early-rated words and 6 late-rated words, and one list contained 56 early-rated words and 6 late-rated words. Our main objective was to see to what extent the early-acquired words were mastered by 6-year-olds. The late-acquired words were mainly added as a check of the validity of our procedure (i.e., to ensure that the cues provided to the children were not of such a nature that the children could easily guess the meanings of the words without actually knowing them). In order not to discourage the children by presenting them with too many words they did not know, and given the main purpose of the experiment, the number of late-acquired words was limited to 6 items per list. Each child got only one of the four lists.
Procedure. The children were tested individually in a quiet room. They were told that the experimenter would read a set of words and that their task was to explain the meaning of the words. They were further told that if they did not know the meaning of a word, the experimenter would read four sentences from which they could select the one that in their view provided the right description of the word. The definitions of the words and the three wrong alternatives were selected from the Van Dale Junior Woordenboek. It took on average 20 minutes for the children to complete the list. Two different orders of the words were used.
Participants. Eighty children (43 girls and 37 boys) from the last year of kindergarten (mean age 5.6 years, ranging from 5 to 6 years; tested in April and May) participated in the study. The children were drawn from 4 different schools in and around the city of Gent. Three of the participating schools were Catholic schools. All children were native speakers of Dutch and none was bilingual. We took care that none of the participating children followed special courses for Dutch.
Scoring. All responses were tape-recorded. Three independent judges (graduate students in experimental psychology) listened to the tapes and indicated for each word whether the child knew the word right away, after the presentation of a cue (the four alternatives), or not at all. The judges were unanimous in their evaluations on 4,642 of the 5,088 ratings (90%). The judges were paid for their participation.

Results
An alphabetical listing of the 230 early-rated words and the 24 late-rated words together with their rated AoA and logarithmic frequency is presented in Appendix 1. For each word, we mention (1) the percentage of children that knew the word without a cue, (2) the percentage of children that knew the word after a cue was given (the four alternatives), (3) the percentage of children that gave a wrong answer to the four alternatives, and (4) the percentage of children that refused to give an answer. The percentages are calculated on all observations per word (i.e., the number of children times 3 judges). Because it turned out that many cued answers were given by children who were too shy to say the answer right away, we will base our objective AoA measure on the sum of correct cued and uncued responses (see also Morrison et al., 1997).
[Insert Figure 2 here]
Words rated as early acquired. Of the 230 early-rated words, 55 (24%) were known by less than 75% of the children. Most of these words had a rated AoA of more than 6 years, which is not surprising, given that the children in this study had a mean age of 5.6 years. Of the 95 words with a rated AoA of 6 years or younger, only 5 items failed to reach the 75% criterion. These were the words 'toet' (58%), 'luier' (18%), 'zaal' (73%), 'getal' (71%), and 'klank' (58%). The reason why the word 'toet' was not known by 75% of the children probably is that the students had interpreted the word as an onomatopoeia [toot], whereas the first definition of the word (and the one we asked for) is an informal word for 'face', which is rarely used in the Dutch-speaking part of Belgium. The low percentage of the word 'luier' [nappy] probably reflects the fact that the word has become old-fashioned, because it refers to a linen, washable nappy, whereas nowadays most families use disposable nappies, which are called "pampers". Despite the high percentages of 'zaal' [large room] and 'getal' [number], they fell short of the 75% criterion. It may be noted that the word 'zaal' is mostly used in compound words (e.g., 'eetzaal' [dining-room]). As a result, children may be more aware of the meaning of the word when it is used in a compound word. During the session, many children confused the word 'getal' [number] with the definition of 'letter' [an alphanumerical letter]. For example, when children were asked to explain the meaning of the word 'negen' [nine] they often said this was a letter, confusing the concept of letter with the concept of number. Given the high percentages of children that knew the words 'negen' [nine] and 'zeven' [seven], it is clear that the children had a notion of what numbers are but they could not explain the concept yet. The low percentage of the word 'klank' [sound] may be due to the fact that other words (e.g., 'geluid', 'toon') are used to refer to the meaning of this word.
Words rated as late acquired. Of the 24 words rated as late acquired (AoA of 10 years or older), 8 words were known by more than 25% of the children. These were the words 'cocon' (52%), 'list' (29%), 'motel' (70%), 'sauna' (35%), 'wals' (31%), 'sfinx' (40%), 'fuif' (32%) and 'kiosk' (28%). A closer examination of the results showed that these high percentages were mainly caused by the percentages of children that knew the word after a cue (the four alternatives) was given. Except for the word 'cocon' [cocoon], no word reached a percentage of more than 25% when no cue was given. This might indicate that the words rated as late acquired are not actively known by 6-year-olds but that some children were able to select the right alternative when cued with four alternatives. The high percentage in the uncued condition for the word 'cocon' was to a large extent due to the fact that in one of the classes children had recently learnt about the butterfly. As a result, many of the children knew very well what a 'cocon' was.
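The scoring rule used in these results (percentages computed over all child-by-judge observations, with cued and uncued correct responses summed) and the 75% and 25% expectations can be sketched as follows. The category labels, function names, and example counts are hypothetical and serve only to illustrate the arithmetic.

```python
from collections import Counter

# Response categories assigned by the judges (labels are illustrative).
CATEGORIES = ("uncued", "cued", "wrong", "refused")

def response_percentages(judgements):
    """Percentage of observations per category for one word.

    `judgements` holds one label per child-by-judge observation."""
    counts = Counter(judgements)
    total = len(judgements)
    return {category: 100.0 * counts[category] / total
            for category in CATEGORIES}

def percent_known(judgements):
    """Objective measure: correct uncued plus correct cued responses."""
    shares = response_percentages(judgements)
    return shares["uncued"] + shares["cued"]

def meets_expectation(judgements, rated_early):
    """Early-rated words: at least 75% known; late-rated: at most 25%."""
    known = percent_known(judgements)
    return known >= 75.0 if rated_early else known <= 25.0

# Hypothetical word: 20 children x 3 judges = 60 observations.
example = ["uncued"] * 30 + ["cued"] * 18 + ["wrong"] * 9 + ["refused"] * 3
print(percent_known(example))
print(meets_expectation(example, rated_early=True))
```

Reporting the uncued share separately, as `response_percentages` does, is what allows the distinction drawn above between words that are actively known and words that are only recognised among the four alternatives.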
Regression analyses. Finally, we repeated Gilhooly and Gilhooly's (1980) analysis and looked at the correlations between rated AoA, log frequency, and percentage of children that knew the word. In this analysis, the 5 early-acquired words that failed to reach the 75% criterion were omitted. The correlation between rated AoA and percentage known was .75 (N = 249), and the correlation between log frequency and percentage known was .26. A multiple regression analysis showed that only rated AoA had significant predictive power for percentage known (57% of the variance explained). Log frequency added less than 1% of the variance explained (t(246) = 1.12, p > .20). Similarly, log frequency explained only 0.8% of the variance in AoA ratings when percentage known was partialled out, although the contribution this time was significant at the .05 level (t(246) = -2.17, p < .05).
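One common way to express how much a second predictor adds is the increase in R² when it enters a model that already contains the first predictor. The sketch below computes this from the three pairwise Pearson correlations. It is an illustration under stated assumptions: the text does not specify exactly how the reported percentages of variance were computed, the function names are hypothetical, and the six data points are invented.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_squared_two_predictors(y, x1, x2):
    """R^2 of the regression of y on x1 and x2, from pairwise correlations."""
    r_y1, r_y2, r_12 = pearson_r(y, x1), pearson_r(y, x2), pearson_r(x1, x2)
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

def added_variance(y, x1, x2):
    """Increase in R^2 when x2 is added to a model that contains x1."""
    return r_squared_two_predictors(y, x1, x2) - pearson_r(y, x1) ** 2

# Invented values for six words.
rated_aoa = [4.0, 5.0, 5.5, 6.0, 6.5, 7.0]
log_freq = [2.8, 2.0, 2.5, 1.5, 1.9, 1.2]
pct_known = [98.0, 95.0, 90.0, 80.0, 70.0, 55.0]

print(round(pearson_r(rated_aoa, pct_known), 2))
print(round(added_variance(pct_known, rated_aoa, log_freq), 3))
```

Swapping the roles of the arguments, for example `added_variance(rated_aoa, pct_known, log_freq)`, gives the analogous quantity for the analysis in which rated AoA is the variable being predicted.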

Discussion
Experiment 1 showed that the AoA norms collected by Ghyselinck et al. agree very well with the actual performance of children at the age of six. Only for a very small percentage of words (some 5%) is there a discrepancy, which most probably is due to cohort differences. The clearest example of this is the word 'luier', which was still quite common 15 years ago but has now virtually been replaced by the word 'pamper'. Other examples are objects that now figure in movies and cartoons for children (e.g., 'sfinx'). It may be noted that not only the rank order of acquisition is well captured by the Ghyselinck et al. ratings, but also the exact time of acquisition. Indeed, many words that were rated above 6 years were not known by the children of 5.5 years we tested.

Experiment 2

Method
Participants. Participants were 172 children from the sixth grade of primary school. The children were drawn from rural and city schools in and around Gent. All participating schools were Catholic. All the children were native speakers of Dutch and none was bilingual. We took care that none of the children followed special courses for Dutch. For privacy reasons, we were not allowed to ask the age of the children, but it can be estimated at around 12.5 years (testing time: April-May).

Results
An alphabetical listing of the 410 validated words together with their rated AoA and the percentage of children that correctly indicated the meaning of the word is presented in Appendix 2. For each word, we mention (1) the percentage of children that correctly indicated the meaning of the word, (2) the percentage of children that did not know the meaning of the word and ringed the option 'I don't know what a ... is', and (3) the percentage of children that chose the wrong alternative. As only a few children chose a wrong alternative when they did not know the word, we did not use a guessing correction (e.g., the number of correct answers minus half the number of wrong answers). A scatter plot of rated AoA versus percentage of children that correctly indicated the meaning of the words is shown in Figure 3.
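For reference, the guessing correction mentioned (and not applied) above is the standard formula in which each wrong answer is penalised by 1/(k - 1), where k is the number of substantive alternatives. With three definitions per word this gives the "minus half the number of wrong answers" rule. The function name and the example counts below are hypothetical.

```python
def corrected_score(n_correct, n_wrong, n_alternatives=3):
    """Correct answers minus wrong answers divided by (alternatives - 1).

    'I don't know' responses are neither rewarded nor penalised."""
    return n_correct - n_wrong / (n_alternatives - 1)

# Hypothetical word: 30 correct, 4 wrong, 9 'I don't know' responses.
print(corrected_score(30, 4))
```

When few wrong alternatives are chosen, as reported here, the corrected and uncorrected scores differ very little, which is consistent with the decision to omit the correction.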
Words rated as late acquired. Of the 230 words with a rated AoA higher than 7 years, 81 words (35%) were known by more than 75% of the 12-year-old children. The data for these words were further analysed by calculating the percentages of words known by at least 75% of the children for different AoA intervals (see Table 1). From this table it can be seen that the higher the AoA ratings assigned by the students, the lower the percentages of words that were known by at least 75% of the children, indicating that the student ratings reflect order of acquisition.
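The interval analysis described above amounts to binning words by rated AoA and computing, per bin, the share of words that reach the 75% criterion. The following sketch shows the computation. The interval edges, the function name, and the six example words are assumptions for illustration, since the actual intervals and values of Table 1 are not reproduced in the text.

```python
def share_known_by_interval(words, edges, criterion=75.0):
    """For each interval (lo, hi], return (lo, hi, n_words, share), where
    share is the percentage of words known by at least `criterion` percent
    of the children (None for an empty interval).

    `words` is a list of (rated_aoa, percent_known) pairs."""
    rows = []
    for lo, hi in zip(edges, edges[1:]):
        in_bin = [pct for aoa, pct in words if lo < aoa <= hi]
        if in_bin:
            n_reached = sum(pct >= criterion for pct in in_bin)
            share = 100.0 * n_reached / len(in_bin)
        else:
            share = None
        rows.append((lo, hi, len(in_bin), share))
    return rows

# Invented (rated AoA, % known) pairs and hypothetical one-year intervals.
example_words = [(7.5, 90.0), (7.8, 80.0), (8.4, 80.0),
                 (8.9, 40.0), (9.6, 30.0), (10.5, 20.0)]
for row in share_known_by_interval(example_words, [7, 8, 9, 10, 11]):
    print(row)
```

A share that falls as the interval rises, as in this invented example, is the pattern the text takes as evidence that the ratings reflect order of acquisition.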
Regression analyses. The correlation between AoA and the percentage of children that knew the word was 0.81 (N = 404) when the six "suspicious" early-acquired words were discarded. The correlation between log frequency and the percentage of children that knew the word was .63. A multiple regression analysis indicated that both AoA and log frequency were predictors of the percentage known, respectively accounting for 65% of the variance (t(401) = -18.41) and 3.4% of the variance (t(401) = 6.64, p < .01). Similarly, rated AoA was predicted by both percentage known (65% of the variance, t(401) = -18.41) and log frequency (1.8% of the variance, t(401) = -4.66).

Discussion
In general, the results of Experiment 2 corroborated those of Experiment 1, although the picture was slightly less clear than we had hoped for. In particular, frequency started to slip in, and words were less well known than they should have been according to their rated AoA (see Table 1). On the basis of this single experiment, it cannot be decided to what extent the procedure (especially the group testing) accounted for the problems, and to what extent they reflect genuine differences in the order of acquisition.
It is not unlikely that the order of acquisition becomes more heterogeneous for late-acquired than for early-acquired words, with larger interindividual differences for words acquired at the end of primary school than for words acquired in kindergarten. Also, it is a general finding in serial recall that performance is better for the first and the last stimuli than for the middle stimuli (i.e., the serial position curve). These factors may explain why frequency had an influence on the AoA ratings of the students and on the actual performance of the children. Still, it should be noted that the contribution of frequency, although significant, was very small in practical terms (less than 2% of the variance of the rated AoAs).
The multiple regression analyses only capture the order in which words have been acquired. Independent of this issue, it also looks like the students underestimated the AoA of many words learnt after the age of 6. This observation was also made by Morrison et al. (1997). Comparing rated AoAs with objective data, they found that raters slightly underestimated the age at which they had learnt late-acquired words. A possible explanation for this shift may be that the students used their AoA ratings of early-acquired words as a reference point against which they rated the late-acquired words. As a result, they presumably gave late-acquired words an AoA rating close to the AoAs of early-acquired words, resulting in lower AoA ratings than suggested by the objective data. Another possibility is that the students in the study of Ghyselinck et al. (submitted) gave a lower AoA to late-acquired words because they indeed learnt these words at a younger age than indicated by the 12-year-old children of Experiment 2. When we distributed our lists, we used no criteria to select pupils, but we handed the lists out to all the children of a class, regardless of their school results. Yet, only some 20% of these children will eventually go to university. Hence, the first-year university students who formed the basis of the Ghyselinck et al. norms probably corresponded more or less with the upper fifth of the pupils in the primary school classes. A solution could be to lower the inclusion criterion for determining the objective AoA. Gilhooly and Gilhooly (1980), for instance, used a cut-off point of 50% to calculate the real AoA.

Conclusion
In this study, an attempt was made to validate a sample of the AoA norms gathered by Ghyselinck, De Moor and Brysbaert (submitted). The sample consisted of words that are potentially very useful in further experiments on the effects of word AoA and word frequency, because they allow an orthogonal variation of these variables. The results of the two validation studies can be summarised as follows: (1) Except for a few outliers which can be accounted for, words rated as early acquired are indeed known by 6-year-olds, and (2) words that have been rated as late acquired are indeed late-learnt words, not even known by many 12-year-olds. Given these results and the fact that frequency had but a very small effect on the AoA ratings once the objective AoA was partialled out, we can safely conclude that the AoA norms collected by Ghyselinck et al. are valid measures of the age at which a word is acquired and can be used in future research on the effects of AoA.

Figure 3: Rated AoA plotted against the percentages of 12-year-old children that knew the words, for 410 early-rated and late-rated words (axes: Rated AoA, % Known).

Figure 2
Figure 2 shows a scatter diagram of rated AoA versus the percentage of children that [...]
[...] and Procedure. Stimulus materials consisted of 410 words coming from the four quadrants indicated in Figure 1. For the early-acquired words there was an overlap with the words from Experiment 1, but this overlap was not complete (see Appendix 2). The 410 words were distributed over four lists (2 lists of 103 words and 2 lists of 102 words) matched for AoA and frequency. The words were printed in a random order, 10 words per page. Together with a word, three possible explanations of the word and the option 'I don't know what a ... is' were presented. The definition of the test word and the two alternative definitions were selected from the Van Dale Junior Woordenboek. Three different permutations of the words and two different permutations of the alternatives were used. The lists were randomly assigned to the children in a class, one list per child. The children filled in the list under supervision of the teacher and the experimenter. The children were instructed to indicate for each word the correct meaning by ringing one of the alternatives. The children got as much time as they needed to complete the list. It took on average 40 minutes to do so.

Figure 1: Rated AoA plotted against logarithmic frequency for all 2816 words from the study of Ghyselinck, De Moor & Brysbaert (submitted).

Table 1: Percentages of children that knew the 410 words (% known), split up according to the rated AoA of the words.