WHAT DREAMING CAN REVEAL ABOUT COGNITIVE AND BRAIN FUNCTIONS DURING SLEEP? A LEXICO-STATISTICAL ANALYSIS OF DREAM REPORTS

Multidimensional statistical procedures were used to analyze three corpuses of dream reports on the basis of their word content, automatically and without any a priori coding of their meaning. The analyses revealed that the same clusters of dream reports were found consistently in all three datasets. Each cluster of dream content was characterized by a specific predominant cognitive category, such as recent memory, visuo-spatial processing, verbal activity, reasoning, or emotions. In addition, the analysis of a 15-month dream diary from one individual showed that the proportions of the different clusters of dreams remained stable over time. We also show that the identification of these well-defined and common cognitive characteristics in dream reports could provide important insights into the processing of information within specialized brain regions during sleep. In conclusion, dream reports contain valuable information about the cognitive processes and brain functions that participate in their generation.

Human sleep can be investigated at different levels of biological organization, including the genetic, molecular, macroscopic-systems, and cognitive levels. A comprehensive model of human sleep should account for findings from all of these research areas. Despite major advances in our understanding of sleep mechanisms, from their genetic foundations up to functional neuroimaging in humans, it remains difficult to obtain measurable and reliable data about cognitive processes during sleep. This represents a major challenge for a successful integration of cognition with more basic, physiological levels of investigation in sleep research. The goal of the present paper is to demonstrate that valuable information about cognitive aspects of sleep can be gained by analyzing dream reports, using statistical methods that do not require any a priori coding of the content of the dreams.
The hypothesis underlying this approach is twofold: (a) dream reports contain valuable information about the cognitive processes that participate in their generation; (b) cognition during sleep can be inferred from typical or common features in dream reports, independently of their specific meaning for a given dreamer. For example, vivid visual imagery, emotional intensification, and the illusion of reality are some of the defining features of common dream experiences (e.g., Hobson, 1988; Merritt, Stickgold, Pace-Schott, Williams, & Hobson, 1994; Nielsen, Deslauriers, & Baylor, 1991; Rechtschaffen & Buchignani, 1983). An additional hypothesis is that the identification of frequent cognitive features in dream reports may provide new insights into brain functions during dreaming. This approach also establishes a cognitive neuroscience framework for the study of dreaming, allowing the integration of dream data into a unified model of human sleep.
The studies reported here used multidimensional statistical procedures to test whether dream reports collected from many different individuals implicate well-delineated cognitive processes, thought to be functionally segregated in the human brain. In addition, data from a dream diary covering a 15-month period were analyzed to examine the distribution of these distinct categories of dreams over time.
In the following, some historical landmarks for a cognitive neuroscience approach to dreaming will be briefly reviewed, followed by a description of the conceptual and mathematical bases of the statistical approach used to analyze the dream data. Then, two studies will be reported. Study 1 will present the analysis of 1770 dream reports selected from one dream diary, as well as a comparison of these data with other verbal written material. Cluster analysis will reveal the existence of discrete semantic categories in this first dream corpus. Study 2 will confirm and extend the results of Study 1, by using two large corpuses of dream reports from 50 male and 50 female students. Finally, these findings will be discussed in light of our current knowledge about cognitive and cerebral activity during sleep.

Dreaming to Understand the Brain in Sleep
"Men have dreamt the problem of dreaming almost since dreaming began" (adapted from Jaynes, 1977, p. 2). The quest for the meaning of dreams has enjoyed almost uninterrupted success since the publication of the Oneirocritica of Artemidorus (c. 150 A.D.; Artemidorus, 1990). In this masterpiece of dream interpretation, Artemidorus hypothesized that the study of dreams could reveal important information about past or future events. Opposing this view, the Greek philosopher Aristotle proposed that dreaming is an activity of the mind in sleep, which does not bring any message from the gods or about the future. According to him, dreaming and waking perception share the same mental or "cognitive" principles, and dreams can therefore reveal mechanisms that underlie the world of experience (Aristoteles, 2001, 463a, 463b). Aristotle also suggested that the images in dreams have the same origin as the hallucinations observed in pathological conditions (cf. Aristoteles, 2001, 458b: 26-28). Thus, the modern idea that experiences during dreaming and waking states might share the same underlying mechanisms stems from ancient intuitions (Hobson, 1988; Schwartz, 1999).
The scientific inquiry into dream mechanisms really started in the 19th century, when new theories were tested using proper experimental methods (e.g., Delboeuf, 1993; Macario, 1978; Saint-Denys, 1977; for review see Schwartz, 2000). Some of the models of dreaming proposed at that time still appear remarkably modern. For example, Alfred Maury suggested that during sleep (but not during wakefulness) the different parts of the brain are no longer synchronized (Maury, 1862). He claimed that perception, memory, imagination, will, and judgment are unequally present during dreaming, reflecting the various degrees of activity in different parts of the brain (Maury, 1862, p. 35). Accordingly, dreaming would provide a unique opportunity to study "intellectual faculties" in humans, since these faculties are not equally active during sleep, no longer act in concert or influence each other, and therefore become more easily identifiable (see Schwartz, 1999). This conception of dream experiences as a possible source of information about the specialization of human brain functions is developed further in the present paper.
Today, the relationship between brain function during sleep and dream features has become an attractive research topic. This is mainly due to the recent development of neuroimaging techniques that allow the study of regional brain activity during sleep (e.g., Braun et al., 1998; Hobson, Pace-Schott, Stickgold, & Kahn, 1998; Hofle et al., 1997; Maquet et al., 1996). However, bridging brain activity and subjective experience remains difficult, since dreaming is by its very nature a private experience that occurs during sleep and can only be studied through introspective descriptions from memory (cf. the "explanatory gap"; Levine, 1983). Methods that can suitably characterize cognitive processes in dream reports are therefore an important prerequisite for integrating dream research into a cognitive neuroscience framework.

Quantifying Dream Content
Dreams are subjective experiences that can be very vivid and are often bizarre. On the one hand, the quest for regularities in dream experiences may have arisen from the need to find some function for this seemingly gratuitous and disorganized facet of human experience. On the other hand, dream features ought to show some consistency and reproducibility if they actually relate to recurrent patterns of brain activity induced by neurophysiological mechanisms associated with sleep states. In this context, dreams are good candidates for statistical analysis applied to large samples of dream reports.
In 1893, Mary Calkins published one of the first statistical studies of dream content (Calkins, 1893). She identified and quantified more than 10 parameters in 170 dream reports from a 32-year-old man and 205 reports from a 28-year-old woman (Calkins' own dreams). The dream reports were collected over 46 and 55 nights, respectively, corresponding to an average of about 4 dreams per night. Calkins reported several results that largely remain valid today. She found a clear predominance of visual experiences in dreams (57%), followed by auditory experiences (37%), and then by gustatory and olfactory experiences (1%) (similar results in Strauch & Meier, 1996). Also, by programming awakenings at different times of the night, she observed that most dreams occurred during the late part of the night (between 4 and 8 am), and that these late dreams were also more vivid (similar results in Stickgold, Malia, Fosse, Propper, & Hobson, 2001).
More recently, the discovery of "physiological markers" of dreaming in the 1950s triggered a renewed interest in dream research within the scientific community. Kleitman and his two students, Aserinsky and Dement, discovered that dreaming occurred during recurrent periods of high cortical activity during sleep, accompanied by rapid eye movements (REM) and higher heart and respiratory rates (Aserinsky & Kleitman, 1953; Dement & Kleitman, 1957). In the following decade, Hall and Van de Castle published an extensive manual for coding the content of dream reports, which revived a statistical, taxonomic approach to dream features (Hall & Van de Castle, 1966). This classification system was first designed to account for common features found in a large sample of dream reports from healthy young students (e.g., people, objects, places, social interactions, activities, emotions, etc.), and this sample provided normative values for the different scales proposed in the manual. These normative values have since been used to study many different dream corpuses (e.g., Domhoff, 1996; Domhoff & Schneider, 1998).
In modern dream research, scales and coding systems are standard tools for quantifying empirical as well as theoretical categories of dream features (e.g., Hobson, 1988; McCarley & Hoffman, 1981; Reinsel, Antrobus, & Wollman, 1992; Strauch & Meier, 1996). These methods are very efficient for assessing specific categories of interest, such as the different types of emotions or bizarre features in the dreams of healthy or patient populations (e.g., McCormick et al., 1997; Schredl, Schafer, Weber, & Heuser, 1998), or for testing theoretically driven hypotheses (e.g., Hall & Van de Castle, 1966; Revonsuo & Salmivalli, 1995). However, these methods have two major limitations: (a) they require the delimitation of a priori categories to be quantified and may therefore miss important information in the data, and (b) they are extremely time-consuming when analyzing large amounts of data. The following section describes a statistical approach better suited to the automated identification of recurrent patterns of content in large samples of dreams. This approach may be used as a first exploratory step to guide the interpretation of the content of dream reports.

Multidimensional Statistical Methods for the Analysis of Dream Content
The statistical approach used here is based on methods developed for analyzing large amounts of verbal or textual data (Lebart & Salem, 1994). It is appropriate for the analysis of dreaming activity, since dreams are usually reported verbally. One main property of these multidimensional statistical methods is that they help visualize and extract structural characteristics from large samples of textual data (Blasius & Greenacre, 1998; Lebart, 1998). In short, these methods reorganize the information contained in any series of reports to reveal shared structures and similarities that, because of the large amount of data, would not be directly accessible to an unaided observer or discernible simply by reading the reports. In other words, they make manifest regularities that would otherwise go unrecognized. In addition, unlike other content analysis methods that can be extremely time-consuming, these statistical procedures require minimal or no manual coding of the data, and the preprocessing steps can be performed automatically (Benzécri, 1973; see below).
While most lexical statistical approaches privilege semantic over syntactic aspects of textual data, it is worth noting that abstract grammatical structures are usually not isolated from semantic contexts: "word forms occur in specific syntactic constructions and […] syntactic structures are associated with lexical preferences" (Gerbig, 1996, p. 97; see also Sinclair, 1991). Importantly, descriptive lexical statistics invert the traditional relationship between quantitative and qualitative analyses. Classical content analyses start from a qualitative coding of content elements to obtain quantitative assessments, whereas lexical statistical methods start from quantities (e.g., frequencies of occurrence of words in textual units) to guide qualitative interpretations. The classical approach has several drawbacks linked to the initial thematic coding of the content elements: variability between raters, elimination of many infrequent elements, elimination of potentially relevant elements that are not retained by the coding system, and large time costs. By contrast, descriptive statistics can be computed automatically from the raw data to help reveal hidden structures in the data.

STATISTICS OF DREAMS

These new statistical approaches represent a significant change in the function of the statistical model: "the model should follow the data not the converse" (Jean-Paul Benzécri quoted in Lebart, Morineau, & Piron, 1995, p. 1). Indeed, these techniques make minimal assumptions about the underlying distribution of the data (Benzécri, 1973; Blasius & Greenacre, 1998; Greenacre, 1994) and are straightforward and well adapted to the analysis of lexical tables (Lebart & Salem, 1994; Schwartz & Baldo, 2001). However, multidimensional statistical procedures that process large matrices of data are highly demanding in computing resources. This is why their development and application to various domains of research has emerged only relatively recently, thanks to increased computing power.
Here, two main methods will be used: factorial correspondence analysis (CoA) and cluster analysis. Statistical characteristics and practical steps in using these methods will be described below to provide an intuitive understanding rather than a detailed mathematical description of how such methods work.

Data Preprocessing
The goal of multidimensional statistical procedures is to represent large amounts of complex data within a reduced number of factorial dimensions, or to group them into a reduced number of clusters. The first step in such analyses is to obtain contingency (or lexical) tables through a series of transformations of the initial data (preprocessing). The main preprocessing steps are the following: (a) the continuous strings of letters corresponding to each text unit are segmented into their constituent words; (b) a dictionary of the lexical forms in the corpus is created; (c) the forms are lemmatized (e.g., plurals into singular forms, inflected verb forms into infinitives); (d) the dictionary is reduced by removing infrequent or overly frequent words and words of one or two letters (which do not provide useful information about the shared structure of the data and add noise to the analyses; see Greenacre, 1994); (e) words devoid of content are filtered out (e.g., articles, linking words, conjunctions). These preprocessing steps reduce and reorganize the data into a single lexical table, in which each cell gives the number of times a given word (column) appeared in a given text unit (row) (Figure 1).
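The preprocessing steps (a) to (e) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Alceste pipeline used in Study 1: tokenization relies on a simple regular expression, the `lemmatize` helper is a hypothetical toy rule that only strips a plural "s", and `STOP_WORDS`, `min_freq`, and `max_freq` are illustrative values.

```python
import re
from collections import Counter

# Hypothetical stop-word list (step e); a real analysis would use a full list
STOP_WORDS = {"the", "and", "was", "with", "into", "then"}

def lemmatize(word):
    """Toy stand-in for lemmatization (step c): strip a plural 's' only."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def lexical_table(reports, min_freq=2, max_freq=1000):
    """Return (vocabulary, table) where table[i][j] counts word j in report i."""
    # (a) segment each report into lower-case word forms
    units = [re.findall(r"[a-z]+", text.lower()) for text in reports]
    # (b)-(c) build the dictionary of lemmatized forms and their total counts
    units = [[lemmatize(w) for w in words] for words in units]
    totals = Counter(w for words in units for w in words)
    # (d)-(e) drop 1-2 letter words, stop words, too rare or too frequent forms
    vocab = sorted(
        w for w, n in totals.items()
        if len(w) > 2 and w not in STOP_WORDS and min_freq <= n <= max_freq
    )
    index = {w: j for j, w in enumerate(vocab)}
    table = [[0] * len(vocab) for _ in units]
    for i, words in enumerate(units):
        for w in words:
            if w in index:
                table[i][index[w]] += 1
    return vocab, table

reports = [
    "I was driving a car along the road near the lake.",
    "The cars crossed the road and the bridge.",
    "A student spoke with the professor in the classroom.",
    "The professor and two students discussed a text.",
]
vocab, table = lexical_table(reports)
print(vocab)   # ['car', 'professor', 'road', 'student']
print(table)   # [[1, 0, 1, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 1, 0, 1]]
```

Real corpora need a proper lemmatizer for the language of the reports; the toy suffix rule above would, for instance, mangle words such as "bus".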

Statistical Analyses
Correspondence analysis (CoA) is a statistical method whose name derives from the French "Analyse des Correspondances", where the term "correspondance" denotes a system of associations between the elements of two sets (Blasius, 1998). In CoA, the similarity between text units is assessed on the basis of the frequency distribution of the lexical forms in each text unit ("word-profile"; cf. Figure 3). Conversely, the similarity between two words is a function of their co-occurrence (or joint absence) within the same text units. CoA represents differences between text units as geometrical distances on low-dimensional maps, which are then used to identify the main semantic or cognitive dimensions along which the text units may be differentiated (Greenacre, 1994; Lebart, 1998; Schwartz & Baldo, 2001). Methods of cluster analysis (described in more detail below) can usefully complement the CoA results, as illustrated in Figure 2 (Lebart, 1994).
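As a rough sketch of how a CoA map can be computed, the function below uses a standard formulation based on the singular value decomposition of the table's standardized residuals (deviations from independence, scaled as in a chi-square statistic). It assumes NumPy and a table with no empty row or column; it is not necessarily how the SPSS procedures used in this paper compute the solution, and the sign of each axis is arbitrary. The toy table is hypothetical.

```python
import numpy as np

def correspondence_analysis(N, k=2):
    """Principal coordinates of rows and columns on the first k CoA axes.

    Assumes a lexical table N (text units x words) with no empty row or column.
    """
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                       # relative frequencies f_ij
    r, c = P.sum(axis=1), P.sum(axis=0)   # row and column masses
    # Standardized residuals: deviations from independence, chi-square scaled
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = U[:, :k] * sv[:k] / np.sqrt(r)[:, None]
    cols = Vt.T[:, :k] * sv[:k] / np.sqrt(c)[:, None]
    return rows, cols, sv[:k] ** 2        # principal inertias of the k axes

# Hypothetical toy table: rows 0 and 1 have proportional word-profiles
N = np.array([[4, 1, 3, 0],
              [8, 2, 6, 0],
              [0, 3, 1, 5],
              [1, 0, 0, 4]])
rows, cols, inertia = correspondence_analysis(N, k=2)
print(np.round(rows, 3))   # rows 0 and 1 land on the same point of the map
```

Because coordinates depend only on profiles, two reports with the same relative word frequencies occupy the same point, whatever their lengths; the sum of all principal inertias equals the table's chi-square statistic divided by its grand total.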
Lexical Tables. Each row (text unit) in a lexical table can be represented as a point in the multidimensional space of the columns (words). Conversely, each word has a specific location in the multidimensional space of the text units. The rows of a lexical table can thus be thought of as a cloud of points located in the space of the columns, or vice versa. The goal of both CoA and clustering techniques is to look for proximity patterns in these multidimensional clouds of points. These methods are thus based on an estimation of the similarity between reports, or "row-profiles", in the lexical table (or between words, or "column-profiles"). If, for example, two reports contain very similar relative frequencies for the different words in the vocabulary, they will be located close to each other on the CoA map and grouped in the same cluster (Blasius & Greenacre, 1998). CoA involves a progressive reduction of the dimensions of the contingency table into a map of low dimensionality, while minimizing the distortion of the original distances (Figure 3; Greenacre, 1994). The advantage of CoA is that it allows a conjoint display of the distances between all row and column points in the same continuous space (Greenacre, 1994), using a transition formula (Lebart, Morineau, & Piron, 1995, p. 85). Moreover, using the same transition formula, it is also possible to plot a posteriori additional (or illustrative) rows or columns that have not contributed to the structure of the original CoA map, thus providing additional information for the interpretation of the main axes or dimensions of the map. Similarly, cluster analysis groups into the same clusters the points that are close together in the multidimensional cloud but, contrary to CoA, it groups rows or columns separately and loses information about the relative position of each resulting cluster in the multidimensional space (Figure 2; Lebart, 1994).

Figure 2. Schematic representation of the analysis of multidimensional data using correspondence analysis (on the left) or clustering techniques (on the right). The initial multidimensional cloud of points (rows or columns) can be represented on a bidimensional CoA map (left) or as a series of clusters grouping points that were close together in the multidimensional space (right).

[Figure 2 panels: row-profiles (or column-profiles) in the multidimensional space; projection onto a 2D CoA map; grouping by cluster analysis.]

Figure 3. Schema of the projection of the row-profiles and column-profiles of a contingency table onto separate maps and a joint map using standardized coordinates (see main text). The rows can be represented in the space of the columns (left), which can then be flattened to obtain a bidimensional CoA map. The same can be done with the columns in the lexical table.

Aggregated Lexical Tables. Numerically, the similarity between two row-profiles is expressed by their Euclidean distance (which can be thought of as the straight-line distance between two points in physical space), in which each dimension of the space is weighted by the inverse of its mass (its total relative frequency). In practice, this weighting balances the contributions of very frequent and less frequent words. Because of its close relationship with the chi-squared statistic, this distance is known as the chi-square distance.
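Let n_{ij} be the number of occurrences of word j in text unit i, n the grand total, f_{ij} = n_{ij}/n, and f_{i\cdot} = \sum_j f_{ij} and f_{\cdot j} = \sum_i f_{ij} the row and column masses (cf. Figure 3). The distance d between two row-profiles i and i' is given by:

$$d^2(i,i') = \sum_{j} \frac{1}{f_{\cdot j}} \left( \frac{f_{ij}}{f_{i\cdot}} - \frac{f_{i'j}}{f_{i'\cdot}} \right)^2$$

and the distance between two column-profiles j and j' by:

$$d^2(j,j') = \sum_{i} \frac{1}{f_{i\cdot}} \left( \frac{f_{ij}}{f_{\cdot j}} - \frac{f_{ij'}}{f_{\cdot j'}} \right)^2$$

Chi-square distances have a unique property called the distributional equivalence principle (Benzécri, 1973): if two rows or text units are distributionally equivalent (that is, if they have identical profiles), they can be replaced by a single new row that is simply the sum of the two, without changing the distances between the words on the CoA map (Lebart et al., 1995). This property ensures that no information is lost when identical profiles are aggregated, and that nothing is gained by subdividing homogeneous profiles. In Study 1, the analyses will use aggregated lexical tables, with one row for each corpus to be compared or for each relevant subdivision of a corpus.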
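The two distances above, and the distributional equivalence principle, can be checked numerically. The sketch below assumes NumPy and a hypothetical toy table whose first two rows are proportional; it computes all squared chi-square distances with the formulas given above and shows that merging the two equivalent rows leaves the distances between words unchanged.

```python
import numpy as np

def row_distances(N):
    """Squared chi-square distances between all pairs of row-profiles."""
    P = np.asarray(N, dtype=float)
    P = P / P.sum()                        # f_ij
    r, c = P.sum(axis=1), P.sum(axis=0)    # f_i. and f_.j
    profiles = P / r[:, None]              # f_ij / f_i.
    diff = profiles[:, None, :] - profiles[None, :, :]
    return (diff ** 2 / c).sum(axis=2)     # each word weighted by 1 / f_.j

def column_distances(N):
    """Same formula with the roles of rows and columns exchanged."""
    return row_distances(np.asarray(N, dtype=float).T)

N = np.array([[4, 1, 3, 0],
              [8, 2, 6, 0],    # proportional to row 0
              [0, 3, 1, 5],
              [1, 0, 0, 4]])
merged = np.vstack([N[0] + N[1], N[2], N[3]])   # rows 0 and 1 aggregated

print(np.isclose(row_distances(N)[0, 1], 0.0))                      # True
print(np.allclose(column_distances(N), column_distances(merged)))   # True
```

The second check holds because the column masses are unchanged by the merge and the two proportional rows contribute exactly as much to each word-to-word distance as their sum does.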
The statistical analyses reported here were performed with the following software. In Study 1, because the main dream corpus was in French, lemmatization and hierarchical clustering were performed with software able to lemmatize French texts (Alceste 3.1; Reinert, 1986; www.image.cict.fr/index_alceste.htm). The MATRIX and ANACOR procedures of SPSS (www.spss.com) were used for the correspondence analyses. In Study 2, a MATLAB toolbox developed by the author was used to perform these analyses within the MATLAB computing environment (www.mathworks.com).
Finally, the interpretation of the CoA and clustering results was based on common and consistent semantic features. An additional step, not yet implemented, would support automatic labelling of the CoA dimensions or clusters. Beyond simply assisting the labelling process, this would require formalizing the links between semantics and syntax in the results of lexical analyses (e.g., the FrameNet project; www.icsi.berkeley.edu/~framenet).

Data Collection and Preprocessing
Main Dream Corpus. This first study was based on a selection of dreams from the author's own dream diary. Using personal data for scientific enquiry is a common strategy in dream research (e.g., Calkins, 1893; Domhoff, 1996; Epstein, 1985; Freud, 1995; Hartmann, 1968; Hobson, 1988; Hobson, 1995; Hobson, 2002; Jouvet, 1992; Maury, 1862; Rechtschaffen, 1978; Saint-Denys, 1977). Dream researchers may trust their own dream reports most because such reports describe subjective experiences whose authenticity cannot be checked by any external observer. Furthermore, analyzing one's own dreams facilitates the identification of possible waking sources for dream elements, such as knowing which characters in the dream are members of the dreamer's family, or knowing whether an activity in the dream is usual or unusual in the dreamer's everyday life (e.g., Calkins, 1893).
The first analysis was performed on 1770 dream reports selected from the author's own dream diary. These dreams were collected every morning, after spontaneous awakening, between 1 September 1994 and 30 November 1995 (i.e., 456 consecutive nights covering a 15-month period), when the author was between 28 and 30 years old. Residence and professional environment remained the same during this period. The size and completeness of this material make it amenable to a statistical investigation that can identify recurrent patterns in the dream reports as well as possible changes over the 15 months.
The 1770 dreams, written in French, contained over 150,000 words, including about 13,000 distinct words (i.e., the total vocabulary; see Table 1). About half of the words in this initial vocabulary appeared only once in the whole corpus (i.e., hapax legomena). This corresponds to a well-known empirical observation in lexicometric studies (Brunet, 1981).
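Counting hapax legomena only requires the frequency of each distinct form. The helper below is a hypothetical illustration on an English toy sentence; it is not the tool used to compute Table 1.

```python
from collections import Counter

def vocabulary_stats(tokens):
    """Return (occurrences, distinct forms, share of forms occurring once)."""
    counts = Counter(tokens)
    hapaxes = sum(1 for n in counts.values() if n == 1)
    return len(tokens), len(counts), hapaxes / len(counts)

tokens = "the car crossed the road near the lake and the car stopped".split()
print(vocabulary_stats(tokens))   # (12, 8, 0.75)
```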
The 1770 dream reports were distributed across 412 nights (about 10% of the nights had no dream recall). An average of about 4 dream reports per night was found here (similar to Calkins, 1893), and the mean length of these dream reports was 86.4 words (SD = 75.07). Automatic preprocessing steps were performed on the original dream data, including the removal of words that were too infrequent (<15 and <10 occurrences for the CoA and the cluster analysis, respectively) or too frequent (>3000 occurrences for the cluster analysis). Table 2 indicates that the number of dream reports and the extent of the vocabulary used in these reports remained very stable across the 15 months.

Additional Corpuses. To better characterize the lexical content of the dream corpus, it was compared with a three-week diary of real-life events, also kept by the author during a trip to the U.S. The waking diary was meant to reproduce some conditions typical of reporting dreams, except that it described waking experiences. First, traveling across unknown places is a condition in which one is exposed to new contexts with many unusual elements and unfamiliar people, much like what often happens in dreams. Second, the daily experiences were not written down on the same day, but systematically on the next day (after one intervening night), to match the mnesic conditions of recalling dreams (e.g., discontinuity in time and action).
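The descriptive figures given for the main dream corpus are mutually consistent, as a few lines of arithmetic show. The sketch below only re-derives numbers stated in the text (the dates of the diary, 412 nights with recall, 1770 reports, and a mean length of 86.4 words); no new data are involved.

```python
from datetime import date

# Figures taken from the text; the computation only checks their consistency
nights = (date(1995, 11, 30) - date(1994, 9, 1)).days + 1   # both ends included
reports, nights_with_recall, mean_length = 1770, 412, 86.4

print(nights)                                      # 456
print(round(1 - nights_with_recall / nights, 3))   # 0.096, i.e. about 10% without recall
print(round(reports / nights_with_recall, 1))      # 4.3 reports per night with recall
print(round(reports * mean_length))                # 152928 words, i.e. over 150,000
```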
Finally, two additional corpuses were extracted from frequency dictionaries of French words. The first, the Juilland dictionary (Juilland, Brodin, & Davidovitch, 1970), provided the frequencies of written words within five different genres of texts (from an initial sample of 500,000 words): theater plays, novels, essays, newspapers, and scientific and technical reports. The second, the Brunet dictionary (Brunet, 1981), provided word frequencies measured within subdivisions of the corpus based on the dominant grammatical person in the texts (from an initial sample of 70 million words): first person (soliloquy), second person (dialogue), and third person (all remaining texts). These various sources of word frequencies were used to determine whether dream reports involve distinctive lexical features compared with other kinds of written verbal production.

Results
The dream diary was analyzed in three different ways. First, the frequency distribution of the words in the dream diary was submitted to a CoA and compared to the travel diary and frequency dictionaries. Second, a cluster analysis was performed on the lexical table with all the dreams as rows. Finally, the stability of the cognitive categories revealed by the cluster analysis was assessed for the 15 successive months of the dream diary.
Correspondence Analyses. A first CoA was performed on a lexical table containing the dream diary aggregated into 5 successive 3-month periods (5 rows), the travel diary (all days aggregated into one row), and the Juilland frequencies for 5 genres (5 rows). Only words from the dream diary that also appeared in the Juilland dictionary were included in the analysis. The final lexical table contained 11 rows and 346 words. The CoA map in Figure 4 shows that the different genres of the Juilland dictionary were aligned horizontally along the first factorial dimension, with the scientific texts on the left and the theater plays on the right. The second dimension distinguished the Juilland corpus from the diaries. The main result is that the dream reports differed from the travel diary on the first dimension: the dreams were located towards the novels (a result also confirmed with dreams from other individuals; Schwartz, 1999), whereas the travel diary lay closer to the newspapers and essays. A second main result concerns the tight grouping of the five 3-month periods on both dimensions, indicating their striking homogeneity over time.
To test whether the second (vertical) dimension reflected the fact that both the dream and travel diaries were written mainly from a first-person perspective, unlike most other texts of the Juilland dictionary, a second CoA was performed using the Brunet dictionary, with three rows, one for each dominant person in the text (i.e., 1st, 2nd, or 3rd person), and 323 words (i.e., words from the dream reports that were present in both the Juilland and the Brunet dictionaries). The total dream data (one row), the travel data (one row), and the Juilland data (5 rows) were then plotted as additional or illustrative variables, i.e., variables that did not contribute to the structure of the CoA map but whose profiles were projected a posteriori onto the map using a transition formula (see Multidimensional Statistical Methods). The main purpose of this analysis was to confirm the results of the previous analysis with an independent way of building the CoA solution, and to obtain more detailed information about the distinction between dream reports and reports of real life.
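Projecting illustrative rows with a transition formula can be sketched as follows, assuming NumPy and the SVD-based CoA formulation; the helper names `ca_fit` and `project_rows` and the toy counts are hypothetical, and the exact formula used by the software in this study is not reproduced here. A supplementary row is placed at the average of the column points weighted by its own word-profile, divided on each axis by the corresponding singular value, so it does not influence the axes themselves.

```python
import numpy as np

def ca_fit(N, k=2):
    """Fit CoA on the active table; return principal coordinates and singular values."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = U[:, :k] * sv[:k] / np.sqrt(r)[:, None]
    cols = Vt.T[:, :k] * sv[:k] / np.sqrt(c)[:, None]
    return rows, cols, sv[:k]

def project_rows(extra, cols, sv):
    """Transition formula: profile-weighted average of the column points,
    rescaled on each axis by 1 / singular value."""
    extra = np.atleast_2d(np.asarray(extra, dtype=float))
    profiles = extra / extra.sum(axis=1, keepdims=True)
    return profiles @ cols / sv

N = np.array([[4, 1, 3, 0],
              [8, 2, 6, 0],
              [0, 3, 1, 5],
              [1, 0, 0, 4]])
rows, cols, sv = ca_fit(N)
# A hypothetical illustrative row: placed on the map without changing the axes
print(project_rows([2, 0, 1, 3], cols, sv))
```

Projecting an active row through the same formula returns its original coordinates, which is a convenient sanity check of the implementation.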
The CoA map in Figure 5 shows that the first, second, and third dominant persons were distributed from left to right along the first (horizontal) dimension, whereas the first and second dominant persons were also distinguished along the second dimension, located at the top and bottom of the map, respectively. The Juilland data were distributed along the first dimension in the same sequence as in the previous analysis, confirming the robustness of this pattern. More critically, the dream diary was situated in the same quadrant as the first dominant person and on the same side of the map as the theater plays, again close to the novels on the first dimension. By contrast, the travel diary was closer to the third dominant person on the first dimension, but close to the first dominant person on the second dimension. Moreover, the travel diary was again closer to the essays and newspapers on the first dimension.
Taken together, these results suggest that, compared with descriptions of real life, dream reports may reflect experiences that are more similar to those described in novels or theater plays, and that predominantly involve the dreamer as a first-person actor. This is consistent with the findings of Strauch and Meier (1996), who observed that the dreamer him- or herself was present in 89.4% of 198 dream reports collected after awakenings from REM sleep.
Cluster Analysis. The above CoA analyses suggested that the dream diary differed from descriptions of real-life events but remained stable over a long period of time. Can we conclude that oneiric memories are all very similar, reflecting only one specific category of experiences that always engages the same cognitive processes? To address this question, a cluster analysis was conducted on the dream data alone, to test whether different clusters of dreams could be distinguished on the basis of their lexical composition. The lexical table used for this clustering included 1753 dream reports (all non-empty rows) and 1298 words (columns). The clustering performed successive partitions of the data based on a chi-squared metric similar to that used in the CoA (i.e., descending hierarchical clustering; Reinert, 1983), with reassignment of misclassified words at each partition using a generalized form of the k-means clustering technique (Diday, 1971; Reinert, 1983). Thus, two dream reports with similar word-profiles, compared with the other reports, were more likely to be grouped into the same cluster.
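A simplified sketch of descending hierarchical clustering is given below. It assumes NumPy, a lexical table without empty rows, and a deliberately reduced version of the procedure: at each step the largest cluster is ordered along its first CoA axis and cut where the chi-square between the two resulting classes is largest. The k-means-style reassignment step described above is omitted, and all function names are hypothetical; this is not Alceste's implementation.

```python
import numpy as np

def first_axis(N):
    """Row coordinates on the first correspondence-analysis axis."""
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    return U[:, 0] * sv[0] / np.sqrt(r)

def chi2(table):
    """Pearson chi-square statistic of a contingency table."""
    E = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    keep = E > 0
    return float(((table - E)[keep] ** 2 / E[keep]).sum())

def bisect(N):
    """Order reports along axis 1 and cut where the two classes differ most."""
    order = np.argsort(first_axis(N))
    best_score, best_cut = -1.0, 1
    for cut in range(1, len(order)):
        classes = np.vstack([N[order[:cut]].sum(axis=0),
                             N[order[cut:]].sum(axis=0)])
        score = chi2(classes)
        if score > best_score:
            best_score, best_cut = score, cut
    return order[:best_cut], order[best_cut:]

def divisive_clusters(N, n_clusters):
    """Split the largest cluster in two until n_clusters clusters exist.

    Assumes n_clusters does not exceed the number of reports.
    """
    N = np.asarray(N, dtype=float)
    clusters = [np.arange(len(N))]
    while len(clusters) < n_clusters:
        largest = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        members = clusters.pop(largest)
        sub = N[members]
        sub = sub[:, sub.sum(axis=0) > 0]   # drop words absent from this subset
        left, right = bisect(sub)
        clusters += [members[left], members[right]]
    return clusters

# Hypothetical toy table: reports 0-2 use words 0-1, reports 3-5 use words 2-3
N = [[3, 2, 0, 0], [2, 3, 0, 0], [4, 1, 0, 0],
     [0, 0, 2, 3], [0, 0, 3, 2], [0, 0, 4, 4]]
print([sorted(c.tolist()) for c in divisive_clusters(N, 2)])
```

Cutting along the first axis keeps each split cheap, since only one ordering of the reports has to be scanned instead of all possible bipartitions.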
Five main clusters of dreams were identified by the cluster analysis. Of the 1753 dream reports, 1713 could be classified into one of these clusters (96.78% of the 1770 initial dream reports). These 1713 reports were distributed in a relatively balanced manner across the clusters (Figure 6). The structure of the dendrogram also provided information about the proximities between the five clusters: clusters 1, 2, and 3 were separated from clusters 4 and 5 early in the classification process.

Figure 6. Dendrogram for the partition of the dream diary into 5 clusters. The cluster analysis was performed on the lexical table crossing 1713 dream reports and 1298 words. The number of reports and the characteristic cognitive categories of each cluster are shown on the right side of the dendrogram. See main text for more details.

The next step was to identify the words that were characteristic of each cluster. Characteristic words are those that best distinguish a given cluster from the others, i.e., words that are over-represented in one specific cluster according to a chi-squared criterion. This computation is straightforward. For example, the word "student" occurred in 32 reports of cluster 1, out of a total of 54 reports containing it; the difference between the share of these reports falling in cluster 1 (32 of 54) and the relative size of the cluster (335 of 1713 reports) is significant (Chi2(1) = 55.87). The description of the clusters below reports only words that were highly significantly associated with each cluster (the complete list of words with their P-values of association with the clusters can be found in Schwartz, 1999).
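The chi-squared criterion for characteristic words can be sketched as a Pearson chi-square on a 2×2 table (reports in the cluster vs. the other reports, word present vs. absent). Assuming no continuity correction, this hypothetical helper reproduces the value reported above for "student".

```python
def characteristic_chi2(k_cluster, k_total, cluster_size, n_reports):
    """Pearson chi-square (1 df, no continuity correction) for one word.

    k_cluster: reports of the cluster that contain the word
    k_total: all reports that contain the word
    """
    a = k_cluster                          # in cluster, word present
    b = cluster_size - a                   # in cluster, word absent
    c = k_total - a                        # other clusters, word present
    d = n_reports - cluster_size - c       # other clusters, word absent
    chi2 = n_reports * (a * d - b * c) ** 2 / (
        cluster_size * (n_reports - cluster_size) * k_total * (n_reports - k_total))
    return chi2, a * d > b * c             # statistic, over-represented?

chi2, over = characteristic_chi2(32, 54, 335, 1713)
print(round(chi2, 2), over)   # 55.87 True
```

The sign check matters because the chi-square alone is symmetric: a word that is markedly absent from a cluster yields a large statistic too, but is not characteristic of it.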

STATISTICS OF DREAMS
- Cluster 1 was characterized by words referring to the academic environment (e.g., student, professor, mathematic, psychology, assistant, classroom, etc.) and by sexual or affective concerns (e.g., erotic, love, etc.). There were also many words describing verbal activity (e.g., speak, heard, write, word, name, text, voice, discuss, etc.) and social emotions (e.g., deserve, laugh, surprise, offend, grimace, mock).
- Cluster 2 reflected different aspects of painting, which was the author's second main activity at that time: painting material (e.g., artwork, painting, picture, cardboard, color, canvas, paintbrush, photography, etc.), places related to this activity (e.g., wall, exhibition, studio, gallery, etc.), and related social environments (e.g., friends, artists, gallery owners, etc.). Fine visual judgment and metric evaluation also characterized this cluster (e.g., centimeter, long, big, horizontal, wide, size, height, meter, etc.), as did color names.
- Cluster 3 concerned festive dreams, with convivial events and gatherings (e.g., meal, restaurant, birthday, invitation, serve, order, bring, gift, offer, pay, etc.), as well as food and beverages (e.g., alcohol, drink, salad, meat, wine, cheese, vegetable, pasta, rice, etc.). Many people were present in the dreams of this cluster (e.g., customer, waiter, members of the family, etc.).
- Cluster 4 included elements related to locomotion and outdoor environments. These comprised vehicles (e.g., bus, car, bike, plane, etc.), landscapes and roads (e.g., sea, road, town, region, path, island, landscape, beach, countryside, lake, country, hill, river, garden, etc.), and moving and traveling (e.g., cross, join, direction, passage, return, journey, go through, visit, drive, climb, walk, etc.). This cluster also contained many spatial markers (e.g., above, within, close, near, across, etc.) and temporal markers (e.g., before, soon, since, yesterday, etc.).
- Cluster 5 was composed of words relating to physical danger, in particular catastrophes (e.g., risk, accident, war, disease, explosion, radioactive, danger, smoke, blood, etc.) and speeded physical movements or falls (e.g., slope, hurtle down, steep, rollers, speed, slip, fall, etc.). Body parts (e.g., foot, body, shoulder, leg, hand, etc.) and fear-related emotions (e.g., fear, suffer, dreadful, etc.) were also present in this cluster.
These five semantically well-defined clusters were likely to involve distinct cognitive functions, including recent memory reprocessing, as suggested by the predominance of current concerns in clusters 1 and 2; emotions and fear-related experiences in cluster 5; very vivid visual experiences in clusters 2 and 4; and motion or motor activities in clusters 4 and 5. Moreover, the first partition of the dendrogram clearly separated the clusters characterized by social interactions and current concerns (clusters 1, 2, 3) from those characterized by spatial exploration and motor activity, which were associated with elements not directly related to the dreamer's current environment (clusters 4, 5).

Temporal Course of Dream Content.
In the author's dream diary, the monthly number of words as well as the number of dream reports remained relatively stable over time (Table 2). The first CoA in Study 1 also suggested that the lexical composition of the reports was reproducible over time, as shown by the close location of successive 3-month periods of dreaming on the CoA map (Figure 4). These observations provided a first indication that the clusters found in the cluster analysis may not have arisen from a confounding effect due to the lexical composition of the reports progressively changing over time (i.e., lexical drift), but might rather reflect the recurrence of a limited number of categories of cognitive processes (see Discussion).
A closer look at the number of dreams belonging to each cluster confirmed that all clusters were present during each month of the diary (Figure 7). A linear regression was performed for each cluster on the monthly number of dreams belonging to that cluster (Table 3). No significant slope was found for any cluster, but there was a trend for the two "current concerns" clusters, 1 (academic) and 2 (artistic), to increase and decrease over time, respectively. These reciprocal changes can be explained by corresponding changes in the dreamer's real life (e.g., participation in artistic events at the beginning of the diary period versus more intense academic and teaching activities towards the end of this period). Additional analyses reproduced this pattern when using daily data, and also showed that cluster 5, which related to fear-related experiences, was remarkably stable over time (see Schwartz, 1999). The temporal course of the clusters thus confirmed that the global homogeneity of the corpus was best explained by a regular recurrence of each category of dreaming activity over time.
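The per-cluster trend test can be sketched as an ordinary least-squares regression of each cluster's monthly count on the month index, followed by a t-test on the slope. This is a minimal illustration under that assumption (the original analysis was run in MATLAB and its exact test may differ); the function name is hypothetical and the counts below are invented toy numbers, not the diary data.

```python
import math


def slope_and_t(monthly_counts):
    """OLS regression of monthly counts on month index; returns (slope, t statistic)."""
    n = len(monthly_counts)
    x_mean = (n - 1) / 2
    y_mean = sum(monthly_counts) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(monthly_counts))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    rss = sum((y - intercept - slope * x) ** 2 for x, y in enumerate(monthly_counts))
    se = (rss / (n - 2) / sxx) ** 0.5  # standard error of the slope
    if se == 0:  # perfect fit: t is undefined, report 0 or a signed infinity
        return slope, (0.0 if slope == 0 else math.copysign(math.inf, slope))
    return slope, slope / se


# Two-sided 5% critical value of Student's t with 13 degrees of freedom (15 months - 2).
T_CRIT_13DF = 2.160

toy_counts = [20, 22, 19, 21, 23, 20, 22, 21, 19, 22, 20, 21, 23, 20, 22]  # invented
slope, t = slope_and_t(toy_counts)
print(f"slope = {slope:.3f} reports/month, t = {t:.2f}, significant: {abs(t) > T_CRIT_13DF}")
```

Running this once per cluster yields one slope per cluster; a non-significant slope with small magnitude is what "stable over time" means operationally here.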

Figure 7. Linear regression for each cluster is plotted as a dashed line. Note that clusters 1 and 2 (current concerns) tended to increase and decrease, respectively, while cluster 5 (fear-related experiences) was very stable over the 15-month period.

Data Collection and Preprocessing
The goal of Study 2 was to assess the generalizability of some of the main findings of Study 1 by using dream reports from a large group of individuals, i.e., the same dream dataset that Hall and Van de Castle used in their influential work on content analysis (dataset available: www.dreambank.net; Hall & Van de Castle, 1966). This corpus contains dreams from 100 male and 100 female undergraduate students (aged 18-25) from Cleveland, Ohio. Five reports (each 50-300 words long) were randomly selected from each student's dream reports, making a total of 1000 dream reports. The male and female dream reports were analyzed separately, using CoA and clustering techniques (k-means procedure from MATLAB's Statistics Toolbox). All statistical analyses were performed within a MATLAB (www.mathworks.com) programming environment using a series of routines developed by the author and organized into a new toolbox (Statistical Lexical Mapping, SLM). The male and female corpuses were preprocessed separately using the same procedure as for the dreams in Study 1. Words that were too frequent (200 occurrences or more) or too infrequent (fewer than 8 occurrences) in each dream corpus were removed from the final lexical tables (see Multidimensional Statistical Methods). The initial word counts for the male and female students are summarized in Table 4.
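The sampling and frequency-filtering steps can be sketched as follows. This is a hypothetical Python illustration, not the SLM toolbox: it lower-cases and splits on whitespace, whereas the actual preprocessing inherited from Study 1 may include further normalization, and the function names are illustrative. Frequencies are counted over the whole corpus, and words occurring 200 times or more, or fewer than 8 times, are dropped.

```python
import random
from collections import Counter


def sample_reports(reports_by_dreamer, per_dreamer=5, seed=0):
    """Randomly draw `per_dreamer` reports from each dreamer (dict: id -> list of reports)."""
    rng = random.Random(seed)
    return [r for reports in reports_by_dreamer.values()
            for r in rng.sample(reports, per_dreamer)]


def build_lexical_table(reports, min_freq=8, max_freq=200):
    """Reports x words count table keeping words with min_freq <= frequency < max_freq."""
    tokenized = [report.lower().split() for report in reports]  # naive tokenization (assumption)
    freq = Counter(w for tokens in tokenized for w in tokens)
    vocab = sorted(w for w, n in freq.items() if min_freq <= n < max_freq)
    index = {w: j for j, w in enumerate(vocab)}
    table = []
    for tokens in tokenized:
        row = [0] * len(vocab)
        for w in tokens:
            if w in index:
                row[index[w]] += 1
        table.append(row)
    # Drop reports left with no retained word (empty rows cannot enter a CoA).
    return vocab, [row for row in table if any(row)]
```

Dropping reports that become empty after filtering is one way a 500-report sample could shrink to the 491 rows analyzed for the male corpus; this is an assumption, not something the text states.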

Results
A first CoA was performed on the preprocessed dream data from the male students only (491 reports and 423 words). To keep the map readable, only 114 words are plotted in Figure 8, representing the 15% of words that contributed most to either the first or the second CoA dimension. The geometrical proximities of the words on the CoA map can be summarized as follows: words related to fighting and flying appeared in the upper left quadrant, words referring to the academic environment of the students in the upper right quadrant, words related to speed and driving in the lower left quadrant, and words evoking affect and sexual activity in the lower right quadrant. When considering the first dimension of the CoA (horizontal axis) alone, motion and fear-related experiences appeared towards the left side of the map, whereas reasoning, inter-subjective, and verbal activities appeared more towards the right side. More generally, the right side of this horizontal dimension displayed fairly recognizable elements of the students' real life and environment, as well as their current concerns, i.e., affective and professional ones.
For the female corpus, Figure 9 displays the 15% of words that contributed most to either the first or the second CoA dimension (124 words), as in Figure 8. First of all, there is a striking similarity between the male and the female maps, despite the fact that the two analyses were performed independently on two distinct lexical tables and that the dream reports were processed without any prior indexing or coding of the reports or of dream elements. As in the male corpus, words referring to verbal and social activities and to the students' current academic and affective concerns were located on the same side (right) of the first dimension, with words related to academic concerns separated from those related to affective concerns along the second dimension (upper and lower right quadrants, respectively).
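The CoA and the selection of plotted words can be sketched in NumPy via a singular value decomposition of the standardized residuals. This is a minimal sketch, not the SLM toolbox; function names are hypothetical. The word selection assumes "contributed most to either dimension" means the union of the top 15% of contributors on axis 1 and on axis 2, one reading consistent with 114 words being more than 15% of 423.

```python
import numpy as np


def correspondence_analysis(table, n_dims=2):
    """CoA of a reports x words count table (no empty rows or columns) via SVD."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                       # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)   # row (report) and column (word) masses
    # Standardized residuals D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    U, sv, V = U[:, :n_dims], sv[:n_dims], Vt[:n_dims].T
    row_coords = U * sv / np.sqrt(r)[:, None]  # principal coordinates of reports
    col_coords = V * sv / np.sqrt(c)[:, None]  # principal coordinates of words
    col_contrib = V ** 2                       # share of each axis's inertia due to each word
    return row_coords, col_coords, col_contrib, sv ** 2  # last item: inertia per axis


def top_contributing_words(col_contrib, fraction=0.15, axes=(0, 1)):
    """Union of the words in the top `fraction` of contributors on each listed axis."""
    k = int(round(fraction * col_contrib.shape[0]))
    chosen = set()
    for a in axes:
        chosen.update(np.argsort(col_contrib[:, a])[::-1][:k].tolist())
    return sorted(chosen)


rng = np.random.default_rng(0)
toy = rng.integers(1, 20, size=(40, 20))   # toy counts, not real dream data
rows, cols, contrib, inertia = correspondence_analysis(toy)
print(top_contributing_words(contrib))      # column indices of the words to plot
```

Each column of `col_contrib` sums to 1, so a word's entry reads directly as its share of that axis's inertia, which is the usual basis for deciding which words to display.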
Compared to the men's dreams, which contained explicit references to sexual activity, the women's dreams were less explicit and contained more words related to weddings (e.g., marry, wedding). Here too, words evoking motion and fear-related experiences appeared on the same side (left) of the map along the first dimension. An additional category of words, including many references to colors and luminosity, was located in the central lower part of the map. As in Study 1, the dream reports (rows; 491 reports in the male corpus) were subsequently submitted to a clustering procedure (MATLAB k-means procedure for large matrices), using as columns their first 10 CoA factor scores from the original contingency table (first 10 dimensions, see above). MATLAB's "silhouette" function provided an estimate of how well the resulting clusters were separated. The statistically most robust partition was found with 5 clusters for both the female and the male corpus. The characteristic words of each cluster of dreams were then identified on the basis of their proportion in each cluster (chi-square statistics, see Study 1) and are reported in Appendices A and B. Although there were some minor differences, the distribution of characteristic words across clusters confirmed the main semantic pattern found with the CoA.
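The clustering-and-validation step can be sketched in NumPy. This is not MATLAB's `kmeans` or `silhouette`, which the text used, but a minimal re-implementation of the same ideas: Lloyd's k-means with k-means++ seeding and several restarts, and the mean silhouette width used to choose the number of clusters. In the study the input rows would be each report's first 10 CoA factor scores; here a small synthetic array stands in for them, and all function names are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns labels of the lowest-inertia run."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = [X[rng.integers(len(X))]]
        for _ in range(1, k):  # k-means++: favour points far from existing centres
            d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centers = np.array(centers, dtype=float)
        for _ in range(n_iter):
            labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
            converged = np.allclose(new, centers)
            centers = new
            if converged:
                break
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        inertia = ((X - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def silhouette(X, labels):
    """Mean silhouette width, (b - a) / max(a, b) per point; singletons score 0."""
    D = np.sqrt(((X[:, None, :] - X[None]) ** 2).sum(-1))
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance within own cluster
        b = min(D[i, labels == j].mean() for j in np.unique(labels) if j != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()


def choose_k(scores, ks=range(2, 8)):
    """Return the k with the highest mean silhouette, plus the score for every k."""
    results = {k: silhouette(scores, kmeans(scores, k)) for k in ks}
    return max(results, key=results.get), results


rng = np.random.default_rng(1)
centres = np.array([[0, 0], [10, 0], [0, 10]])  # three well-separated toy groups
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centres])
best, scores = choose_k(X, range(2, 6))
print(best, {k: round(v, 2) for k, v in scores.items()})
```

Silhouette width rewards partitions whose points sit much closer to their own cluster than to the nearest other one, so over-splitting a compact group and merging distinct groups both lower the score.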

- Cluster 3 concerned sports activities (baseball and swimming for the male and female students, respectively). This cluster of dreams was not clearly visible on the CoA maps.
- Cluster 4 differed between the male and female dream reports, with words referring to flying or war and words referring to shopping and colors, respectively.
- Cluster 5 was characterized by motion (run, drive, push, ride fast, escape, climb, jump, fall, etc.), vehicles (car, train, boat), violence (hit, shoot, dead, kill, throw, etc.), and fear-related words (scream, frighten, afraid, etc.).
These results therefore demonstrate that CoA and cluster analysis can reveal word patterns in dream reports that correspond to consistent cognitive categories. Here, as in Study 1, the combination of CoA and cluster analysis provided useful complementary information about the data, highlighting striking similarities as well as a few differences between the dream reports of male and female students.

Discussion
The statistical methods used here aimed at distinguishing semantic patterns in dreams based on the distribution of words in the dream reports (word-profiles). These statistical analyses allowed the extraction of information from large samples of dream reports, based on the frequency distribution of the words in each dream. These methods did not aim at studying bizarre aspects in the dreams that most often arise from impossible temporal or spatial juxtapositions, and from unusual binding of features, objects or contexts within the dreams (see Reinsel, Antrobus, & Wollman, 1992;Revonsuo & Krista, 2002;Revonsuo & Salmivalli, 1995;Williams, Merritt, Rittenhouse, & Hobson, 1992). Instead, the goal of the approach proposed here was to allow the automatic analysis of large samples of dream reports that would go beyond idiosyncratic or bizarre aspects in the reports, and emphasize basic commonalities in dream activity. Applying such statistical methods to dream reports offers a novel way to unveil intrinsic properties of dreams.

Dream Reports as a Self-Referential Fantasy
The comparison between the distribution of words in the dreams and in other kinds of written material (Study 1) showed that dream reports constitute a specific category of texts, but that they respect standard lexicometric properties. The pattern of word frequency in the dream reports resembled fantasy works such as novels or theater plays, whereas reports of waking experiences were more similar to newspapers or essays. Also, the dream reports were close to monologues or to first-person literary works, whereas reports of waking experiences were closer to dialogues or second-person writings. This is consistent with previous observations that the presence of dreamers themselves in their dreams is a key feature of dream experiences (e.g., Cicogna, Cavallero, & Bosinelli, 1991; Strauch & Meier, 1996), one that is observed even in children's dreams (Resnick, Stickgold, Rittenhouse, & Hobson, 1994). As a textual genre, dream reports may thus be qualified as "self-referential fiction".
A second important finding is that dream reports formed a highly homogeneous category of written production, as found when analyzing different portions of a dream diary, and as evidenced by the regular recurrence of some clusters of dreams across time or across individuals. The stable distribution of the different clusters of dreams over time and their reproducibility across dream samples add support to the hypothesis that dreaming experiences might reflect characteristic neurophysiological conditions and patterns of brain activity during sleep (Maquet et al., 1996; …).

Cognitive Dissociations in the Dreaming Mind
Both the CoA and cluster analysis results revealed well-segregated cognitive dimensions in the data. Distinct cognitive processes therefore contributed to distinct dream episodes, suggesting that dreaming is not a unitary phenomenon. For example, a clear dissociation between verbal and motor activities emerged both in the clustering results (cluster 1 vs. 5) and on the CoA maps (see Figures 8, 9). This dissociation implies that these two kinds of activity tend to prevail in different dream episodes. According to our hypothesis, it suggests that brain circuits underlying the ability to manipulate language or communicate verbally may become active during sleep independently of the brain regions involved in motor behavior, and vice versa (see also below). This also corroborates the idea that the anatomical segregation of brain functions is largely preserved during sleep (see Introduction).
Visual perceptions were omnipresent in the dreams, but certain visual features seemed to be associated with distinct categories of dream reports. For example, in Study 1, the cluster analysis distinguished fine visual judgment, metric evaluation, and color vision (cluster 2) from motion perception (clusters 4, 5). Moreover, in Study 2, words referring to visual motion were also grouped on the left side of the CoA maps and were selectively present in clusters 4 and 5 of the cluster analysis in both the male and female students' dream reports. Another dichotomy appeared between dreams with landscapes and outdoor representations and dreams containing familiar people (clusters 4, 5 vs. 1, 2, 3 in Study 1; clusters 4, 5 vs. 1, 2 in Study 2). The dissociation between these different visual properties is consistent with well-known functional specializations in the human brain (e.g., color vision in V4, Bartels & Zeki, 2000; motion processing in MT/V5, Rees, Friston, & Koch, 2000; Watson et al., 1993; layouts and landscapes in the parahippocampal place area, Epstein & Kanwisher, 1998; face processing in the fusiform face area, Kanwisher, McDermott, & Chun, 1997). Such functional dissociations in visual activity suggest that visual aspects of dream experiences might result from distinct but reproducible patterns of activation within associative visual areas.
Another striking dissociation was found in all three corpuses. Fear-related experiences or emotions (cluster 5) were clearly separated from dream elements referring to affective and work-related concerns from waking life (clusters 1, 2; see also Figures 8, 9). Recent studies have shown that there are tight functional and anatomical relationships between emotional and memory processing that may involve limbic circuits, in particular the amygdala (Adolphs, Tranel, Damasio, & Damasio, 1995; Bechara, Damasio, Damasio, & Lee, 1999; Morris et al., 1998; Phan, Wager, Taylor, & Liberzon, 2002; Vuilleumier et al., 2002). Such a link would imply that increased emotionality and increased activation of the amygdala during sleep provide a permissive condition for emotionally relevant elements of memory to be selectively reprocessed in sleep (Maquet et al., 1996). This is consistent with an overrepresentation of recent elements of real life and social emotions as an isolated dimension in the dreams (Hobson & Pace-Schott, 2002; Schwartz, 2003). On the other hand, intense fear-related emotions associated with settings that are rather unfamiliar compared to waking life might suggest that enhanced activation of the amygdala also contributes to the rehearsal of more "primitive" behavioral responses to threatening stimuli (Revonsuo, 2000; see also below). Therefore, different varieties of memory functions may operate during sleep.
The patterns of dream content reported here substantiate the initial hypothesis that cognition during sleep can be inferred from typical or common features of dream reports, and they provide new information about the cognitive processes participating in dream experiences. In particular, consistent cognitive dissociations were found in independent sets of dream data. This also suggests that different dream reports might involve distinct cognitive functions, which are known to be functionally segregated in the human brain, as discussed further in the next section.

Functional Dissociations in the Dreaming Brain
The pattern of brain activity reported in recent brain imaging studies of human sleep is globally consistent with some features commonly found in dreams. For example, we found that dream reports are characterized by a first-person perspective (Study 1; see also Rechtschaffen, 1978; Strauch & Meier, 1996). This may reflect the activation of somatosensory cortex observed during REM sleep (Maquet et al., 1996), since somatosensory cortex may be specifically involved in distinguishing self-produced actions or thoughts from those generated by others (Ruby & Decety, 2001). Another example concerns the abundance of vivid visual experiences in dreams, which is in good agreement with the activation of associative visual areas during REM sleep (Braun et al., 1997, 1998; Maquet et al., 2000). However, in the present analyses, different clusters of dreams revealed detailed dissociations within the visual domain, such as color and metric evaluation vs. motion processing, or places vs. faces processing (see above). The predominance of segregated visual activities in dreams would imply heterogeneous but reproducible patterns of brain activation within the ventral visual stream. We expect that future brain imaging studies of human sleep with better spatial and temporal resolution will help to disclose such transient patterns of cerebral activity.
Motor activity is another feature that was frequent in dream reports (see also Domhoff, 1996). In normal sleep, muscle atonia during REM sleep prevents dreams from being acted out. However, in patients suffering from "REM sleep behavior disorder", an intermittent lack of muscle atonia may lead to complex motor behaviors during REM sleep that have been shown to relate directly to the dream experiences reported by the patients (e.g., Lapierre & Montplaisir, 1992; Schenck, Bundlie, Patterson, & Mahowald, 1987). In healthy subjects, motor and premotor cortices were found to be very active during REM sleep (Maquet et al., 2000), suggesting that dreamed movements may indeed exploit the same cortical circuits as those underlying motor behavior during wakefulness, except that actual execution of the movements during REM sleep is normally prevented by muscle atonia. In the present analyses, intense motor activity appeared to be a selective feature of certain dream episodes, which may coincide with periods of high activation in motor cortices during sleep.
In animal and human sleep, structures of the limbic system are activated, including the amygdala (Bordi, LeDoux, Clugnet, & Pavlides, 1993; Braun et al., 1997; Lydic et al., 1991; Maquet et al., 1996; Nofzinger, Mintun, Wiseman, Kupfer, & Moore, 1997). During wakefulness, the amygdala is involved in the processing of threatening stimuli or stressful situations, but also in the processing of emotionally relevant memory traces (see above). Therefore, the activation of the amygdala during sleep may indicate that emotionally salient elements of waking life are reprocessed during sleep (see Hennevin, Maho, & Hars, 1998; Maho & Hennevin, 2002; Maquet & Franck, 1997; Mavanji, Siwek, Patterson, Spoley, & Datta, 2003; Pare, Collins, & Pelletier, 2002; Ribeiro et al., 2002; Stickgold, 2002; Wagner, Gais, & Born, 2001). The lexical analyses reported in the present paper confirm and extend the well-documented observation that fear-related experiences are abundant in dreams (e.g., Calkins, 1893; Merritt et al., 1994; Nielsen et al., 1991; Strauch & Meier, 1996). Fear-related emotions appeared in one specific cluster, and their occurrence was highly stable over the 15-month period of the dream diary. This is suggestive of internal, state-dependent neurophysiological factors operating on a regular basis during sleep. However, most fearful dreams did not relate to the current concerns of the dreamers, at least not in their most recognizable form (i.e., fearful emotions and current concerns were not grouped in the same clusters and appeared on opposite sides of the CoA maps).
One explanation for this dissociation might be that the amygdala has two distinct functional roles: (a) the reactivation of a set of primitive or innate behavioral reactions to life-threatening conditions (see Revonsuo, 2000); (b) the reprocessing of novel, unresolved, and emotionally relevant waking events, allowing the reorganization of new memory representations (e.g., Domhoff, 1996; Nader, Schafe, & LeDoux, 2000; Wagner, Gais, Haider, Verleger, & Born, 2004). The latter role of sleep would also be consistent with functional reorganizations at the level of brain systems that underlie experience-dependent changes in behavioral performance and in regional brain responses (Maquet et al., 2000; Maquet, Schwartz, Passingham, & Frith, 2003; Peigneux et al., 2003; Schwartz, Maquet, & Frith, 2002). Moreover, the functional relationships between distant brain areas, including the amygdala, vary as a function of wake and sleep stages (Braun et al., 1998; Dave & Margoliash, 2000; Lee & Wilson, 2002; Louie & Wilson, 2001; Maquet & Phillips, 1998), providing a possible complementary explanation for the dissociation between the processing of fearful experiences and recent memory observed in the dream reports. Hence, the analysis of human dreams may provide unique insights into the sleeping mind, and perhaps also into amygdala functions.
In sum, the segregation of dream reports into distinct cognitive categories as found here strongly supports the idea that brain activity during sleep might involve the regular engagement of distinct specialized brain subsystems. Hence, sleep may provide a favorable condition for the modular architecture of brain functions to be expressed and consolidated.

Practical Applications
One main advantage of the statistical methods described in the present article is that they allow the rapid extraction of patterns from large samples of dream reports. They also allow the comparison of large samples of reports from various individuals or groups, including clinical populations such as neurological or psychiatric patients (Schwartz & Baldo, 2001). The comparison between the male and female dreams in Study 2 illustrated that the methods could efficiently pinpoint differences as well as similarities between dream corpuses.
In the present paper, we analyzed data corresponding to what dreams are for most people most of the time: complex subjective experiences occurring during sleep that are recalled after spontaneous awakening in the morning. These are likely to be dreams usually associated with REM sleep episodes, which are known to predominate during the last part of the night (Stickgold, Malia et al., 2001; Calkins, 1893). However, the same methods could be used to analyze dreams or memory reports from different stages of sleep or wakefulness in order to better understand how cognitive processes are modulated by such changes in brain states. This might be particularly interesting with respect to the recent hypothesis that distinct sleep stages play a role in the consolidation of distinct memory types (Gais, Plihal, Wagner, & Born, 2000; Plihal & Born, 1997; Stickgold, Whidbee, Schirmer, Patel, & Hobson, 2000; for review see Peigneux, Laureys, Delbeuck, & Maquet, 2001). The analysis of the categories of dreams associated with different sleep stages may therefore provide important information about the varieties of human sleep functions.
Such methods also allow the characterization of single dream reports, based on prior CoA or cluster analyses performed on other samples of data from the same or from other subjects. For example, in Study 1, illustrative variables were plotted a posteriori on an existing CoA map (Figure 5; Lebart et al., 1995; Lebart & Salem, 1994). Similarly, a new dream report can be plotted on an existing CoA map, where it will appear close to dreams with a similar word-frequency profile. An analogous procedure can be applied to the clustering results by assigning a dream report to the closest cluster (Anderberg, 1973; Ball & Hall, 1967). For sleep studies, the possibility of obtaining detailed information about cognitive processes during sleep with these statistical methods is particularly valuable when experimental variables cannot be manipulated or collected without disturbing sleep. For example, the categorization of single dream reports may be very useful when analyzing neurophysiological measures collected during sleep (e.g., EEG or fMRI), to investigate brain activity selectively associated with certain cognitive categories of interest (cf. …).
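Placing a new report on an existing map can be sketched with the standard CoA transition formula for supplementary rows: the new report's coordinates are its word profile multiplied by the words' standard coordinates, after which it can be assigned to the nearest cluster centroid. This is a minimal NumPy sketch assuming the new report is counted over the same word columns as the original lexical table; the function names are hypothetical, and nearest-centroid assignment is one simple choice among the procedures cited.

```python
import numpy as np


def fit_coa(table, n_dims=2):
    """CoA via SVD; returns report principal coordinates and word standard coordinates."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = U[:, :n_dims] * sv[:n_dims] / np.sqrt(r)[:, None]
    col_standard = Vt[:n_dims].T / np.sqrt(c)[:, None]
    return row_coords, col_standard


def project_report(word_counts, col_standard):
    """Supplementary row: the report's word profile times the word standard coordinates."""
    counts = np.asarray(word_counts, dtype=float)
    return (counts / counts.sum()) @ col_standard


def assign_cluster(point, centroids):
    """Index of the nearest cluster centroid (Euclidean distance in CoA space)."""
    return int(np.argmin(np.linalg.norm(np.asarray(centroids) - point, axis=1)))


rng = np.random.default_rng(0)
toy = rng.integers(1, 20, size=(30, 12))   # toy counts: 30 reports x 12 words
rows, words = fit_coa(toy)
new_report = rng.integers(1, 5, size=12)   # a new report over the same 12 words
xy = project_report(new_report, words)
labels = np.arange(30) % 2                 # two illustrative groups, not real clusters
centroids = [rows[labels == g].mean(axis=0) for g in (0, 1)]
print(xy, assign_cluster(xy, centroids))
```

As a consistency check, projecting a report that was part of the original table returns exactly its fitted coordinates, so active and supplementary reports live on the same map.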

Conclusion
The goal of the present article was to establish the validity of a statistical approach to dream content. Multidimensional statistical procedures provided well-organized visualizations and robust classifications of large samples of dream reports. Correspondence analysis and cluster analysis performed on different large corpuses of dream reports revealed valuable and consistent information about cognitive aspects in dream experiences. Cognitive categories were not only reproducible across different dream corpuses from different individuals, but they also remained stable over time. The statistical approach presented here may promote the integration between cognition in sleep and more basic, physiological levels of investigation by providing measurable and reliable data about cognition in sleep based on the analysis of dream reports.