Recent studies have shown that students often misinterpret the area of the box in box plots as representing the frequency or proportion of observations in that interval, whereas it actually represents density. This misinterpretation has been shown to be based on the saliency of this area and can be explained by heuristic reasoning as defined by dual process theories. In this study we tested whether expert users of box plots also display this misinterpretation and show signs of the same heuristic reasoning as found in students. Using a reaction time test, we found signs of heuristic reasoning in experts, both with respect to accuracy and reaction times. If even experts have difficulty interpreting box plots, one can question whether box plots are an appropriate representation for reporting data and whether they deserve the prominent place they currently hold in the statistics curriculum.

According to Wilkinson and The Task Force on Statistical Inference (

In this study we investigated the occurrence of one specific misinterpretation in expert users of box plots, namely that the area of the box represents the frequency or proportion of observations, while it actually represents their density.

Example of a box plot.
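The claim that the box shows density rather than frequency can be illustrated with a small calculation. The following is a minimal sketch, not part of the original study: the scores are invented, the function name `box_densities` is hypothetical, and the quartiles follow the "inclusive" convention of Python's `statistics.quantiles`, which is only one of several common quartile definitions. Each half of the box holds roughly a quarter of the observations, so the narrower half is the denser one.

```python
from statistics import quantiles

def box_densities(scores):
    """Return the quartiles and the approximate share of observations
    per score point in each half of the box (hypothetical helper)."""
    q1, median, q3 = quantiles(scores, n=4, method="inclusive")

    def density(width):
        # Each half of the box holds roughly 25% of the observations;
        # a zero-width half (many ties) has no finite density.
        return 0.25 / width if width > 0 else float("inf")

    return (q1, median, q3), density(median - q1), density(q3 - median)

# Invented exam scores on a 0-20 scale (assumption, for illustration only).
scores = [1, 3, 4, 6, 8, 12, 12, 13, 13, 14, 15]
(q1, median, q3), left_density, right_density = box_densities(scores)
print(q1, median, q3)               # quartiles of the invented data
print(left_density, right_density)  # the narrow right half is denser
```

In this invented data set the left half of the box is seven score points wide and the right half only one, yet both contain about the same number of observations. Reading the larger left area as "more students" is exactly the misinterpretation studied here.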

The dual process theoretical framework helps to understand why people who in theory possess the required knowledge to solve a certain problem may fail to give a correct answer due to the impact of heuristic processing of certain salient, but not necessarily relevant, problem characteristics. Showing strong relations to Fischbein’s theory of intuitions in mathematical reasoning (

An important consequence of Evans’ extended model is that even when analytic thinking takes place, reasoning is still based on, and therefore biased by, the default model and (possibly irrelevant) salient task features. Heuristically processed features can thus still interfere with the analytic stage of the reasoning process and strongly influence its final outcome. Even experts, who are able to solve the mathematical tasks at hand correctly, could hence still be influenced by intuitions or heuristic reasoning.

A frequently used method to study heuristic reasoning is the comparison of accuracy and reaction times in tests involving two types of items: congruent and incongruent items. For congruent items, the correct test response is the same as the response that would result from the heuristic reasoning that is hypothesised, whereas for incongruent items the correct response cannot be given on the basis of the hypothesised heuristic reasoning alone but requires additional analytic reasoning. This means that one can expect higher accuracy for congruent items than for incongruent items, while reaction times of correct responses to incongruent items can be expected to be relatively long, as more time-consuming analytic reasoning is necessary, compared to congruent items in which fast heuristic reasoning suffices.
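The congruent versus incongruent comparison described above can be sketched descriptively as follows. This is only an illustration under stated assumptions: the trial format (keys `congruency`, `correct` and `rt_ms`) and the function name `congruency_summary` are hypothetical, and the sketch computes plain descriptive statistics rather than any inferential model.

```python
from statistics import mean

def congruency_summary(trials):
    """Accuracy and mean reaction time of correct responses per congruency level.

    `trials` is a list of dicts with the hypothetical keys
    'congruency' ('congruent' or 'incongruent'), 'correct' (bool), 'rt_ms' (number).
    """
    summary = {}
    for level in ("congruent", "incongruent"):
        group = [t for t in trials if t["congruency"] == level]
        if not group:
            continue  # no trials at this level
        correct = [t for t in group if t["correct"]]
        summary[level] = {
            "accuracy": len(correct) / len(group),
            # Reaction times are only compared for correct responses.
            "mean_rt_correct": mean(t["rt_ms"] for t in correct) if correct else None,
        }
    return summary
```

A pattern consistent with heuristic reasoning would show lower accuracy and a longer mean reaction time for correct responses on incongruent items than on congruent items.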

We recently conducted two experiments that demonstrated the heuristic nature of the area misinterpretation of box plots and its occurrence among students (

Overview of the five different item types used. The task was to decide, for each pair of box plots representing the exam results of two groups of students, which group had the most students with a score above 10.

In addition to box plots, many other graphical representations have been documented to be misinterpreted by students, for example in mathematics (

Tversky (

According to another of Tversky’s graph design principles, graphs should not use more dimensions than the number of variables they represent. Histograms use two dimensions and both dimensions are informative: while one dimension represents the values taken by the observed variable, the other represents the frequency with which these values have been observed. In the case of box plots, however, a two-dimensional box is shown while only the width of this box (and of the associated whiskers) is actually informative. The very salient second dimension of height may give the false impression that two variables are shown and that the height of the box is important too. This principle concerning the restriction of the number of dimensions used in a graph to the number of variables represented has also been proposed by various other authors (e.g.,

As is clear from the last two sections, the misinterpretation of box plots seems to be based on the saliency of the area of the box and is heuristic in nature. It is for this reason that we have referred to this misinterpretation as the ‘area heuristic’ throughout the present article.

Research on expertise in various domains has revealed that experts focus more on the structural principles of a task, while novices rely more on its surface features (

Applying the insights of expertise research to the topic of the present study, one would expect that – unlike students – expert users of box plots are no longer hampered by the area heuristic, since it relates to a superficial feature of box plots. Owing to their extensive experience with box plots, experts would be expected to look immediately and automatically at the correct task feature, i.e., the position of the median, which is also clearly visible in the box plot.

However, if experts are still affected by the area heuristic, they should show the same effects on congruent and incongruent items as the students in our previous experiments (

The participants were 40 students and staff of the KU Leuven who could be considered as expert users of box plots. We defined expert users of box plots as people who work with box plots on a regular basis. This was verified in participants, and they also received a box plot knowledge test which asked about some factual information regarding the key elements in box plots, such as the name of these elements (see

The accuracy and reaction time test was exactly the same as the one used in the experiments of Lem et al. (

The reaction time test was administered individually on laptop computers in a controlled environment. It started with the presentation of a box plot, naming the elements of the box plot (minimum, Q1, median, Q3, maximum) and reminding participants that the median represents the middle observation. Next, a general explanation of the task was provided, followed by two sample items for which participants did not receive feedback regarding the correctness of their responses. Finally, the task was summarised and participants were told to work at their own pace and to try to always provide the correct response.

The 40 items were presented in blocks of 10 items each, with each block followed by a break that participants could end themselves by pressing the space bar. All items were preceded by a fixation cross which was presented for 500 ms. The items were presented in a semi-random order, with the following restrictions: (a) not more than three consecutive trials with the same item type, (b) not more than three consecutive trials with the same heuristic response, (c) not more than three consecutive trials with the same correct response and (d) not more than three consecutive trials with the same level of congruency. Stickers were placed on keys 9, 6 and 3 of the numeric keypad, and participants were asked to press key 9 (with sticker reading ‘up’) when their answer was ‘top box plot’, key 6 (with sticker reading ‘=’) when the answer was ‘both the same’, or key 3 (with sticker reading ‘down’) when their answer was ‘lower box plot’. For each item, the participants’ reaction time and accuracy were logged.
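One way to generate an order that satisfies restrictions (a) to (d) is sketched below. This is a minimal sketch under assumptions, not the procedure actually used in the experiment: the item representation (dicts with the hypothetical keys `item_type`, `heuristic`, `correct` and `congruency`) and the function names are invented for illustration. The sketch builds the sequence item by item, picking at random among the items that would not create a run of four, and restarts when it reaches a dead end. Note that this greedy construction does not sample uniformly from all valid orders.

```python
import random

# Hypothetical attribute names, one per restriction (a)-(d).
ATTRS = ("item_type", "heuristic", "correct", "congruency")
MAX_RUN = 3  # at most three consecutive trials sharing a value

def violates(sequence, candidate):
    """True if appending `candidate` would create a run longer than MAX_RUN."""
    if len(sequence) < MAX_RUN:
        return False
    tail = sequence[-MAX_RUN:]
    return any(all(t[a] == candidate[a] for t in tail) for a in ATTRS)

def semi_random_order(items, rng, max_restarts=10_000):
    """Return a random ordering of `items` that respects all run-length limits."""
    for _ in range(max_restarts):
        remaining = list(items)
        sequence = []
        while remaining:
            allowed = [i for i, item in enumerate(remaining)
                       if not violates(sequence, item)]
            if not allowed:
                break  # dead end: discard this attempt and restart
            sequence.append(remaining.pop(rng.choice(allowed)))
        if not remaining:
            return sequence
    raise RuntimeError("no valid order found within max_restarts attempts")
```

Passing a seeded generator, for example `random.Random(0)`, makes the generated order reproducible.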

Before analysing the data, the reaction times were log-transformed in order to normalise their distribution. Furthermore, all trials with a reaction time more than 2.5 standard deviations from the mean, as calculated within each level of congruency, were considered outliers and were therefore not used in the analyses, resulting in the deletion of 22 (1.57%) trials.
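The preprocessing described above can be sketched as follows. This is an illustration under assumptions rather than the original analysis script: the trial format (keys `rt_ms` and `congruent`) and the function name `trim_outliers` are hypothetical, and the sketch assumes that the 2.5 standard deviation criterion was applied to the log-transformed reaction times, which the text does not state explicitly.

```python
import math
from statistics import mean, stdev

def trim_outliers(trials, cutoff=2.5):
    """Log-transform reaction times and drop outlying trials.

    `trials` is a list of dicts with the hypothetical keys
    'rt_ms' (positive number) and 'congruent' (bool).
    """
    # Assumption: outliers are identified on the log scale.
    logged = [dict(t, log_rt=math.log(t["rt_ms"])) for t in trials]
    kept = []
    for level in (True, False):  # trimming is done within each congruency level
        group = [t for t in logged if t["congruent"] == level]
        if len(group) < 2:
            kept.extend(group)  # a standard deviation needs at least two trials
            continue
        values = [t["log_rt"] for t in group]
        centre, spread = mean(values), stdev(values)
        kept.extend(t for t in group
                    if abs(t["log_rt"] - centre) <= cutoff * spread)
    return kept
```

Each retained trial carries an extra `log_rt` field, which would then serve as the dependent variable in the reaction time analysis.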

Because our data are clustered, or to state it differently, involve multiple measurements per participant, we opted for multilevel analyses. Multilevel models take into account the possible correlation between the different responses of a single participant (

Accuracy for the congruent items was 97.8% and it was 95.2% for the incongruent items. A generalized linear mixed model, with accuracy as dependent variable and congruency as independent variable, showed a main effect of congruency on accuracy,

Accuracy rates and number of heuristic responses per item type, in percentages.

Using a linear mixed model, the effect of congruency on reaction times for correct responses was analysed. We found a main effect of congruency, with longer reaction times for correct responses to incongruent items (4179 ms,

Lem et al. (

Comparing our results with those of Lem et al.’s (

We can conclude that even expert users of box plots are not immune to the fast, incorrect interpretation of the area of the box as representing frequency or proportion of observations. We do see, however, that expert users are better able to overcome this first heuristic interpretation by reasoning analytically. One may argue that in everyday practice, the reaction time difference observed between correct responses to congruent and incongruent items is negligible. Nevertheless, these reaction times indicate that expert users of box plots are still affected by the area heuristic and that, as a consequence, in certain circumstances – for example when under time pressure or when distracted – they may still commit heuristic errors. (This actually occurred in our experiment, where accuracy was lower for incongruent items than for congruent items.) Furthermore, this study showed that this specific heuristic is very persistent; even when one is able to correctly interpret box plots, the incorrect reasoning mechanism still plays a role in the reasoning process. Two important limitations of the current study, however, are the heterogeneous composition of the group of participants and the relatively small sample size. A larger sample (e.g., more participants in each subgroup) would make it possible to study differences between, for example, statisticians and researchers in other disciplines. This could be the goal of future research.

The results of this study have important implications, both with respect to the use of box plots in scientific reports and their role in statistics education. With respect to box plots in scientific reports, we can contest the advice of Wilkinson and The Task Force on Statistical Inference (

By ‘density’ we mean the relative spread of the data within each interval.

We are aware that, because we chose discrete exam scores (between 0 and 20) of a rather small group of fictional participants as the context for our test, it is in principle possible to construct data sets for the box plots in some of the items used in our study that would match any of the answer alternatives, by including an extreme number of ties at specific values of the variable under consideration.

Stephanie Lem holds a postdoctoral fellowship at the Research Foundation-Flanders (FWO). This research was partially supported by grant GOA 2006/01 ‘Developing adaptive expertise in mathematics education’ from the Research Fund KU Leuven, Belgium. The authors would like to thank master’s thesis student Nick Gillard for his help in the data collection process.