Massive open online courses (MOOCs) generate learners’ performance data that can be used to understand learners’ proficiency and to improve their learning efficiency. However, the approaches currently used, such as assessing the proportion of correct responses in assessments, are oversimplified and may lead to poor conclusions and decisions because they do not account for additional information on the learner, the content, and the context. There is a need for theoretically grounded, data-driven, explainable educational measurement approaches for MOOCs. In this conceptual paper, we try to establish a connection between psychometrics, the scientific discipline concerned with techniques for educational and psychological measurement, and MOOCs. First, we describe general principles of the traditional measurement of learners’ proficiency in education. Second, we discuss qualities of MOOCs that hamper a direct application of approaches based on these general principles. Third, we discuss recent developments in measuring proficiency that may be relevant for analyzing MOOC data. Finally, we outline directions in psychometric modeling that might be interesting for future MOOC research.

Massive open online courses (MOOCs) are “one of the most significant technological developments in higher education in the past decade” (

We consider courses from the world’s largest MOOC provider (

Learners, professors, and universities – the key partners involved in MOOCs – have an interest in accurate measurement of learners’ proficiency. Learners take an online course and want to study efficiently. Proficiency measurement specifies a learner’s position along the course, helps him/her identify his/her strong and weak points, and maps the areas that need additional work. Professors and their teams develop and optimize the course content. Here, aggregated proficiency measures show to what degree the content incites learning and suggest improvements to video lectures, practical tasks, and support materials. Finally, universities award online course certificates to learners. Proficiency measures can provide evidence on whether and to what degree learners have mastered the course.

The learners’ proficiency is a latent construct; its measuring is a key concern of a scientific discipline within behavioral sciences – psychometrics (

The rapid development and expansion of MOOCs resulted in a growing body of related research (see the structured reviews of Bozkurt, Keskin, and de Waard (

We find that imprecise or biased measures hamper the improvement and development of MOOCs, and we believe it is important to connect psychometric approaches and MOOC research. In this conceptual paper, we try to establish such a connection. First, we describe the principles of traditional approaches for measuring learners’ proficiency in education. Second, we discuss the qualities of MOOCs that hamper a direct application of approaches based on these general principles. Third, we discuss recently developed solutions and potentially applicable approaches for measuring proficiency. Finally, we outline directions in psychometric modeling that might be interesting for future MOOC research.

There are two common theories in psychometrics – classical test theory (CTT) and item response theory (IRT).

In 1888, Edgeworth suggested decomposing observed test scores into a true score and an error component (

The resulting classical test model,

X_j = θ_j + ε_j,

is the most famous equation in educational and psychological measurement. Here X_j is the observed test score of person j, θ_j is the person’s true score, and ε_j is a random error that is assumed to be uncorrelated with the true score, ρ_εθ = 0. Thus, the expected value of X_j equals the true score: E(X_j) = θ_j.

The classical test model is easy to understand, which explains the high popularity of CTT among educational scholars and psychologists. At the same time, this simplicity leads to critical disadvantages. First, proficiency measures conceptualized through test scores have a highly restricted area of generalization (
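To make the CTT quantities concrete, here is a minimal Python sketch (ours, not from the literature) that simulates dichotomous responses, computes the observed sum scores that CTT uses as proficiency measures, and estimates their reliability with Cronbach’s alpha. All data and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated 0/1 responses: 200 learners x 10 items, with correctness
# driven by a latent true score plus binomial noise (illustrative data).
true_score = rng.normal(size=(200, 1))
difficulty = rng.normal(size=(1, 10))
prob = 1 / (1 + np.exp(-(true_score - difficulty)))
responses = (rng.random((200, 10)) < prob).astype(int)

# CTT proficiency measure: the observed sum score per learner.
sum_scores = responses.sum(axis=1)

def cronbach_alpha(x):
    """Internal-consistency reliability estimate of the sum score."""
    k = x.shape[1]
    item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

print(sum_scores[:5], round(cronbach_alpha(responses), 2))
```

Note that both the sum score and its reliability depend on this particular set of items, which is exactly the restricted generalizability discussed above.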

Item response theory (IRT;

the probability (π_ij) of a correct response of person j to item i depends on the person’s proficiency (θ_j) and the difficulty of the item (β_i); in the Rasch model,

π_ij = exp(θ_j − β_i) / (1 + exp(θ_j − β_i))

In comparison to CTT, a person’s proficiency parameters in IRT are independent of the test difficulty. This allows comparing persons even in the case of a partial replacement of items. However, IRT is demanding in terms of the sample sizes required to obtain stable parameter estimates. For instance, Hambleton and Jones (
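The Rasch item response function above can be illustrated with a short Python sketch (ours, not tied to any specific dataset):

```python
import math

def rasch_probability(theta: float, beta: float) -> float:
    """Rasch model: probability of a correct response for a person
    with proficiency theta on an item with difficulty beta."""
    return math.exp(theta - beta) / (1 + math.exp(theta - beta))

# When proficiency equals difficulty, the probability is exactly 0.5;
# it grows with theta and shrinks with beta.
print(rasch_probability(0.0, 0.0))   # 0.5
print(rasch_probability(1.0, 0.0) > rasch_probability(0.0, 0.0))  # True
```

Because proficiency and difficulty enter only through their difference, the same θ scale applies regardless of which items a person happened to answer, which is the separability property discussed above.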

The key issues which hamper the direct use of the common psychometric techniques in MOOCs are linked to understanding the concept of proficiency itself.

First, both CTT and IRT assume that proficiency does not change within a test (

Second, IRT assumes that proficiency (or a set of proficiencies) is a common cause of the learners’ responses. However, MOOC learners have a high degree of freedom because of the low stakes of such courses, especially in the case of low or no integration into a curriculum. Online learners’ performance and retention are linked to a number of emotional and motivational characteristics unrelated to knowledge (

At the same time, there are at least two specific issues related to the observable side, the indicators of proficiency, in MOOCs. First, changes in tests are relatively frequent in MOOCs – professors often replace or add new items on the fly. This is a critical limitation for CTT, as discussed above, but it also induces additional complexity for using IRT, as the difficulty of new items is typically unknown. A second issue is that IRT requires a relatively large number of items in assessments to provide accurate proficiency measures. Kruyen, Emons, and Sijtsma (

As can be seen from the above, the common psychometric models of CTT and IRT are not tailored for direct use in measuring proficiency in MOOCs and should be adapted accordingly.

Recently, several extensions of the Rasch model were proposed for modeling the learners’ performance in MOOCs. These include extensions for modeling the dynamics in proficiency on multiple levels (

Extending the Rasch model without running into computational issues of model overidentification became possible through the reformulation of the Rasch model proposed by Van den Noortgate, De Boeck, and Meulders (

where Y_ij ~ Bernoulli(π_ij) and logit(π_ij) = β_0 + b_1j + b_2i. Here b_1j is the random effect of learner j, the deviation of the learner’s proficiency from the overall intercept β_0, and b_2i is the random effect of item i, the deviation of the item from β_0. In this reformulation, the person and item parameters are treated as random effects in a cross-classified multilevel logistic model.
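The cross-classified structure of this reformulation can be illustrated with a small Python simulation sketch (ours; the variance values are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n_learners, n_items = 500, 20
beta0 = 0.0                                   # overall intercept
b_learner = rng.normal(0, 1.0, n_learners)    # random learner effects
b_item = rng.normal(0, 0.8, n_items)          # random item effects

# Cross-classified linear predictor: logit(pi_ij) = beta0 + b_1j + b_2i
logit = beta0 + b_learner[:, None] + b_item[None, :]
pi = 1 / (1 + np.exp(-logit))
y = (rng.random((n_learners, n_items)) < pi).astype(int)

# Learners with higher random effects answer more items correctly.
order = np.argsort(b_learner)
print(y[order[-50:]].mean() > y[order[:50]].mean())  # True
```

In practice such a model would be estimated with mixed-effects software (e.g., crossed random effects in a multilevel logistic regression) rather than simulated, but the data-generating structure is the one shown here.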

As we mentioned above, a key assumption of IRT is that proficiency does not change within a test (

Abbakumov, Desmet, and Van den Noortgate (

where β_0 is the overall intercept, attempt_ij is the attempt number of learner j on item i, θ_10 is the overall effect of the attempt number, and b_1j is the learner-specific deviation of this effect, so that the term (θ_10 + b_1j)*attempt_ij shows learner j’s individual change in performance over attempts. The attempt effect can additionally be allowed to vary over items, (θ_10 + b_1j + b_1i)*attempt_ij, where b_1i is an item-specific deviation of the attempt effect.

It is worth mentioning that including all learners’ responses in the analysis, while accounting for the number of attempts, gives an unbiased view of the proficiencies and, moreover, allows studying the evolution of performance over attempts. In contrast, analyzing all responses without accounting for the number of attempts would obscure real differences between students in their proficiency, while including only the scores at the first attempt would reduce the amount of information used and hence decrease the accuracy of estimates and the power of statistical tests. Cross-validation on data from three MOOCs from the Coursera platform revealed a 6% improvement in the accuracy of predicting the correctness of learners’ responses on summative assessment items for the extended model in comparison to the traditional Rasch model (
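The role of the attempt covariate can be sketched as follows in Python; the function and all parameter values are illustrative assumptions, not estimates from the cited studies:

```python
import math

def p_correct(b0, b_learner, b_item, attempt_effect, attempt):
    """Attempt extension of the Rasch model: the attempt number shifts
    the logit, so later attempts on the same item have higher success
    probability when the attempt effect is positive."""
    logit = b0 + b_learner + b_item + attempt_effect * attempt
    return 1 / (1 + math.exp(-logit))

# Same learner and item, successive attempts (invented values):
for attempt in (1, 2, 3):
    print(round(p_correct(0.0, 0.3, -0.5, 0.4, attempt), 2))
```

The sketch shows why ignoring the attempt number biases the picture: the same learner produces systematically easier-looking responses on later attempts.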

Another type of change in learners’ proficiency in MOOCs which is not accounted for in the common psychometric models is growth through the course. Taking into account that video lectures are the central instructional tool in MOOCs (

In the proposed Rasch model extension,

where θ_10 is the overall effect of the number of watched video lectures, and the learner-specific deviations b_0j and b_1j are assumed to follow univariate normal distributions. The value b_0j can be considered as the initial proficiency of learner j, while b_0j + (θ_10 + b_1j)*lectures_ij reflects the proficiency of learner j after watching lectures_ij video lectures, at the moment of responding to item i.

The use of the extension showed that the probability of a correct response grows with every newly watched lecture and that the growth effect is specific to individual learners – for some learners the growth may be intensive, while for others it may be almost flat throughout the whole course. In a cross-validation study on data from three MOOCs from the Coursera platform, the quality of predicting the correctness of learners’ responses on summative assessment items improved by 3.3% when using the extension in comparison to the original Rasch model. This promotes the use of such extensions as a better approach to measuring learners’ proficiency and its growth in MOOCs.
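The learner-specific growth over lectures can be sketched similarly; the parameter values below are invented for illustration and are not estimates from the cited study:

```python
import math

def p_correct(initial, slope_overall, slope_dev, lectures):
    """Growth extension: the logit rises with the number of watched
    lectures; the slope is learner-specific (overall effect plus a
    learner deviation)."""
    logit = initial + (slope_overall + slope_dev) * lectures
    return 1 / (1 + math.exp(-logit))

# Two learners with the same starting point but different growth:
for lectures in (0, 5, 10):
    fast = p_correct(-1.0, 0.2, 0.10, lectures)   # steep growth
    flat = p_correct(-1.0, 0.2, -0.18, lectures)  # nearly flat growth
    print(lectures, round(fast, 2), round(flat, 2))
```

The two trajectories start at the same probability and diverge as lectures accumulate, which is exactly the individual-differences pattern the extension is designed to capture.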

A complementary solution can be adapted from educational online games. Researchers proposed an IRT model (

where θ_0 is the overall initial learners’ proficiency, θ_0j is the deviation of the initial proficiency of learner j from θ_0, T1_ij and T2_ij are the time covariates within and between sessions for the response of learner j to item i, θ_1 and θ_2 are overall population linear time trends within and between sessions respectively, and θ_1j and θ_2j are the deviations of learner j from the time trends θ_1 and θ_2 respectively. The learner-specific random effects are assumed to have a multivariate normal distribution, and the proficiency at the moment of the response is θ_ij = (θ_0 + θ_0j) + (θ_1 + θ_1j)*T1_ij + (θ_2 + θ_2j)*T2_ij.

The first additional latent effect on the learners’ performance in MOOCs which was tested is interest (

The researchers proposed the following extension:

where π_ij is the probability of a correct response of learner j to item i; β_0 equals the estimated logit of the probability of a correct response of an average learner to an average formative assessment item incorporated into a video lecture of the course in case of very high reported interest; θ_10 reflects the overall effect of interest on the expected performance, that is, the expected increase of the logit when interest increases by one unit. However, the effect of interest may not be the same for all learners; thus, to model such individual differences, the researchers used a random deviation of the interest effect for learner j.

Applying this extension yielded an interesting finding: the intercept variance, that is, the variance between students in proficiency, was reduced by 25% by including a random interest effect. This provides a more nuanced insight into the role of proficiency in learners’ performance and confirms the importance of taking interest into account. However, no significant improvement in response prediction accuracy was found compared to a model that does not take interest into account.

In this section we highlight a set of promising directions for the further development of the psychometrics of MOOCs: the measurement of complex outcomes and latent constructs; the tracking of learners’ progress on the fly; an improved understanding of learners’ performance through explanatory psychometric modeling approaches; advances in the quality of predictions by increasing model complexity; and the synergy of different psychometric methods and their combination with machine learning for precise and interpretable conclusions about learners.

Learners interact with MOOC content in different ways: they watch video lectures, read PDF assignments, discuss on forums, and attempt assessments. All these activities are interlaced and result in complex outcomes, for instance, a partly correct response given with a hint, after re-watching the video lecture and discussing on the forum. To obtain a more nuanced view of learners’ proficiency, a researcher should in this case consider extending a model from the polytomous IRT family (see Ostini & Nering (

MOOCs use not only test-based assessments. An important type of assessment is peer-reviewed assignments. In such assignments a learner’s work is generally assessed by at least three peers using a schema provided by a course professor. An important problem of such assessments is a lower precision or validity due to peers’ subjectivism (

MOOCs combine multiple domains within one course – one course may form a number of skills. Learners’ responses in assessments might be caused by a set of proficiencies; for instance, to solve a specific task in bioinformatics, a learner might need knowledge of calculus, programming, and biology. Thus, understanding a single proficiency as the common cause of such responses seems oversimplified. In this case a researcher should consider a multidimensional solution, in which the logit of the probability of a correct response of learner j is a weighted sum over dimensions, Σ_k b_ik θ_jk, where θ_jk is the proficiency of learner j on dimension k and b_ik is the loading of item i on dimension k.
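A compensatory multidimensional response function of this kind can be sketched in a few lines of Python; the dimensions and all numeric values are illustrative assumptions:

```python
import numpy as np

def mirt_probability(theta, b):
    """Compensatory multidimensional IRT: the logit is the sum over
    dimensions k of the item loading b_ik times the learner's
    proficiency theta_jk on that dimension."""
    return 1 / (1 + np.exp(-np.dot(b, theta)))

# A bioinformatics-style item loading on calculus, programming, biology:
theta = np.array([0.5, 1.0, -0.2])   # learner proficiencies per dimension
b = np.array([0.3, 0.8, 0.4])        # item loadings per dimension
print(round(float(mirt_probability(theta, b)), 2))  # 0.7
```

In this compensatory form a strength in one dimension (here, programming) can offset a weakness in another, which is one design choice; non-compensatory variants multiply per-dimension probabilities instead.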

The approaches from Equations 5 and 6 work for post-hoc measurement, not for dynamic growth tracking. On-the-fly progress estimation, crucial for navigation and recommendation engines that decide when to support a learner or to advance him/her through a course, could be realized with the Elo Rating System (ERS; 
In the ERS, after each response Y_ij, the estimate of the learner’s proficiency is updated in proportion to the difference between the observed response Y_ij and the expected probability of a correct response π_ij, while the difficulty estimate of item i is updated in the opposite direction. This approach is widely used in learning environments (
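A minimal sketch of an ERS update step, assuming a constant update weight k (real implementations often let k decay with the number of observations):

```python
def elo_update(theta, beta, correct, k=0.4):
    """Elo Rating System step: after observing a response, move the
    learner's proficiency and the item's difficulty in opposite
    directions, in proportion to (observed - expected)."""
    expected = 1 / (1 + 10 ** (beta - theta))   # expected P(correct)
    theta_new = theta + k * (correct - expected)
    beta_new = beta - k * (correct - expected)
    return theta_new, beta_new

theta, beta = 0.0, 0.0
theta, beta = elo_update(theta, beta, correct=1)
print(theta > 0 and beta < 0)  # a correct answer raises theta, lowers beta
```

Because each update touches only the current learner and item, the estimates stay current as responses stream in, which is what makes the ERS suitable for the on-the-fly tracking described above.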

An alternative modeling approach is Bayesian Knowledge Tracing (BKT;
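The standard BKT update, with the four usual parameters (initial mastery, learn, slip, guess), can be sketched as follows; the parameter values are illustrative assumptions, not estimates from any MOOC:

```python
def bkt_update(p_know, correct, slip=0.1, guess=0.2, learn=0.15):
    """Bayesian Knowledge Tracing step: Bayes update of the probability
    that the skill is mastered given the observed response, followed by
    a learning transition from unmastered to mastered."""
    if correct:
        posterior = p_know * (1 - slip) / (
            p_know * (1 - slip) + (1 - p_know) * guess)
    else:
        posterior = p_know * slip / (
            p_know * slip + (1 - p_know) * (1 - guess))
    return posterior + (1 - posterior) * learn

p = 0.3                      # prior probability of mastery
for obs in (1, 1, 0, 1):     # observed correctness sequence
    p = bkt_update(p, obs)
print(0 < p < 1)  # True
```

Unlike IRT, which places learners on a continuous scale, BKT tracks a binary mastered/unmastered state per skill, which makes it a natural complement for skill-level progress tracking.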

There are two general psychometric approaches that might be involved in work with learners’ responses on assessment items in MOOCs. The first type is the measurement approach. This approach seeks the optimal way of locating an individual learner on the latent scale, the scale of proficiency. In other words, a researcher tries to estimate an individual learner’s proficiency as precisely as possible, and all the techniques we discussed above belong to measurement approaches.

The second type is an explanatory approach. This approach is focused on explaining learners’ responses in terms of other variables. For instance, a MOOC researcher might be interested in studying the relationship between the learners’ performance and their previous learning experience to understand the optimal way to structure this experience in order to improve the performance. For instance, recent results of Abbakumov, Desmet, and Van den Noortgate (

The explanatory movement was started by De Boeck and Wilson (

As we saw above, tuning the common psychometric models results in improvements in the accuracy of predicting learners’ responses. However, these improvements are rather small, for instance, 3–6% (

The popular adage “there is no free lunch in statistics” (

There are a number of methods in machine learning a researcher may consider to combine with psychometric approaches, for instance, tree-based methods, support vector machines, clustering. Using these methods may result in dramatic improvements in the quality of conclusions although there is no guarantee of such improvements (

The psychometrics of MOOCs is a very recent development in the field. To answer the question of when and why learning happens in MOOCs, and how these digital learning products work, it combines the century-old heritage of psychometrics with modern sources of logged data. It has unique properties, such as the dynamic character of learners’ proficiency and the composite character of the causes of learners’ performance. Although it is already improving our understanding of digital learners, its future lies in moving in a computational direction, involving complex data and advanced statistical procedures in modeling multidimensional dynamic constructs and processes in MOOCs.

The authors have no competing interests to declare.