Massachusetts Institute of Technology

The first edX course enrolled over 150,000 students, with registrants from nearly every country in the world giving it substantial international diversity. These students were also diverse in a number of background characteristics. To augment the behavioral and geographic data available from the edX clickstream, the authors gathered detailed individual background data from a subsample of students who completed an exit survey. The paper further shows that student performance varies significantly with some of these background characteristics.

“Circuits and Electronics” (6.002x), which began in March 2012, was the first MOOC developed by edX, the consortium led by MIT and Harvard. Over 155,000 students initially registered for 6.002x, which consisted of video lectures, interactive problems, online laboratories, and a discussion forum. When the course ended in June 2012, researchers began to analyze the rich sources of data it generated.

The purpose of this paper is to describe the methodology used to confront one of the challenges associated with analyzing discussion forum data from the inaugural edX MOOC, “Circuits and Electronics.” We detail the development and testing of a framework to classify large amounts of MOOC data into a manageable number of categories so that further analysis can be conducted in targeted areas of interest. We discuss challenges that arose during implementation of the framework as well as how we resolved them.