The purpose of this paper was to describe the methodology used to confront the challenges of analyzing discussion forum data from the first edX MOOC, “Circuits and Electronics.” The open, relatively unstructured nature of the 6.002x discussion forum resulted in student interactions that were extremely diverse with respect to both content and the nature of the interaction. Our efforts to answer several overarching research questions about the benefits of discussion forum participation for student outcomes began with a more specific question: what types of information did students actually post in this space, and what role or roles did they assume in these exchanges? We reviewed current recommendations for research into computer-supported collaborative learning and ascertained that our research questions addressed current needs in this area. We then developed a framework to classify copious amounts of data into a manageable number of categories so that further analysis could be conducted in targeted areas of interest. We detailed our methodology for the development and testing of this coding framework and discussed challenges that arose during its implementation.

The results of our efforts enable us to provide an empirical account of how students actually use the discussion forum in a MOOC, information that may assist developers as they strive to enhance positive outcomes from students’ use of this medium. Additionally, our work provides a structure that makes it easy to identify the data integral to our future research. Our next steps will be to explore characteristics of productive dialogue between students in this space, using a more in-depth coding schema to analyze students’ posts. For this work, we intend to analyze entire threads, which will allow examination of a complete ‘conversation’ between students and will help to alleviate our earlier difficulties in coding ambiguous posts. We will also explore the relationships between the subject matter of students’ posts, their roles within those posts, and their achievement or persistence in the course.

We believe our framework is generalizable to other MOOC discussion forums, although modifications to the current framework may be desirable depending on the results of our future analysis. For example, if we find that the code ‘references to courses other than 6.002x’ has little association with any student outcome of interest, this code could be eliminated to reduce the complexity of the coding process. As we probe more deeply into specific categories of posts, we may find it advantageous to develop subcategories for codes such as ‘social/affective,’ as posts that contain social overtures may be associated with different student outcomes than those expressing positive or negative emotion. We hope that providing this level of transparency about our development process will enable others to benefit from our experiences and build on our work. Additionally, this framework can provide the foundation for the future use of natural language processing to code discussion forum data.