A further check of our coding framework was conducted by handing it off to independent coders to determine whether our categories were comprehensive and whether our category descriptions provided adequate guidance to coders unfamiliar with the data. Inherent in this phase was selecting the sample of posts to be coded. Recognizing that it would be impossible for a human to code each of the 96,696 posts in a timely fashion, we chose to code slightly under five percent of them, or 4,500 posts. In our preliminary work, we had noted that the central focus of students’ posts changed as the course progressed. To capture this within the data, we divided the course into four time periods (quarters) and randomly selected 1,125 posts from each quarter. We did not select posts that occurred after the end of the course, even though students continued to communicate with one another on the site for well over a year after the course officially ended. We also noted that students tended to post more frequently at the beginning and end of the course than in the middle. To gain a more accurate estimate of the proportion of posts described by each of our coding categories, we will weight the number of codes per category in each time period by the proportion of the total posts that occurred within that time period.
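As a minimal sketch of this weighting (the notation here is our own, introduced only for illustration), the estimated proportion of posts described by a category $c$ would be

\[
\hat{p}_c \;=\; \sum_{q=1}^{4} \frac{N_q}{N} \cdot \frac{n_{c,q}}{n_q},
\]

where $N_q$ is the total number of posts made during quarter $q$, $N = 96{,}696$ is the total number of posts, $n_q = 1{,}125$ is the number of posts sampled from quarter $q$, and $n_{c,q}$ is the number of sampled posts in quarter $q$ assigned to category $c$. Weighting by $N_q / N$ in this way gives quarters with heavier posting activity proportionally greater influence on the overall estimate.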

Two graduate students who were not on our original committee were recruited to code the 4,500 posts. After their initial training and check-coding, these individuals would conduct the final coding of the data. The training consisted of orienting them to the codebook and discussing examples of each code. Following the training session, each coder was given 2,500 posts, with 500 of the posts overlapping so that we could check inter-coder reliability. The coders were instructed to code 32 of the overlapping posts from each of the four time periods (128 total for each coder), using codes from what we considered to be our final coding framework. Following completion of this first round of coding, we checked inter-coder reliability by calculating percent agreement and found it to be only 59% for topic and role combined. We met with the coders to discuss difficulties in using the codebook and to resolve disagreements. We clarified the code definitions, and the coders then applied codes to another 128 posts. For this second round, they reached 81% agreement for topic and 73% agreement for role of the poster. The coders then returned to the first 128 posts and re-coded them, this time reaching 73% agreement for topic and 72% for role of the poster. Although there is no agreed-upon threshold for appropriate inter-coder reliability (Campbell, Quincy, Osserman, & Pedersen, 2013), these percentages fall within a range considered acceptable for this type of data (Fahy, Crawford, Ally, Cookson, & Keller, 2000). Following the third round of coding and check-coding, the coders were instructed to code the remainder of their 2,500 posts.
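For clarity, percent agreement for a given coding dimension was the simple share of overlapping posts to which both coders assigned the same code:

\[
\text{percent agreement} \;=\; \frac{\text{number of posts coded identically by both coders}}{\text{number of overlapping posts compared}} \times 100.
\]

For example, 59% agreement on a round of 128 overlapping posts corresponds to roughly 75 posts receiving the same code from both coders.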