
Potential Issues of the Research with Educational Data Mining

*This is the same article I published on Medium.

The aim of educational data mining (EDM) is to identify how students learn online in a “blended” module by applying machine learning techniques to the log data stored in the Moodle database. The results can be useful for improving teaching methods and testing hypotheses about learning, and they could inform software that allows such analyses to be performed routinely. However, the potential risks and limitations of the research methodology remain unclear. This article discusses them below by looking at case studies of blended online learning that use Moodle data sets.

Research with EDM can be influenced by 1) the learning contexts, 2) the online learning platform, and 3) the scientific paradigm, as shown in Figure 1.

Figure 1 created by Jesse Tetsuya

Working Definitions

The case study applies EDM to blended learning contexts on Moodle and relies mainly on machine learning techniques. Both concepts are defined below.

Machine learning, considered a subfield of computer science, gives “computers the ability to learn without being explicitly programmed” (Munoz, 2014, p. 1). Machine learning is based on distribution-free statistics and consists of unsupervised and supervised learning techniques. The difference between them is whether the researchers prepare labeled training data. Unsupervised learning techniques do not need labeled training data, but they cannot provide decisive evidence. Supervised learning techniques, on the other hand, require time to label the training data but in return generate decisive results (Raschka et al., 2016).
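To make the distinction concrete, here is a minimal sketch in plain Python. The login counts and pass/fail labels are invented for illustration and are not taken from any Moodle data set. The functions `two_means` and `fit_nearest_mean` are hypothetical helpers written for this example, not part of any library. The first groups students without labels, while the second needs the hand-prepared labels before it can classify a new student.

```python
from statistics import mean

# Hypothetical weekly login counts for eight students (not real Moodle data).
logins = [1, 2, 2, 3, 8, 9, 10, 11]
# Labels a researcher would have to prepare by hand for supervised learning.
passed = [False, False, False, False, True, True, True, True]


def two_means(values, iterations=10):
    """Unsupervised: split values into two groups without using any labels.

    Assumes the values contain at least two distinct numbers.
    """
    low, high = min(values), max(values)  # initial cluster centres
    groups = []
    for _ in range(iterations):
        # True means the value is closer to the "high" centre.
        groups = [abs(v - low) > abs(v - high) for v in values]
        low = mean(v for v, g in zip(values, groups) if not g)
        high = mean(v for v, g in zip(values, groups) if g)
    return groups, (low, high)


def fit_nearest_mean(values, labels):
    """Supervised: learn one mean per label, then classify by the closer mean."""
    mean_true = mean(v for v, y in zip(values, labels) if y)
    mean_false = mean(v for v, y in zip(values, labels) if not y)
    return lambda v: abs(v - mean_true) < abs(v - mean_false)


groups, centres = two_means(logins)
predict_pass = fit_nearest_mean(logins, passed)
```

The unsupervised step only says that two groups exist; it is the researcher who has to decide what the groups mean. The supervised step answers a specific question (pass or fail), but only because someone supplied the labels first.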

Moodle is a free, open-source learning platform, a type of learning management system (LMS) or e-learning system. It is coordinated by Moodle HQ, an Australian company of 30 developers financially supported by a network of over 60 Moodle Partner service companies worldwide (Moodle, 2016). Moodle is one of the most common educational technology programs in Higher Education (HE) (Bowler, 2009) and is used for blended learning, distance education, flipped classrooms, and other e-learning projects in schools, universities, workplaces, and other sectors (Costello, 2013). Moodle gives students broader flexibility in obtaining resources and increases engagement in tasks, which can result in better learning performance (Kirkwood & Price, 2014). Moodle lets students carry out classroom learning actions such as inquiry, acquisition, practice, discussion, production, and collaboration online, and then share the output with others.

Three Research Question Examples

The stakeholders of EDM in the case study are mostly educators and researchers. The educator's goal is to identify learners' behaviors in order to enhance learning performance. The researcher's goal is to identify how EDM approaches using machine learning techniques should be conducted. To meet the educator's goal, useful indicators for measuring learning outcomes should be identified in advance, and online learning behaviors should be recognized. Through these processes, the researchers can meet both the educators' goal and their own. The case study examines the following three simple questions and, by discussing them, explores the application layer of EDM.

RQ1. What can be indicators to predict exam outcomes?

RQ2. What are students' common learning behaviors on Moodle?

RQ3. What can the results from EDM reveal to educators and researchers?
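As a rough illustration of how RQ1 might be approached, the sketch below ranks candidate indicators by the strength of their Pearson correlation with exam scores. The indicator names (`forum_posts`, `resource_views`) and all numbers are invented for this example and do not come from the case study. A strong correlation only marks a candidate indicator; it does not by itself show that the indicator predicts or causes the exam outcome.

```python
from math import sqrt

# Hypothetical per-student indicators aggregated from log data, with exam scores.
# Column names and numbers are invented for illustration.
students = [
    {"forum_posts": 0, "resource_views": 12, "exam": 50},
    {"forum_posts": 2, "resource_views": 5, "exam": 58},
    {"forum_posts": 4, "resource_views": 9, "exam": 66},
    {"forum_posts": 6, "resource_views": 7, "exam": 74},
    {"forum_posts": 8, "resource_views": 11, "exam": 82},
]


def pearson(xs, ys):
    """Pearson correlation coefficient; assumes neither list is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rank_indicators(rows, outcome="exam"):
    """Sort indicators by absolute correlation with the outcome, strongest first."""
    outcomes = [row[outcome] for row in rows]
    names = [key for key in rows[0] if key != outcome]
    scores = {
        name: pearson([row[name] for row in rows], outcomes) for name in names
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


ranking = rank_indicators(students)
```

In this invented data set, forum activity tracks the exam score closely while resource views do not, so `forum_posts` comes first in `ranking`.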

Collaboration Indicators of Blended Learning

Learning behaviors in blended learning contexts, to which EDM was adapted in the study for the above research questions, can be represented in the log data. Laurillard (2014) defined “blended learning” as

“the thoughtful integration of classroom face-to-face learning experiences with online learning experiences.”

Blended learning is simple in that there is considerable intuitive appeal in integrating the strengths of synchronous (face-to-face) and asynchronous (text-based Internet) learning activities. At the same time, it is considerably complex to implement because of the challenge of virtually unlimited design possibilities and its applicability to so many contexts (Garrison & Kanuka, 2004). A large amount of log data with many variables is recorded in blended learning contexts because blended learning leads to an increase in collaboration, access frequency, and flexibility (Graham, 2006). Accordingly, studies using EDM are usually adapted to blended learning in order to identify the characteristics of online learning behaviors (Berland et al., 2014; McLaren et al., 2007, 2010).

The flexibility of blended learning enables multiple collaborative learning contexts and activities. This can increase commitment to online learning and the chances to learn online. Correspondingly, a large amount of log data can be accumulated, and the distinct characteristics of online learning behaviors can be extracted from it. For instance, there is blended learning with

“online collaboration using software development tools” (Berland et al., 2014, p. 12; Kay et al., 2006) and “interactive tabletop collaboration” (Berland et al., 2014, p. 12; Martinez et al., 2011).

Prata (2012) noted that students who were contradicted by their partners when they were incorrect tended to learn more than students whose partners chose not to correct them. Dyke et al. (2013) revealed that off-topic discussions during collaborative learning were more harmful in some parts of the learning process than in others: specifically, they were more harmful when learning basic facts than during discussions of problem-solving alternatives. Some models based on student contributions to online discussion forums have even been able to predict students' final course grades (Ming & Ming, 2012). Therefore, in collaborative blended learning contexts, distinctive traits can be reflected in the learning data, and valuable results can be extracted with EDM.

However, there are also difficulties in how knowledge is represented in online learning behaviors, which have the character of a reasoning action in representing the world (Davis et al., 1993), even though the learning flexibility provided by blended learning methods can combine various delivery modes (Oliver & Trigwell, 2005) and extend learning experiences. This is because log data cannot be recorded without Internet access, and blended learning partly involves

“self-paced learning with instructor support to develop specific knowledge and skills” (Oliver & Trigwell, 2005, p. 2; Valiathan, 2002).

Because learning pace and pre-existing knowledge differ between individuals, learning behavior is represented differently in the log data, or not represented at all. In addition, the technology used in the online learning environment is continually developing and changing (Laurillard, 2014). Therefore, the results that can be extracted from blended contexts using EDM are contextual, and empirical studies in blended contexts using EDM need to be updated. The results can also be influenced by the characteristics of online learning platforms built on new technology, as well as by the learning contexts. The next section examines the effect of the platform on the research.

Limited Indicators of Moodle

The online learning platform examined in the study, Moodle, has functional characteristics such as a customizable structure and learning content protocols. These characteristics restrict what researchers can measure and force them to conduct the analysis in a certain way. EDM cannot measure the influence on students' motivation because of Moodle's customizable structure. This is because individual motivation can be influenced by various elements, such as internet speed, the design of online learning content, the user interface (Beluce & Oliveira, 2015), and the hidden elements constituted by relationships among learners that cannot be captured on Moodle (Selwyn, 2011).

Although Moodle's customizability gives learning environments more flexibility, it also lets developers, designers, and educators build Moodle-based platforms whose design and functions vary from one group of stakeholders to another. This means that no universal Moodle-based learning platform exists, which makes it difficult to use EDM approaches to measure the impact of the platform itself on factors such as motivation. For instance, when students seldom use the learning materials uploaded by educators, it is difficult for researchers to tell, by comparison with other Moodle studies, whether the materials themselves or the interface design reduced learners' motivation to use them.

Because Moodle is customizable, it requires a well-designed user interface and course structure. In practice, most educational institutions do not use Moodle well enough to improve learning performance (Bowler, 2009). For example, educators upload many learning materials to Moodle without considering which learning design will best motivate individuals to participate online, so students seldom learn from these materials (Attwell & Hughes, 2010; Finlayson et al., 2006). The diversity of learning designs that results from Moodle's customizable structure can obscure the impact factors when the data is analyzed with EDM.

In addition, the protocols that Moodle is compatible with limit the variables available in the log data. The capacity of learning materials such as documents, slides, and movies is limited by globally decided protocols: the Sharable Content Object Reference Model (SCORM), which Moodle adopts; the Tin Can API, which offers more control over learning content and is more secure than SCORM (Rustici Software, 2016); and Caliper, which can record more detailed learning actions than SCORM (IMS Global Learning Consortium Inc., 2016). As such, stakeholders of the digital content, such as developers, content designers, educators, and schools, are forced to follow these protocols, and the digital content is developed within their framework. It is true that individual learning performance and processes on the chunked digital learning content, which can be measured as quantitative changes in learning, are recorded in the database, and that the technical properties of the digital content make it possible to personalize learning content on Moodle (Laurillard, 2014). However, a Moodle course that delivers lecture movies through the SCORM protocol does not record the time when students finished watching the movies, and the design framework of the learning materials is predetermined.
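The consequence of a missing variable can be sketched in a few lines of Python. The record layout below is hypothetical: `VideoLogEvent` and its field names are invented for illustration and do not reflect Moodle's actual database schema or the SCORM data model. The point is only that when the finish time is never logged, viewing time cannot be derived at all, however carefully the analysis is done.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class VideoLogEvent:
    """Hypothetical log row; field names are illustrative, not Moodle's schema."""

    student_id: str
    started_at: datetime
    finished_at: Optional[datetime] = None  # stays None if never logged


def watch_seconds(event: VideoLogEvent) -> Optional[float]:
    """Return viewing time in seconds, or None when it cannot be derived."""
    if event.finished_at is None:
        return None
    return (event.finished_at - event.started_at).total_seconds()


# A row as it might look when only the start of viewing is recorded.
start_only = VideoLogEvent("s01", datetime(2016, 3, 2, 10, 0))
# A row from a hypothetical platform that also records the finish time.
complete = VideoLogEvent(
    "s02", datetime(2016, 3, 2, 10, 0), datetime(2016, 3, 2, 10, 10)
)
```

Returning `None` rather than zero keeps "not measurable" distinct from "watched for no time", which matters when such values are later averaged or fed into a model.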

Besides the limitations of Moodle itself, the slight differences between teachers' roles and responsibilities online and in classrooms (McDonald & Reushle, 2002; Oliver, 1999) change the impact on online learning. In blended learning environments with Moodle, teachers can play an integrated role, such as facilitating discussions or acting as instructional designers, which differs from their authoritative role in traditional classroom teaching (Selwyn, 2011). Moodle is also compatible with collaborative learning, since it allows discussions to be recorded and tracked and keeps the group bounded (Easton, 2003). In these learning contexts, teachers are needed for their social role and take on responsibilities such as organizing learning communities (Easton, 2003) and encouraging learners to make communities more active (Craig et al., 2009). They can act as ‘instructional designers’, who design the learning processes and content, as well as ‘facilitators’ of discussion forums. Thus, the results of EDM could be directly useful for teachers in deciding how they should behave online, as well as in analyzing online learning behaviors on Moodle. Although functional aspects of Moodle limit the log data, the properties of the log data can also be influenced by the role educators take, such as learning designer or facilitator.

EDM Is Based on the Social Science Paradigm

Research with EDM rests on the premise that not every element relating to learning activities can always be represented in the data. This is because EDM is grounded not in natural science but in social science, although data mining is a branch of computer science.

The research targets of natural science are

“naturally occurring objects or phenomena such as light, objects, matter, earth, celestial bodies, or the human body” (Bhattacherjee et al., 2012, p. 10)

whereas social science aims at researching

“people or collections of people, such as groups, firms, societies, or economies, and their individual or collective behaviors” (Bhattacherjee et al., 2012, p. 10).

The tradition of social science differs from that of natural science, which is characterized as “very precise, accurate, deterministic, and independent of the person making the scientific observations” (Bhattacherjee et al., 2012, p. 1). The study by Bovo et al. (2013) on clustering Moodle data for profiling students found very little difference in behaviors and generated few clusters (2 to 3). Thus, EDM approaches do not always uncover beneficial information, even if the analysis methods are correctly applied.
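One simple way to report how much a set of clusters actually differs is the share of total variance explained by cluster membership. The sketch below is an illustration under invented numbers, not the method used by Bovo et al. (2013); `explained_variance` is a hypothetical helper and the cluster labels are assumed to come from some earlier clustering step.

```python
from statistics import mean


def explained_variance(values, labels):
    """Share of total variance explained by cluster membership (0 to 1).

    Assumes the values are not all identical.
    """
    overall = mean(values)
    total = sum((v - overall) ** 2 for v in values)
    clusters = {}
    for value, label in zip(values, labels):
        clusters.setdefault(label, []).append(value)
    between = sum(
        len(members) * (mean(members) - overall) ** 2
        for members in clusters.values()
    )
    return between / total


# Hypothetical weekly page views where the two clusters barely differ.
similar_views = [10, 11, 9, 10, 12, 10, 11, 9]
similar_labels = [0, 1, 0, 1, 0, 1, 0, 1]

# Hypothetical weekly page views where the two clusters are clearly apart.
distinct_views = [1, 2, 1, 2, 9, 10, 9, 10]
distinct_labels = [0, 0, 0, 0, 1, 1, 1, 1]

weak = explained_variance(similar_views, similar_labels)
strong = explained_variance(distinct_views, distinct_labels)
```

With these numbers, `weak` is well below 0.1 and `strong` is above 0.9. A low value does not mean the clustering was carried out incorrectly; it means the students in the data simply did not behave very differently on that measure.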

Research with EDM does not assume that every phenomenon in educational contexts can be transformed into the static numerical variables that underpin “scientism” (Hyslop-Margison & Naseem, 2007).

Consequently, a rich and detailed description of the research context is needed in order to trace why the research could not obtain significant results, or to identify the extent to which the findings are transferable to other educational settings (Bhattacherjee et al., 2012). This is because EDM approaches have voluntaristic traits: the results depend on the researcher's attitude toward and understanding of the data. This stems from the iterative process of hypothesis formation, testing, and refinement (Romero & Ventura, 2013), which is characteristic of data mining itself, and especially of machine learning techniques. A detailed description of the iterative process of generating hypotheses can increase the credibility of the research.


Laurillard, D. (2014). Thinking about blended learning. Available at:

Attwell, G. & Hughes, J. (2010). Pedagogic approaches to using technology for learning: Literature review. Available at:

Beluce, A.C. & Oliveira, K.L. de (2015). Students’ Motivation for Learning in Virtual Learning Environments. Paidéia (São Paulo), 25(60), pp. 105–113.

Berland, M., Baker, R.S. & Blikstein, P. (2014). Educational Data Mining and Learning Analytics: Applications to Constructionist Research. Technology, Knowledge and Learning, 19(1–2), pp. 205–220.

Bhattacherjee, A. (2012). Social science research: principles, methods, and practices. University of South Florida, Scholar Commons; Open Textbook Library. Available at:

Bowler, M. (2009). Learning to ‘chat’ in a virtual learning environment: Using online synchronous discussion to conduct a first year undergraduate tutorial. In British Educational Research Association Annual Conference, University of Manchester, pp. 2–5. Available at:

Costello, E. (2013). Opening up to open source: looking at how Moodle was adopted in higher education. Open Learning: The Journal of Open, Distance and e-Learning, 28(3), pp. 187–200.

Craig, A., Goold, A., Coldwell, J., Mustard, J. & Jerry, P. (2009). Perceptions of roles and responsibilities in online learning: A case study. International Journal of Doctoral Studies, 4, pp. 205–223.

Dyke, G., Howley, I., Adamson, D., Kumar, R. & Rosé, C.P. (2013). Towards Academically Productive Talk Supported by Conversational Agents. In Productive Multivocality in the Analysis of Group Interactions. Computer-Supported Collaborative Learning Series. Springer, Boston, MA, pp. 459–476. Available at:

Easton, S.S. (2003). Clarifying the instructor’s role in online distance learning. Communication Education, 52(2), pp. 87–105.

Garrison, D.R. & Kanuka, H. (2004). Blended learning: Uncovering its transformative potential in higher education. The Internet and Higher Education, 7(2), pp. 95–105.

Graham, C.R. (2006). Blended learning systems. The handbook of blended learning, pp. 3–21.

IMS Global Learning Consortium Inc. (2016). IMS Global Learning Consortium. Available at: /activity/caliperram.

International Educational Data Mining Society (2011). International Educational Data Mining Society. Available at: /home.

Kirkwood, A. & Price, L. (2014). Technology-enhanced learning and teaching in higher education: what is ‘enhanced’ and how do we know? A critical literature review. Learning, Media and Technology, 39(1), pp. 6–36.

McDonald, J. & Reushle, S. (2002). Charting the role of the online teacher in higher education: Winds of change. In Proceedings ASCILITE 2002: 19th Annual Conference of the Australasian Society for Computers in Learning in Tertiary Education. Australasian Society for Computers in Learning in Tertiary Education (ASCILITE), pp. 431–440. Available at:

Munoz, A. (2014). Machine Learning and Optimization. URL: https://www.cims.nyu.edu/~munoz/files/ml_optimization.pdf [accessed 2016-03-02] [WebCite Cache ID 6fiLfZvnG]. Available at:

Oliver, M. & Trigwell, K. (2005). Can ‘blended learning’ be redeemed? E-learning and Digital Media, 2(1), pp. 17–26.

Prata, D. (2012). Dialogue Analysis in Collaborative Learning. International Journal of e-Education, e-Business, e-Management and e-Learning. Available at:

Raschka, S., Julian, D. & Hearty, J. (2016). Python: Deeper Insights into Machine Learning. Packt Publishing Ltd.

Romero, C. & Ventura, S. (2010). Educational Data Mining: A Review of the State of the Art. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 40(6), pp. 601–618.

Rustici Software, LLC (2016). Overview: Tin Can API. Available at:

Selwyn, N. (2011). Education and technology: key issues and debates. London; New York, NY: Bloomsbury Academic, an imprint of Bloomsbury Publishing Plc.