This is my daily life log.

Why is Educational Data Mining important in research?

*This is the same article I wrote on Medium.

In recent years, growing attention to Artificial Intelligence (AI) has encouraged the progress of data mining and analytics in the pedagogical domain (Baker 2014). Data mining is the process of extracting new insights and patterns from a large data set using methods at the intersection of machine learning, statistics, and database systems. It is also part of knowledge discovery in databases (KDD), the area concerned with discovering distinct and potentially beneficial information in large amounts of data (Fayyad et al. 1996). Data mining that specializes in the educational domain is called Educational Data Mining (EDM). EDM refers to the techniques, tools, and research designs used to obtain information from educational records, typically online logs and examination results, and to analyze this information to formulate conclusions. EDM is theory-oriented and focuses on the connection to pedagogical theory (Berland et al., 2014). At present, however, little empirical evidence exists to support a theoretical framework able to gain wide acceptance in the scientific community (Papamitsiou & Economides, 2014). Real-world learning contexts are highly diverse, and each context determines the analytical approaches that EDM can use. Therefore, research that demonstrates how EDM can be beneficial in real educational practice could be crucial.

What is educational data mining?

With the establishment of the annual International Conference on Educational Data Mining and the Journal of Educational Data Mining in 2008, EDM emerged as a credible research area (Baker et al., 2010). The International Educational Data Mining Society, which hosts the conference and publishes the journal, offers this definition of EDM:

“Educational Data Mining is an emerging discipline, concerned with developing methods for exploring the unique and increasingly large-scale data obtained from educational settings, and uses those methods to better understand students and the settings in which they learn” (International Educational Data Mining Society, 2011).

According to the International Educational Data Mining Society (2011), information in any learning context often consists of multiple hierarchical levels, which cannot be determined in advance but must be verified by properties found in the data. Factors such as time, sequence, and context are also important to consider in the study of educational data. For instance, students' learning behaviors (participation, login frequency, number of chat messages, and the type of questions submitted to the instructor) can be analyzed along with their final grades (Abdous et al. 2012).
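As a rough illustration of this kind of analysis, the sketch below computes a Pearson correlation between login frequency and final grade. It is a minimal sketch, not the method used by Abdous et al.: the field names (`logins`, `chat_messages`, `final_grade`) and the student records are hypothetical values invented for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical records; real data would come from the learning platform's logs.
students = [
    {"logins": 5, "chat_messages": 2, "final_grade": 55},
    {"logins": 12, "chat_messages": 9, "final_grade": 71},
    {"logins": 20, "chat_messages": 4, "final_grade": 78},
    {"logins": 31, "chat_messages": 15, "final_grade": 90},
    {"logins": 8, "chat_messages": 1, "final_grade": 62},
]

logins = [s["logins"] for s in students]
grades = [s["final_grade"] for s in students]
print("login frequency vs final grade:", round(pearson(logins, grades), 3))
```

A correlation like this only describes an association in the recorded data. It does not show that logging in more often causes better grades.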

The online learning platform used for recording and analyzing the data determines what information about learning behaviors can be recorded. If the platform's database does not have properties or variables about time, researchers cannot analyze when students finish an exam. Hence, the information that EDM can deal with depends on the nature of the data predetermined by the online learning platform. In fact, real-world learning designs that use online learning platforms are diverse and still developing, owing to the progress of machine learning techniques. Therefore, up-to-date studies using educational data mining could be beneficial to the EDM research area.
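The dependence on recorded fields can be made concrete with a small check. The sketch below is an assumption-laden illustration: the analysis names and field names (`submitted_at`, `login_at`, and so on) are hypothetical and do not come from any particular platform.

```python
# Hypothetical mapping from an analysis to the log fields it needs.
REQUIRED_FIELDS = {
    "exam_completion_timing": {"student_id", "exam_id", "submitted_at"},
    "login_frequency": {"student_id", "login_at"},
    "chat_activity": {"student_id", "message_id"},
}


def feasible_analyses(available_fields):
    """Return the analyses whose required fields are all recorded."""
    available = set(available_fields)
    return sorted(
        name for name, needed in REQUIRED_FIELDS.items() if needed <= available
    )


# A hypothetical platform that stores exam scores but no timestamps.
platform_fields = ["student_id", "exam_id", "score", "message_id"]
print(feasible_analyses(platform_fields))  # prints ['chat_activity']
```

Because this platform records no timestamp fields, the exam timing analysis is ruled out before any mining starts.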

Learning analytics vs Educational data mining

The meaning of EDM is not always clear to researchers who want to employ EDM approaches, because EDM is closely tied to the research field of Learning Analytics (LA).

Learning analytics is the measurement, collection, analysis, and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs (International Conference on Learning Analytics and Knowledge et al. 2016).

LA is a fairly new field that is gaining popularity. Generally, LA rests on two premises: that it deals with pre-existing data in a form that a computer can process, and that its techniques can handle large sets of data that could not be handled manually (Ferguson 2012). The LA and EDM research communities hold different perspectives on how educational data should be obtained and analyzed (Berland et al., 2014). Researchers in LA employ more human-led methods of discovery, focus more on holistic systems and on understanding constructs, and seek ways to inform and empower instructors and learners. For example, they may inform an instructor about the ways a specific student is struggling, so that the instructor can contact the learner and intervene in a positive manner to facilitate that student's learning (Berland et al., 2014).

On the other hand, EDM researchers focus more on automated methods for discovery within educational data, on modelling specific constructs and the relationships between them, and on applications in automated adaptation, such as educational software that identifies a learner's needs and automatically changes to personalize the learner's experience (Berland et al., 2014; Arroyo et al., 2007; Baker et al., 2006; Corbett & Anderson, 1995).
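One well-known example of such automated adaptation is Bayesian Knowledge Tracing, the student model associated with the Corbett and Anderson (1995) work cited above. The sketch below shows a single update step and a simple adaptation rule. The parameter values (`p_slip`, `p_guess`, `p_learn`) and the mastery threshold are illustrative assumptions, not values taken from any of the cited studies.

```python
def bkt_update(p_known, correct, p_slip=0.1, p_guess=0.2, p_learn=0.15):
    """One Bayesian Knowledge Tracing step: update P(skill is known).

    p_known: prior probability that the learner knows the skill.
    correct: whether the learner answered the current item correctly.
    The slip, guess, and learn rates are illustrative assumptions.
    """
    if correct:
        evidence = p_known * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_known) * p_guess)
    else:
        evidence = p_known * p_slip
        posterior = evidence / (evidence + (1 - p_known) * (1 - p_guess))
    # Account for the chance that the learner acquired the skill on this step.
    return posterior + (1 - posterior) * p_learn


def needs_more_practice(p_known, mastery_threshold=0.95):
    """Adaptation rule (assumed threshold): keep practicing until mastery."""
    return p_known < mastery_threshold


# Trace a hypothetical sequence of answers for one skill.
p = 0.3
for answer in [True, False, True, True, True]:
    p = bkt_update(p, answer)
    print(f"correct={answer} p_known={p:.3f} more_practice={needs_more_practice(p)}")
```

The software can use the running estimate to decide, without human intervention, whether to present another practice item or move on to the next skill.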

Objectives and stakeholders of using EDM

Although EDM and LA have slightly different characteristics, the same factors influence the quantity of data in both fields, because both rely on secondary data generated by online learning technologies. The advent of new technologies and public data repositories in pedagogical contexts increases the quantity of data and diversifies its quality. For instance, mobile devices enable researchers to capture learners' interactions in more detail (Berland et al. 2014). Research institutions open public archives such as the Pittsburgh Science of Learning Center DataShop, which hold huge quantities of data and are accessible to any scientific researcher (Koedinger et al. 2010). Thus, the characteristics of EDM and LA techniques are also easily influenced by the new technologies used in the educational context (Marie Bienkowski et al., 2012). Although it may seem meaningless to distinguish strictly between EDM and LA, the objectives and stakeholders of EDM can be described using the taxonomies of Baker and Yacef (2009) and Romero and Ventura (2010), as outlined below. Baker and Yacef (2009, pp.5–8) summarized the following four goals of EDM:

  1. Predicting learners' behaviors by improving student models. Modeling means characterizing and categorizing the characteristics or states that make up a student's knowledge, motivation, meta-cognition, and attitudes.
  2. Discovering or improving knowledge domain structure models. For example, there are concept models of the materials being taught and models that explain the interrelationships of knowledge in a domain (Barnes, 2005).
  3. Studying the most effective pedagogical support for student learning that can be achieved through learning systems.
  4. Establishing empirical evidence to support or articulate pedagogical theories, frameworks, and educational phenomena to determine core influential components of learning to enable the designing of better learning systems.

EDM goals are achieved by adapting psychometrics, employing statistical techniques, and mining data from several kinds of educational settings: offline education, which includes face-to-face contact and the study of how humans learn; online learning delivered through e-learning and Learning Management Systems (LMS); and Intelligent Tutoring Systems (ITS) (Romero & Ventura 2010).

In addition, information used in EDM is oriented towards several stakeholders (Liñán & Pérez 2015). Different groups of stakeholders view educational information from different perspectives, according to their own mission, vision, and objectives for using EDM (Hanna, 2004). Romero and Ventura (2010, p. 2) categorize four stakeholders according to their objectives for using EDM:

  1. Learners: Optimizing individual learning styles, learning materials, and learning experiences, or recommending them.
  2. Educators: Analysing students’ learning behaviors, gaining the most supportive instruction, and predicting student learning to increase teaching effectiveness.
  3. Researchers/Developers: Evaluating learning materials, improving learning systems, and assessing data mining techniques for effectiveness.
  4. Organizations: Improving decision-making processes in higher learning institutions in terms of efficiency and cost, such as admission processes and financial resources distribution.
Figure 1 created by Jesse Tetsuya


Few publications report practical, empirical studies that examine the above taxonomies, which were constructed theoretically. For instance, in their systematic review of LA and EDM in practice, Papamitsiou and Economides (2014) identified 209 mature pieces of research work before excluding theoretical studies, and narrowed these down to only 40 key empirical studies covering all objectives of using EDM.

Furthermore, empirical studies of EDM are needed in the education sector, especially in higher education. This is because higher education institutions hold data sets large enough to analyze (Kollias et al., 2005), although educators are often not aware of how to conduct EDM in their own practice. They might also not know how to use the latest technology, or why it is important, owing to a lack of technological training (Selwyn, 2011). The complicated and new machine learning techniques that are the main analysis techniques of EDM might puzzle educators.


Abdous, M., Wu, H. & Yen, C.-J. (2012). Using data mining for predicting relationships between online question theme and final grade. Journal of Educational Technology & Society, 15(3), p.77.

Baker, R. & de Carvalho, A. (2008). Labeling student behavior faster and more precisely with text replays. In Educational Data Mining 2008. Available at: [Accessed August 30, 2017].

Baker, R. & others (2010). Data mining for education. International Encyclopedia of Education, 7(3), pp.112–118.

Baker, R.S. (2014). Educational data mining: An advance for intelligent systems in education. IEEE Intelligent Systems, 29(3), pp.78–82.

Baker, R.S.J.d. & Yacef, K. (2009). The State of Educational Data Mining in 2009: A Review and Future Visions. Journal of Educational Data Mining, 1(1), pp.3–17.

Berland, M., Baker, R.S. & Blikstein, P. (2014). Educational Data Mining and Learning Analytics: Applications to Constructionist Research. Technology, Knowledge and Learning, 19(1–2), pp.205–220.

Fayyad, U., Piatetsky-Shapiro, G. & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17(3), p.37.

Fayyad, U.M. (1996). Data mining and knowledge discovery: making sense out of data. IEEE Expert, 11(5), pp.20–25.

Ferguson, R. (2012). Learning analytics: drivers, developments and challenges. International Journal of Technology Enhanced Learning, 4(5/6), p.304.

Hanna, M. (2004). Data mining in the e-learning domain. Campus-Wide Information Systems, 21(1), pp.29–34.

International Educational Data Mining Society (2011). International Educational Data Mining Society. International Educational Data Mining Society. Available at: /home [Accessed June 26, 2017].

Koedinger, K.R., D’Mello, S., McLaughlin, E.A., Pardos, Z.A. & Rosé, C.P. (2015b). Data mining and education. Wiley Interdisciplinary Reviews: Cognitive Science, 6(4), pp.333–353.

Liñán, L.C. & Pérez, Á.A.J. (2015). Educational Data Mining and Learning Analytics: differences, similarities, and time evolution. RUSC. Universities and Knowledge Society Journal, 12(3), pp.98–112.

Papamitsiou, Z. & Economides, A.A. (2014). Learning analytics and educational data mining in practice: A systematic literature review of empirical evidence. Journal of Educational Technology & Society, 17(4), p.49.

Romero, C. & Ventura, S. (2010). Educational Data Mining: A Review of the State of the Art. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 40(6), pp.601–618.

Selwyn, N. (2011). Education and technology: key issues and debates. London; New York, NY: Bloomsbury Academic.