LAK Dataset

The LAK Dataset makes machine-readable versions of research sources from the Learning Analytics and Educational Data Mining communities publicly available, with the main goal of facilitating research, analysis and smart explorative applications.



Content Partners

This site provides access to structured full text and metadata from key research publications in the field. This advances SoLAR’s mission: it provides more comprehensive search facilities for discovering relevant work in the growing corpus, and it enables researchers to analyse the field, for instance to track the evolution of a topic over time or to identify correlations with related communities. The ACM International Conference on Learning Analytics and Knowledge (LAK), sponsored by SoLAR, is the field’s premier research forum, providing common ground for academics, administrators, software developers and companies to shape and debate the state of the art in learning analytics and related fields. ACM’s conditions for providing the full text of the LAK Conference Proceedings specify:

  • ACM is providing this ACM Digital Library data solely for research purposes, gratis. Should software that is beneficial to the users of the ACM Digital Library be developed using this data, whenever feasible, ACM would appreciate an as-is perpetual royalty-free license to that software to be used by ACM solely in the context of ACM’s Digital Library services to benefit the Computer Science community.


The full text of each paper has been processed into a corpus together with extracted metadata, including authors, affiliations, titles, keywords and abstracts. The schema used to describe the papers in the dataset is based on two established schemas: the Semantic Web Conference Ontology (already used to describe metadata about publications from the Semantic Web conferences and related events) and the Linked Education schema. The data is accessible in various forms:

  1. Zipped dataset dump file for download [RDF] [NT]
  2. R format (thanks to Adam Cooper), hosted on the KMi Crunch R server [LAK+EDM]
  3. Using semantic web infrastructure, a public SPARQL endpoint provides access to structured RDF metadata according to Linked Open Data (LOD) principles. The endpoint is available via [your sparql query]. View some example queries.
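
As an illustration of the third access route, the following is a minimal Python sketch of how a client might query a SPARQL endpoint over HTTP and flatten the JSON results. It is not taken from the LAK Dataset documentation: the endpoint address is not given here, so `ENDPOINT` is a hypothetical placeholder, and the function names (`build_request`, `rows_from_results`, `run_query`) are invented for this example. The query is deliberately schema-agnostic and only lists classes, so it makes no assumption about the property names used in the dataset.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: substitute the dataset's actual SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

# Schema-agnostic query: list some of the classes used in the data.
QUERY = "SELECT DISTINCT ?type WHERE { ?s a ?type } LIMIT 20"


def build_request(endpoint, query):
    """Build an HTTP GET request that carries the query and asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows_from_results(document):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    variables = document["head"]["vars"]
    return [
        # Unbound variables are simply absent from a binding, so skip them.
        {name: binding[name]["value"] for name in variables if name in binding}
        for binding in document["results"]["bindings"]
    ]


def run_query(endpoint, query):
    """Send the query and return the result rows (requires network access)."""
    with urllib.request.urlopen(build_request(endpoint, query)) as response:
        return rows_from_results(json.load(response))
```

Once `ENDPOINT` points at a real endpoint, `run_query(ENDPOINT, QUERY)` would return one dict per result row, keyed by variable name. Whether a given endpoint honours the JSON `Accept` header is an assumption to verify against that endpoint.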

Explore the dataset

Spiral me to the core

Blue Canary


Dataset statistics

People and Organisations

  • Stefan Dietze (L3S Research Center, Germany)
  • Davide Taibi (Institute for Educational Technologies CNR, Italy)
  • Simon Buckingham Shum (SoLAR)




Since the LAK Dataset is the result of work by several individuals and organisations, we would like to ask all users of the dataset to include the following acknowledgements in their papers referring to the LAK Dataset: