The effect of large-scale datafication of archival materials
A new research project at the Danish National Archives and the University of Copenhagen investigates how our understanding of the past changes when historical archives are transformed into digital data.
From documents to data
For many years, millions of historical documents have been transcribed and compiled into large databases with the help of volunteers, researchers, and, in recent years, artificial intelligence. The result is extensive historical datasets that form the basis for new research on population, family, labour, and social conditions. But this data does not create itself, and the methods used are not neutral. Instead, they shape the kinds of histories we are later able to tell on the basis of the data.
With the research project Missing, Distorted or Normalised? The Effect of Large-Scale Datafication of Archival Materials (MiDiN-DatA), researchers from the Danish National Archives and the University of Copenhagen examine how different approaches to working with historical data shape future historical narratives.
The project is funded by the Augustinus Foundation with DKK 4.7 million.
Humans, machines, and their interaction
The first two parts of the project investigate how historical data is shaped by the methods used to create it. The project analyses purely manual methods, highly automated methods based on AI technologies, and hybrid approaches that combine human judgement with algorithms.
The aim is to understand which types of information risk being lost, distorted, or normalised depending on the chosen method, and how errors and biases may accumulate—or in some cases be reduced. The project also examines where human insight is essential, where automation is advantageous, and how the interplay between the two can be used more deliberately.
Humans as creators and users of data
The third part of the project focuses on the role of humans in the datafication of archives—both as creators and users of large historical datasets, with particular attention to crowdsourcing projects over the past three decades.
Humans contribute experience, interpretation, and contextual understanding, for example by transcribing sources or developing training data for automated methods. At the same time, human contributions can introduce errors and biases that affect data quality and usability.
The project analyses these issues by examining variation in how people work with data, and by studying how large and complex datafication processes influence the way historical datasets are understood and used.
From research to practice
The project is based on the collections of the Danish National Archives as well as a number of ongoing and completed projects and infrastructures, including Link-Lives, Mapping Freedom, the National Archives’ crowdsourcing portal, and the recently funded projects Historical Person Register (HisPeR) and ChildHomes.
Grounding the project in concrete sources and research questions makes it possible to analyse how datafication methods affect empirical historical research in practice—for example in studies of child mortality, migration, and social mobility. At the same time, the connection to existing infrastructures ensures that the project’s results can be continuously tested and integrated directly into the National Archives’ work with large-scale historical data.
About the project
- The project is led by Senior Researcher Bárbara Revuelta-Eugercios at the Danish National Archives together with an internal research team and is carried out in collaboration with researchers at the University of Copenhagen.
- The research project runs from 2026 to 2029.
- It is funded by the Augustinus Foundation with DKK 4.7 million.
- The results will be disseminated through scientific publications, events, and teaching activities.
Researchers participating in the project
Danish National Archives
- Bárbara Revuelta-Eugercios, Senior Researcher and Project Leader, specialist in historical demography and health inequality
- Asbjørn Thomsen, Senior Researcher, focusing on social mobility and rural societies
- Olivia Robinson, Historical Data Manager, focusing on migration and colonial history
- Tobias Kallehauge, Data Scientist, focusing on machine learning and automated methods
- Markus Schunk, Crowdsourcing Coordinator, focusing on user engagement and data quality
University of Copenhagen
- Henriette Roued, Associate Professor of Digital Humanities at the Department of Communication, specialist in digital cultural heritage, the GLAM sector, and citizen-created digital heritage
- Anne Løkke, Professor at the SAXO Institute, focusing on social, cultural, medical, and health history, c. 1750–1950