We propose to explore one component of the Durham Epistemological Engine (DEE) by developing methods to automate the processes involved in a literature review.
Principal Investigators:
Professor Jim Ridgway, School of Education, jim.ridgway@durham.ac.uk
Professor Alexandra I. Cristea, Computer Science Department, alexandra.i.cristea@durham.ac.uk
Visiting Fellow:
Professor Antonija Mitrovic, University of Canterbury, Christchurch
Conceptions of knowledge, ways of knowing, and uses for knowledge are in a state of flux. There are new sources of information (e.g. mobile devices and sensors), new methods of analysis (e.g. via AI), new producers and aggregators (e.g. Google Maps and Google Scholar), and new ways to share knowledge (e.g. web-based interactive models of disease spread, social media), alongside traditional ways to use and abuse information (‘enemies of the people’ versus fact-checkers). There have been serious challenges to the practices of academic research. Ioannidis (2005) offers a critique of published medical research, pointing to the prevalence of data dredging; the Open Science Collaboration (2015) failed to replicate 60% of the 100 ‘well-established’ results in psychology it investigated, illustrating the problems of exploring weak effects using small samples. Sample bias poses another major problem: the absence of non-Europeans from large databases (used for genome-therapy decisions) and the massive under-representation of women in clinical trials lead to dangerous practices. There is an urgent need to rethink epistemological assumptions across academia, and beyond.
In the longer term, we envisage a Durham Epistemological Engine (DEE), whose goal is to create and use tools to engage with, and shape, the evolving knowledge landscape. Here we sketch just that part of the DEE relevant to academic research. The ambition is to analyse, critique and improve a slew of processes associated with knowledge generation. These include: identification of academic areas that are paradigm-bound (e.g. which use a narrow range of methods, data types or models); methods to identify epistemological assumptions; critical evaluation of specific studies; creation of semantic networks of papers and authors; methodology classification systems; identification of areas that do not routinely share data and code; identification of results that are important for theoretical claims but where the evidence is weak; analogy generators; and methods for analysing large corpora of research. This proposal represents the beginning of this work: we propose to automate much of the work associated with conducting literature reviews.
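To make one of these components concrete, the sketch below shows how a semantic network of authors might be seeded from paper metadata, using Python and the networkx library. The paper records are invented for illustration, and co-authorship edges stand in, as a first approximation, for the richer semantic links the DEE would eventually model.

# Illustrative sketch only: a minimal co-authorship network, assuming
# paper metadata has already been retrieved (the records are invented).
from itertools import combinations
import networkx as nx

papers = [
    {"title": "Paper A", "authors": ["Smith", "Jones"]},
    {"title": "Paper B", "authors": ["Jones", "Lee", "Patel"]},
    {"title": "Paper C", "authors": ["Smith", "Patel"]},
]

G = nx.Graph()
for paper in papers:
    # Connect every pair of co-authors; edge weights count shared papers.
    for a, b in combinations(paper["authors"], 2):
        weight = G.get_edge_data(a, b, default={"weight": 0})["weight"]
        G.add_edge(a, b, weight=weight + 1)

# Degree centrality as a simple first indicator of 'hub' authors.
print(nx.degree_centrality(G))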
An interdisciplinary approach is essential for the development of the DEE – technical expertise needs to be combined with the domain-specific knowledge required to articulate epistemological assumptions (and to promote change within disciplines).
Every researcher needs a sound knowledge of well-established effects in their area. This can be difficult in fast-changing fields, in interdisciplinary work, or for researchers entering new domains, such as doctoral students.
Advice is available on ways to evaluate the quality of research based on different paradigms, such as systematic reviews, randomised controlled trials, or qualitative studies[1].
Here, we propose to explore methods to automate review processes, in particular to support meta-analysis, using computer science (CS) tools such as natural language processing, deep learning and AI. We will bring together expertise in CS and social science (SS) and establish a network of interested researchers.
The current state of the art is limited. For example, searching for ‘automatic paper reviews’ on Google Scholar returns few papers directly concerned with using text analysis to review a large body of research papers; most hits concern product reviews instead. A related study is Choong et al. (2014), in which citations were retrieved automatically from a body of references and compared with manually created ones. Di Nunzio (2018) examines the automatic retrieval of relevant studies, which is useful at the start of the problem we are tackling (i.e. establishing the original pool of data to be used for analysis, and the automatic extraction of, for example, trends, future research directions and main results). Machine learning has also been applied to reduce screening workload and human error in systematic reviews (Bannach-Brown et al., 2019). In the health sciences, tools to support systematic literature reviews have been proposed on the basis of recent cloud technology (e.g. Ruiz-Rube et al., 2018); this shows interest in the area across disciplines, but also that little has yet been done to apply natural language processing (NLP) to the ever-growing body of literature.
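As a minimal illustration of the kind of NLP pipeline we have in mind for the screening step (establishing the initial pool of relevant studies), the sketch below treats screening as text classification over abstracts, using scikit-learn. The abstracts and labels are invented, and TF-IDF with logistic regression is a deliberately simple stand-in for the deep-learning models we would explore.

# Minimal sketch of abstract screening as text classification, assuming a
# small hand-labelled seed set (all example data here is invented).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

abstracts = [
    "Randomised controlled trial of drug X in children with condition Q.",
    "A survey of user opinions on mobile phone design.",
    "Placebo-controlled study of drug X: outcomes at three weeks.",
    "Marketing analysis of product reviews on e-commerce sites.",
]
labels = [1, 0, 1, 0]  # 1 = relevant to the review question, 0 = not

vectoriser = TfidfVectorizer(stop_words="english")
X = vectoriser.fit_transform(abstracts)
model = LogisticRegression().fit(X, labels)

# Rank unseen abstracts by predicted relevance, to prioritise human screening.
new = ["Effect of drug X versus placebo in infants with condition Q."]
print(model.predict_proba(vectoriser.transform(new))[:, 1])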
Meta-analysis is a method for synthesising results from studies conducted by different researchers on the same topic. Drug trials provide an example[2]. Repositories of meta-analyses have been created[3]. These libraries are used to inform practice[4]. For academics (including research students) it would be valuable to have efficient tools to support the process of literature review.
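For concreteness, the sketch below implements the core arithmetic of a fixed-effect meta-analysis: inverse-variance weighting, which gives studies with larger numbers of patients (and hence smaller standard errors) more influence, as footnote [2] describes. The effect sizes and standard errors are invented.

# Minimal sketch of fixed-effect meta-analysis via inverse-variance
# weighting (the study effects and standard errors below are invented).
import math

# (effect size, standard error) for each hypothetical study
studies = [(0.30, 0.10), (0.25, 0.05), (0.40, 0.20)]

weights = [1 / se**2 for _, se in studies]  # smaller SE => larger weight
pooled = sum(w * e for (e, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

print(f"pooled effect = {pooled:.3f} +/- {1.96 * pooled_se:.3f} (95% CI half-width)")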
We propose a series of five workshops. These begin with the structured gathering and creation of ideas by Durham academics, followed by implementation and evaluation; over time the workshops become more focussed and more inclusive (e.g. across Durham departments, then the N8), culminating in a UK-wide conference where ideas and outcomes are presented and evaluated. These will be complemented by invited talks to the Innovative Computing Group and at Durham Digital Humanities and Durham Digital Health events. We propose to involve postgraduate CS students heavily in this work; they will be highly motivated to succeed, and we will be able to explore the potential of a variety of techniques in parallel.
References
Bannach-Brown, A., Przybyła, P., Thomas, J., Rice, A. S. C., Ananiadou, S., Liao, J., & Macleod, M. R. (2019). Machine learning algorithms for systematic review: reducing workload in a preclinical review of animal studies and reducing human screening error. Systematic Reviews, 8, Article 23.
Choong, M. K., Galgani, F., Dunn, A. G., & Tsafnat, G. (2014). Automatic evidence retrieval for systematic reviews. Journal of Medical Internet Research, 16(10), e223.
Di Nunzio, G. M. (2018). A study of an automatic stopping strategy for technologically assisted medical reviews. In Pasi, G., Piwowarski, B., Azzopardi, L., & Hanbury, A. (Eds.), Advances in Information Retrieval – ECIR 2018. Lecture Notes in Computer Science, vol. 10772. Springer, Cham.
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.
Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.
Ruiz-Rube, I., Person, T., Mota, J. M., Dodero, J. M., & González-Toro, Á. R. (2018). Evidence-based systematic literature reviews in the cloud. In Yin, H., Camacho, D., Novais, P., & Tallón-Ballesteros, A. (Eds.), Intelligent Data Engineering and Automated Learning – IDEAL 2018. Lecture Notes in Computer Science, vol. 11315. Springer, Cham.
[1] E.g. https://casp-uk.net/casp-tools-checklists/.
[2] How effective is drug X? A meta-analysis begins with a clear specification: for example, we want to evaluate the effectiveness of drug X in treating children aged 0–3 suffering from condition Q. There must be random allocation of children to the treatment group and the non-treatment group (non-treatment must involve a placebo). Treatment must last between two and three weeks. Assessment of outcomes must be done by suitably qualified persons who do not know whether the patient was in the treatment group or the placebo group. An attempt is made to locate all studies in peer-reviewed journals which satisfy these criteria. Results are then synthesised, with studies on larger numbers of patients weighted more heavily.
[3] E.g. the Cochrane Library for health studies, and the Campbell Collaboration for social science.
[4] E.g. by NICE in licensing drugs for use in the UK, and as a resource for educators – exemplified by Durham’s Teaching Effectiveness Toolkit, which shows the effectiveness and cost of different interventions, and the strength of the supporting evidence.
[Image caption: The third project workshop.]