
Review

Artificial intelligence-assisted semi-automation tools for systematic reviews and guideline development

J Evid-Based Pract 2025;1(2):62-67. Published online: September 29, 2025

Division of Healthcare Research, National Evidence-based Healthcare Collaborating Agency, Seoul, Korea

Corresponding author: Miyoung Choi E-mail: mychoi@neca.re.kr
• Received: August 3, 2025   • Revised: August 30, 2025   • Accepted: September 2, 2025

© Korean Society of Evidence-Based Medicine, 2025

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

This review explores the current landscape of artificial intelligence (AI)-assisted semi-automation tools used in systematic reviews and guideline development. With the exponential growth of medical literature, these tools have emerged to improve efficiency and reduce the workload involved in evidence synthesis. Platforms such as Covidence, EPPI-Reviewer, DistillerSR, and Laser AI exemplify how machine learning and, more recently, large language models (LLMs) are being integrated into key stages of the systematic review process, ranging from literature screening to data extraction. Evidence suggests that these tools can save considerable time, with some achieving average reductions of over 180 hours per review. However, challenges remain in transparency, reproducibility, and validation of AI performance. In response, international initiatives such as the Responsible AI in Evidence Synthesis (RAISE) project and the Guidelines International Network (GIN) have proposed frameworks to ensure the ethical, trustworthy, and effective use of AI in health research. These include principles such as transparency, accountability, preplanning, and continuous evaluation. This review highlights both the opportunities and limitations of adopting AI in evidence synthesis and underscores the importance of human oversight and rigorous validation to ensure that such tools enhance, rather than compromise, the integrity of systematic reviews and guideline development.
Systematic review is a major methodology for evidence-based decision making in healthcare policy, health technology assessment (HTA), and evidence-based guideline development. Systematic reviews are labor-intensive and time-consuming, typically taking around 41 weeks (nearly a year) from protocol development to final journal submission [1]. The past several years have seen the development and increasing adoption of various machine learning (ML)-based semi-automation tools designed to overcome the challenges inherent in systematic reviews [2]. Despite their individual strengths and weaknesses, these tools have gradually gained traction within the research community. More recently, the widespread emergence of large language models (LLMs) has prompted researchers to explore their potential for systematic review automation. While concerns regarding accuracy and the "black box" nature of LLMs currently necessitate human oversight, ongoing technological advancements hold significant promise for future applications [3,4]. Compared to LLMs, semi-automation tools using conventional ML have gained more trust for preserving methodological rigor. These tools assist in managing workload and improving process efficiency while upholding the strict standards of systematic review [5]. Notably, recent trends indicate an integration of artificial intelligence (AI) functionalities, especially LLMs, into these tools to further enhance efficiency and broaden their utility. This review aims to explore these recent developments and their applicability within the context of systematic reviews.
While full automation of systematic reviews remains an ideal goal, this review focuses on semi-automation. Semi-automation software and platforms have been available for several years and are rapidly expanding their utility by adopting AI technologies to streamline and expedite various stages of the systematic review process [6]. Study selection has been a primary focus, with numerous tools providing semi-automated and fully automated solutions. Popular platforms such as DistillerSR, Covidence, EPPI-Reviewer, Abstrackr, and Rayyan have integrated AI-assisted screening. However, many of these tools lack publicly available source code, provide limited information on classifier training, and have not published performance evaluations. Most screening tools use supervised machine learning, which requires users to manually screen a portion of articles to generate training and test data [7].
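To make the supervised approach concrete, the sketch below trains a simple text classifier on a manually screened subset and ranks the remaining records by predicted relevance. It is a minimal illustration in Python (scikit-learn) using hypothetical toy data, not the classifier used by any of the platforms named above.

```python
# Minimal sketch of supervised screening (illustrative only; not the
# implementation of DistillerSR, Covidence, or any other named platform).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: records the reviewer has already screened
# (1 = include, 0 = exclude).
screened = [
    "Randomized controlled trial of drug A versus placebo in hypertension",
    "Narrative commentary on trends in healthcare policy",
    "RCT evaluating drug B for blood pressure reduction in adults",
    "Editorial on reforming medical education curricula",
]
decisions = [1, 0, 1, 0]
unscreened = [
    "Double-blind trial of antihypertensive therapy in older adults",
    "Opinion piece on hospital administration",
]

# Train on the manually screened portion, then rank the rest so that
# records most likely to be included are presented to reviewers first.
vectorizer = TfidfVectorizer(stop_words="english")
model = LogisticRegression().fit(vectorizer.fit_transform(screened), decisions)
scores = model.predict_proba(vectorizer.transform(unscreened))[:, 1]
for score, text in sorted(zip(scores, unscreened), reverse=True):
    print(f"{score:.2f}  {text}")
```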
AI tools are widely used for various evidence synthesis tasks, ranging from standalone solutions to integrated systematic review platforms. While many tools offer automated solutions for tasks such as study selection, they often lack transparency and public performance evaluations. There is growing interest in using generative LLMs for these tasks because of their potential to reduce the need for extensive training data.
Several popular semi-automation tools are described in detail below.
Covidence
Covidence is positioned as a tool to streamline and structure the traditional systematic review process, with a strong focus on the Cochrane methodology. Its user experience is characterized by a prescribed, step-by-step workflow that guides users through screening, conflict resolution, and data extraction, thereby enforcing methodological rigor [8]. The tool is designed for reviewers at all levels of experience. Since 2023, AI-driven literature screening has become feasible through tools like the RCT classifier, and more recently, large language models (LLMs) have begun to be integrated into data extraction tools—marking the initial use of LLMs in this critical phase of evidence synthesis.
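As a rough illustration of that pattern, the sketch below shows how an LLM might be prompted to return structured extraction data for reviewer verification. The prompt, the field names, and the call_llm parameter are assumptions made for illustration; this is not Covidence's actual implementation.

```python
# Sketch of LLM-assisted data extraction (illustrative assumptions only;
# not Covidence's implementation). call_llm is any function that sends a
# prompt string to a model and returns its text reply.
import json

PROMPT = """Extract these fields from the study text and reply with JSON only:
{{"population": string, "intervention": string, "comparator": string,
"outcomes": string, "sample_size": integer or null}}

Study text:
{text}"""

REQUIRED = ("population", "intervention", "comparator", "outcomes", "sample_size")

def extract_study_data(study_text, call_llm):
    reply = call_llm(PROMPT.format(text=study_text))
    data = json.loads(reply)  # fail loudly on malformed model output
    missing = [field for field in REQUIRED if field not in data]
    if missing:
        raise ValueError(f"model reply missing fields: {missing}")
    # Human oversight: suggested values must still be checked against the PDF.
    return data
```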
EPPI-Reviewer
Developed by the EPPI-Centre at UCL, EPPI-Reviewer is a non-profit, web-based academic tool designed for maximum flexibility. It supports a vast range of review types beyond standard meta-analyses, including qualitative, mixed-methods, framework, and thematic syntheses. It is intended for reviewers who require the freedom to customize their methods and coding tools. Screening prioritization (active learning) is a core feature: the tool uses text mining and active learning, where the algorithm iteratively learns from the reviewer's decisions to re-rank the remaining abstracts, aiming to find all included studies by screening a smaller portion of the total set. It uniquely supports line-by-line coding of textual data directly from PDFs, creation of conceptual relationship diagrams for qualitative synthesis, and integrated meta-analysis via R libraries (metafor) for advanced statistical analyses such as meta-regression. The latest version integrates OpenAI's GPT-4o for automated coding, where the model can apply codes to titles and abstracts based on user-defined prompts [9].
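The active-learning cycle just described can be sketched as a simple loop: fit a model on the current decisions, re-rank the unscreened records, have the reviewer label the top-ranked batch, and repeat. The sketch below is a schematic of the general technique under assumed inputs, not EPPI-Reviewer's actual algorithm; its stopping rule, in particular, is only one of several possible choices.

```python
# Schematic active-learning loop for screening prioritization (a sketch of
# the general technique, not EPPI-Reviewer's algorithm).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def prioritized_screening(records, seed_labels, ask_reviewer,
                          batch_size=50, screen_fraction=0.3):
    """records: list of title/abstract strings.
    seed_labels: {index: 0/1}, containing at least one include and one exclude.
    ask_reviewer: callable taking a record string and returning 0 or 1."""
    labels = dict(seed_labels)
    X = TfidfVectorizer(stop_words="english").fit_transform(records)
    # Simplistic stopping rule: halt after a fixed fraction is screened.
    while len(labels) < screen_fraction * len(records):
        seen = list(labels)
        model = LogisticRegression().fit(X[seen], [labels[i] for i in seen])
        scores = model.predict_proba(X)[:, 1]
        # Re-rank unscreened records and label the most promising batch.
        queue = sorted((i for i in range(len(records)) if i not in labels),
                       key=lambda i: -scores[i])
        if not queue:
            break
        for i in queue[:batch_size]:
            labels[i] = ask_reviewer(records[i])
    return labels
```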
Laser AI
Developed by Evidence Prime, a spin-off of McMaster University, Laser AI is built from the ground up to support living systematic reviews in high-stakes environments such as pharmaceutical companies and health technology assessment (HTA) agencies. Its philosophy centers on efficiency, security, data reusability, and regulatory compliance (e.g., for FDA submissions). The system features a living review architecture that can continuously update, handling up to 15,000 new references monthly, and offers AI-assisted data extraction that significantly reduces manual effort by suggesting data from PDFs. It also provides robust data management and reusability through controlled vocabularies and clean-up modules, enabling data reuse across projects and export in various structured formats. Furthermore, its auditability and compliance features maintain a detailed project history crucial for transparency [10]. The platform leverages AI and automation capabilities, including a proprietary natural language processing (NLP) model for screening prioritization and AI-assisted summarization that auto-reports study limitations with traceable source quotations. Additionally, its advanced search and retrieval-augmented generation (RAG) capabilities allow natural language queries across extensive databases, showcasing a sophisticated approach to information retrieval [10,11].
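Retrieval-augmented generation couples a retriever with a generative model so that answers can cite the passages they came from. The sketch below illustrates only the retrieval-and-prompt-assembly half of that pattern, using a TF-IDF retriever and leaving the model call out; it is a generic illustration, unrelated to Laser AI's proprietary implementation.

```python
# Generic RAG-style prompt assembly (pattern illustration only; unrelated
# to Laser AI's proprietary implementation).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_rag_prompt(question, passages, top_k=3):
    """Retrieve the passages most similar to the question, then build a
    prompt that instructs the model to quote its numbered sources."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(passages + [question])
    similarity = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    best = similarity.argsort()[::-1][:top_k]
    context = "\n".join(f"[{i}] {passages[i]}" for i in best)
    return ("Answer using ONLY the numbered passages below, citing the "
            "passage number for every claim.\n\n"
            f"{context}\n\nQuestion: {question}")
```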
DistillerSR
DistillerSR is a web-based, semi-automated tool designed to support the systematic review process, particularly in the title/abstract screening and data extraction phases. It leverages machine learning capabilities, including prioritization features, to enhance the efficiency of literature reviews [6,12]. DistillerSR shows potential for improving workflows if its AI features are further simplified for literature screening and integrated into data extraction processes. However, considerable time is required to create a training set before the AI functionality can be used effectively, and its customizable UI involves complex procedures that demand substantial familiarity with the system. As a result, the system received low scores for ease of use and overall usability, and at this point its feasibility for adoption requires further consideration [13].
In 2020, the National Evidence-based Healthcare Collaborating Agency (NECA) reviewed five semi-automated tools for systematic review [14] (Table 1). Among the online semi-automated screening programs available on the market, Covidence and EPPI-Reviewer were selected, along with three free screening programs (Rayyan, Abstrackr, and RobotAnalyst) that were the most frequently used in previous research. Despite its limited functions, the screening performance of RobotAnalyst was also analyzed in view of its accessibility, practicality, and AI-enabled services. The study, which analyzed 77 HTA reports, revealed the typical workload for systematic reviews (SRs). The median SR took 10.6 weeks, though "fast-track" assessments were much quicker at about 4 weeks. A major time sink was literature selection, which consumed over 40% of the total SR time in more than half the cases. The research suggests that semi-automated tools could significantly cut literature selection time, especially for the "fast-track" and "health technology reassessment" categories, boosting efficiency.
Previous studies comparing the performance of semi-automated tools show significant potential for workload reduction. A scoping review reported an average time saving of 185 hours with tools such as Abstrackr and RobotAnalyst [6]. One comparative study found that EPPI-Reviewer could reduce the screening burden by 9% to 60%, outperforming Abstrackr in some scenarios. However, performance is highly variable and depends on the review's topic; for a heterogeneous review, its performance was markedly poorer. This highlights a key challenge for credibility and preplanning. Simulation studies suggest that active learning can reduce screening effort by 40-50% or more while maintaining high recall. In some contexts, such as in vitro studies where abstracts may be poor indicators of relevance, text mining on titles and abstracts has been shown to outperform human screening [15]. Recent work has addressed the limited adoption of machine learning in automating data extraction for environmental health literature. Dextr, a web-based semi-automated tool, was developed to support hierarchical data extraction through user-verified predictions and token-level annotations. In testing with 51 animal studies, Dextr maintained similar precision (96.0%) and slightly reduced recall (91.8%) compared to manual extraction, while halving extraction time [11]. A systematic review evaluated the performance and workload reduction of AI-based tools for literature screening in cancer-related systematic reviews. Five studies assessed four tools (Abstrackr, RobotAnalyst, EPPI-Reviewer, and DistillerSR), demonstrating varying efficiencies. Abstrackr showed the highest time savings, eliminating up to 88% of abstracts and 59% of full texts without missing any included citations. The other tools showed more modest reductions [16].
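Such workload figures are conventionally reported as recall plus "work saved over sampling" at a recall target, most often WSS@95. Under the standard definition (work saved relative to random screening order at 95% recall), the sketch below computes WSS from a tool's ranked output; the example data are hypothetical.

```python
# Sketch of WSS@95 (work saved over sampling at 95% recall), computed from
# labels ordered as a prioritization tool would rank them.
import math

def wss(ranked_labels, target_recall=0.95):
    """ranked_labels: 1/0 relevance decisions, most-promising records first."""
    needed = math.ceil(sum(ranked_labels) * target_recall)
    found = 0
    for screened, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= needed:
            # Fraction left unscreened, minus the fraction a random ordering
            # would be expected to leave unscreened at this recall level.
            return (1 - screened / len(ranked_labels)) - (1 - target_recall)
    return 0.0

# Hypothetical example: 2 relevant records ranked 1st and 3rd out of 10.
print(wss([1, 0, 1, 0, 0, 0, 0, 0, 0, 0]))  # 0.65
```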
Overall, these findings underscore the growing utility of semi-automated tools in improving the efficiency of systematic reviews, while also highlighting the need for careful consideration of tool selection based on review characteristics and domains.
The Responsible AI in Evidence Synthesis (RAISE) project is an initiative designed to address the challenges associated with the use of AI tools in evidence synthesis [17]. The project aims to provide guidance to the evidence synthesis community on how and when to utilize AI effectively and responsibly, given the rapid influx of AI tools promising to streamline the process. It highlights that the mere availability of AI does not justify its use, and that improper application can hinder the evidence synthesis process, potentially introducing or exacerbating harms. The RAISE project's guidance is structured into three main documents:
• RAISE 1: This document offers tailored recommendations for various roles within the evidence synthesis ecosystem, including evidence synthesists, methodologists, AI tool development teams, organizations producing evidence synthesis, publishers, funders, users, and trainers of evidence synthesis methods.
• RAISE 2: This part provides guidance on the development and evaluation of AI evidence synthesis tools. It focuses on how to determine if an AI tool performs as claimed to an acceptable standard, including methods for building and validating these tools, conducting evaluations, considering performance metrics, and reporting findings.
• RAISE 3: This specific document (the source of this information) focuses on guiding users in selecting and utilizing AI evidence synthesis tools. It offers an overview of the current state of AI in evidence synthesis and provides advice on assessing tools for both external and internal validity, along with key ethical, legal, and regulatory considerations.
The Guidelines International Network (GIN) has also published consensus principles for the responsible and transparent use of AI in health guideline development. Recognizing the rapid evolution and potential of AI, as well as the lack of specific guidance in this domain, GIN aims to support guideline developers in leveraging AI tools effectively while ensuring trustworthiness and adherence to ethical standards [18].
The GIN framework outlines eight key principles for integrating AI into health guideline development, prioritizing ethical and effective implementation:
• Transparency: all AI tools, data, and methods must be clearly documented and understandable, detailing human involvement and any deviations.
• Preplanning: AI's advantages, risks, and limitations should be anticipated, considering methodological choices, budget, and equity.
• Additionality: AI use should provide clear gains beyond non-AI tools, through new capabilities or increased efficiency.
• Credibility: AI tools must demonstrate sufficient quality for their intended application, with performance assessments guiding selection.
• Ethics: adherence to human rights, equity, and data privacy is paramount, and potential biases must be addressed.
• Accountability: human oversight is required to direct AI use and ensure compliance with legal frameworks, with clear mechanisms for examining the quality of AI-generated content.
• Compliance: all AI tools and processes must meet relevant legal and regulatory standards.
• Continuous evaluation: given AI's rapid evolution, its use and effects must be assessed on an ongoing basis.
These principles offer a flexible yet foundational framework, emphasizing transparency and ongoing assessment to foster trustworthy guidelines.
These two statements highlight a shared, critical need for responsible and transparent AI integration within evidence synthesis and health guideline development. Both the RAISE project and GIN recognize AI's transformative potential while emphasizing that its mere availability does not justify its use. Ultimately, both initiatives converge on the idea that effective AI implementation in these fields hinges on clear documentation, rigorous ethical consideration, human oversight, and ongoing assessment to ensure trustworthiness and prevent harm.
AI-assisted semi-automation is not a futuristic concept but a present-day reality that is already transforming how we conduct systematic reviews and develop guidelines. These tools are not autonomous "robot reviewers" but sophisticated assistants that empower researchers to synthesize evidence with greater speed and scale than ever before. The future lies in a seamless human-AI collaborative ecosystem. To fully realize its potential, researchers must prioritize transparency, ethical use, and ongoing evaluation, ensuring that these tools serve as reliable partners in producing timely, high-quality systematic reviews and clinical guidelines.

Conflict of Interest

No potential conflict of interest relevant to this article was reported.

Funding

This work was not directly funded. The authors' research, which forms the basis for some of the content presented herein, was supported by NECA (Project Nos. NECA-P-20-001 and NECA-A-24-008).

Data Availability Statement

All data generated or analyzed during this study are included in this published article and its supplementary information files.

Ethics Approval and Consent to Participate

Not applicable.

Authors' Contributions

All the work was done by Miyoung Choi.

Acknowledgments

Dong-Ah Park, Hyeon-Jeong Lee, Jimin Kim, Jungeun Park, Hyo-Weon Suh, Seungeun Ryu, Haine Lee, and Jinyoung Chang at NECA contributed as participants in previous NECA research on evaluating the applicability of semi-automation tools.

Table 1.
Semi-Automation Tools Comparative Functions (Evaluated in 2020)
| Programs | Literature search results import | Title/abstract screening | Full-text screening | Risk of bias assessment | Data extraction and synthesis | AI integration (as of 2025) |
|---|---|---|---|---|---|---|
| Covidence | Import data, Manage duplicates b) | Priority screening, Highlights a) | Bulk upload of full text a) | Risk of bias 1.0, customized a) | Data extraction form b) | RCT Classifier, LLM data extraction |
| EPPI-Reviewer | Search (PubMed), Import data, Manage duplicates a) | Priority screening, Allocation, Highlight a) | Upload of full text a) | Various, customized a) | Data extraction form, Meta-analysis a) | AI screening, LLM (GPT-4o) |
| Rayyan | Import data, Search (PubMed) b) | ML-assisted prioritization, Highlight a) | Upload of full text | - | - | AI screening |
| Abstrackr | Import data b) | Active learning, Highlight a) | - | - | - | ML-AI screening |
| RobotAnalyst | Import data b) | Text-mining function, RobotAnalyst a) | - | - | - | LLM being introduced |

a) All needed functions are provided.

b) Not all needed functions are provided.

LLM: large language model; ML: machine learning.

References

  • 1. Borah R, Brown AW, Capers PL, Kaiser KA. Analysis of the time and workers needed to conduct systematic reviews of medical interventions using data from the PROSPERO registry. BMJ Open 2017; 7: e012545. Epub 2017/03/01. doi: 10.1136/bmjopen-2016-012545.
  • 2. Santos AOD, da Silva ES, Couto LM, Reis GVL, Belo VS. The use of artificial intelligence for automating or semi-automating biomedical literature analyses: A scoping review. J Biomed Inform 2023; 142: 104389. Epub 2023/05/16. doi: 10.1016/j.jbi.2023.104389.
  • 3. Li Y, Datta S, Rastegar-Mojarad M, Lee K, Paek H, Glasgow J, et al. Enhancing systematic literature reviews with generative artificial intelligence: development, applications, and performance evaluation. J Am Med Inform Assoc 2025; 32: 616-25.
  • 4. Siemens W, von Elm E, Binder H, Bohringer D, Eisele-Metzger A, Gartlehner G, et al. Opportunities, challenges and risks of using artificial intelligence for evidence synthesis. BMJ Evid Based Med. 2025. Epub 2025/01/10. doi: 10.1136/bmjebm-2024-113320
  • 5. Uthman OA, Court R, Enderby J, Al-Khudairy L, Nduka C, Mistry H, et al. Increasing comprehensiveness and reducing workload in a systematic review of complex interventions using automated machine learning. Health Technol Assess. 2022. Epub 2022/12/24. doi: 10.3310/UDIR6682
  • 6. Khalil H, Ameen D, Zarnegar A. Tools to support the automation of systematic reviews: a scoping review. J Clin Epidemiol 2022; 144: 22-42.
  • 7. Thomas J, Noel-Storr A, et al. Responsible AI in Evidence Synthesis (RAISE): guidance and recommendations: RAISE 3 [Internet]. 2025 [updated 3 June 2025; cited 2025 1 Aug].
  • 8. Covidence. Covidence systematic review software [Internet]. 2025 [cited 2025 1 Aug]. Available from: https://www.covidence.org/
  • 9. EPPI-reviewer. EPPI-Reviewer: software for systematic reviews [Internet] 2025 [cited 2025 1 Aug]. Available from: https://eppi.ioe.ac.uk/cms/Default.aspx?tabid=2914
  • 10. Evidence Prime. Laser AI: AI-powered platform for living systematic reviews [Internet]. 2025 [cited 2025 1 Aug]. Available from: https://www.laser.ai/
  • 11. Walker VR, Schmitt CP, Wolfe MS, Nowak AJ, Kulesza K, Williams AR, et al. Evaluation of a semi-automated data extraction tool for public health literature-based reviews: Dextr. Environ Int 2022; 159: 107025.
  • 12. Evidence Partners. DistillerSR systematic review software [Internet] 2025. Available from: https://www.distillersr.com/
  • 13. Choi M, Park DA, Lee HJ, Kim J, Park J, Suh H, et al. A Planning Study for Enhancing the Development and Utilization of Clinical Practice Guidelines. National Evidence-based Healthcare Collaborating Agency, 2024.
  • 14. Choi M, Park DA, Park J, Ryu S, Kim S, Seo H, et al. Efficiency in systematic review methodology and collaborating network for evidence-supported policy making. National Evidence-based Healthcare Collaborating Agency, 2020.
  • 15. Wilson E, Cruz F, Maclean D, Ghanawi J, McCann SK, Brennan PM, et al. Screening for in vitro systematic reviews: a comparison of screening methods and training of a machine learning classifier. Clin Sci (Lond) 2023; 137: 181-93.
  • 16. Yao X, Kumar MV, Su E, Flores Miranda A, Saha A, Sussman J. Evaluating the efficacy of artificial intelligence tools for the automation of systematic reviews in cancer research: A systematic review. Cancer Epidemiol 2024; 88: 102511.
  • 17. Open Science Framework. Responsible AI in Evidence Synthesis (RAISE): guidance and recommendations [Internet]. 2025 [cited 2025 1 Aug]. Available from: https://osf.io/fwaud/
  • 18. Sousa-Pinto B, Marques-Cruz M, Neumann I, Chi Y, Nowak AJ, Reinap M, et al. Guidelines International Network: Principles for Use of Artificial Intelligence in the Health Guideline Enterprise. Ann Intern Med 2025; 178: 408-15.
