Chia, a large annotated corpus of clinical trial eligibility criteria

Fabrício Kury, Alex Butler, Chi Yuan, Li-heng Fu, Yingcheng Sun, Hao Liu, Ida Sim, Simona Carini & Chunhua Weng

Abstract

We present Chia, a novel, large annotated corpus of patient eligibility criteria extracted from 1,000 interventional, Phase IV clinical trials registered in ClinicalTrials.gov. This dataset includes 12,409 annotated eligibility criteria, represented by 41,487 distinctive entities of 15 entity types and 25,017 relationships of 12 relationship types. Each criterion is represented as a directed acyclic graph, which can be easily transformed into Boolean logic to form a database query. Chia can serve as a shared benchmark to develop and test future machine learning, rule-based, or hybrid methods for information extraction from free-text clinical trial eligibility criteria.
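
The directed acyclic graph representation makes each criterion mechanically convertible into a query predicate. The following Python code is a minimal, hypothetical sketch (not the official Chia schema or tooling): it models a criterion as a graph of entity leaves joined by logical operator nodes and flattens it into a Boolean expression; the entity labels and the example criterion are invented for illustration only.

# A minimal, hypothetical sketch (not the official Chia schema): a criterion
# held as a directed acyclic graph of entity leaves and logical operator
# nodes, flattened into a Boolean expression usable in a database query.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """Either an annotated entity (leaf) or a logical operator (AND/OR/NOT)."""
    label: str                           # e.g. "Condition: type 2 diabetes" or "AND"
    children: List["Node"] = field(default_factory=list)

    def to_boolean(self) -> str:
        """Recursively flatten the graph into a parenthesized Boolean string."""
        if not self.children:            # leaf entity
            return f'"{self.label}"'
        parts = [child.to_boolean() for child in self.children]
        if self.label == "NOT":          # unary negation
            return f"NOT {parts[0]}"
        return "(" + f" {self.label} ".join(parts) + ")"

# Invented criterion: "type 2 diabetes and age >= 18 years, excluding pregnancy"
criterion = Node("AND", [
    Node("Condition: type 2 diabetes mellitus"),
    Node("Measurement: age >= 18 years"),
    Node("NOT", [Node("Condition: pregnancy")]),
])

print(criterion.to_boolean())
# ("Condition: type 2 diabetes mellitus" AND "Measurement: age >= 18 years"
#  AND NOT "Condition: pregnancy")

In the actual corpus, the leaves would carry Chia's 15 entity types and the edges its 12 relationship types; the placeholder labels above are assumptions made only to show the transformation.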

Measurement(s): Clinical Trial Eligibility Criteria • Analytical Procedure Accuracy
Technology Type(s): digital curation • computational modeling technique
Sample Characteristic – Organism: Homo sapiens

Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.12765602

Background & Summary

Clinical trial eligibility criteria specify rules for screening clinical trial participants and play a central role in clinical research in that they are interpreted, implemented, and adapted by multiple stakeholders at various phases in the clinical research life cycle1. After being defined by investigators, eligibility criteria are used and interpreted by clinical research coordinators for screening and recruitment. Then, they are used by query analysts and research volunteers for patient screening. Later, they are summarized in meta-analyses for developing clinical practice guidelines and, eventually, interpreted by physicians to screen patients for evidence-based care. Hence, eligibility criteria affect recruitment, results dissemination, and evidence synthesis.

Despite their importance, recent studies highlight the often negative impact these criteria have on the generalizability of a given trial’s findings in the real world2,3. When eligibility criteria lack population representativeness, the enrolled participants cannot represent, without bias, those who will be treated according to the results of that study4. Given that eligibility criteria are written in free text, it is laborious to answer this representativeness question at scale5. A related challenge is assessing the comparability of trial populations, especially for multi-site studies: e.g., given two clinical trials investigating the same scientific question, can we tell whether they are studying comparable cohorts? The manual labor required from domain experts for such appraisal is prohibitive. Another challenge is patient recruitment, i.e., finding eligible patients for a clinical trial, which remains the leading cause of early trial termination6,7. Unsuccessful recruitment wastes financial investment and research opportunities, on top of the missed opportunities, inconvenience, and frustration experienced by patients when a clinical trial is terminated early or cancelled.

Computable representations of eligibility criteria promise to overcome the above challenges and to improve study feasibility and recruitment success8. The biomedical informatics research community has produced various knowledge representations for clinical trial eligibility criteria9, though nearly all of them predate the current state of the art in machine learning, and some even predate contemporary electronic health records9. Early efforts to create annotated datasets of eligibility criteria have used a variety of methods, including ad-hoc annotation10, manual annotation of standardized biomedical concepts11, and leveraging biomedical knowledge resources such as the UMLS for automatic semantic pattern extraction12. The annotations in these datasets do not capture sufficient information to form the logical statements of a database query, and few annotated datasets are publicly available. Ross et al. published a dataset of 1,000 eligibility criteria and analyzed their semantic complexity, but the data were not amenable to machine learning13. Weng et al. annotated 79 eligibility criteria with semantic tags and relations, but these are too few to serve as a sufficiently large training resource12. The most robustly annotated and the only publicly available corpus to date was produced by Kang et al.14, who annotated eligibility criteria from 230 clinical trials, all on Alzheimer’s Disease; hence that corpus lacks generalizability to other diseases. These and other works have focused on bridging the gap between eligibility criteria and logical queries (Table 1), but the percentage of criteria that could be fully represented using these annotation models and used in database queries (here referred to as criteria coverage) varies widely, ranging from 18% to 87%5,14,15,16,17,18.

References:

https://www.nature.com/articles/s41597-020-00620-0
