National Information Center on Health Services Research and Health Care Technology (NICHSR)

HSRProj (Health Services Research Projects in Progress)

Information about ongoing health services research and public health projects


Statistical methods and designs for addressing correlated errors in outcomes and covariates in studies using electronic health records data
Investigator (PI): Shepherd, Bryan E; Shaw, Pamela
Performing Organization (PO): (Current): Vanderbilt University Medical Center, Department of Biostatistics / (615) 322-2001
Supporting Agency (SA): Patient-Centered Outcomes Research Institute (PCORI)
Initial Year: 2017
Final Year: 2021
Record Source/Award ID: PCORI/ME-1609-36207
Funding: Total Award Amount: $1,050,000
Award Type: Contract
Award Information: PCORI: More information and project results (when completed)
Abstract: There is growing interest in using electronic health record (EHR) data as a practical resource to support patient-centered research using patient outcomes from real-world clinical settings. An enormous number of articles reporting health outcomes in the EHR are appearing in the medical literature, alongside others that raise concerns about data quality and misleading findings. For example, the EHR may have incorrect information for medical diagnoses, dates of diagnoses, or treatments. Errors of this nature in the EHR can be corrected by carefully reviewing patient records and making changes where needed, but it is expensive and time-consuming to do this for large numbers of patient records. Instead, data validation can be performed on a subset of records and this information can be used to statistically correct estimates based on the larger dataset, most of which has not been validated. The primary aims of this project are 1) to develop statistical methods that allow researchers to obtain accurate estimates using data that have only been partially validated, 2) to better understand which patient records should be validated to optimize resources, and 3) to apply our methods to a real-world study using EHR data. Existing statistical methods are only able to handle simple errors, whereas errors in EHR data tend to be more complicated and to span several variables (e.g., date of treatment initiation may be incorrectly recorded, so blood pressure at treatment initiation may also be incorrect). We will extend existing statistical methods to handle errors commonly seen in EHR data. We will address questions such as: What records are most informative for correcting our analysis? If we validate an initial subset of patient records, how do we best use this information to select a second subset of records to validate?
Finally, we will apply what we learn to an ongoing research study from the Mid-South Clinical Data Research Network, which includes EHR data from millions of patients in the southeastern United States. In this study, we will identify factors that affect risk of early childhood obesity, such as a mother's weight over time, and adjust our analysis for error patterns that can affect these risk factors. We will publish our results in scientific papers and develop publicly available software that implements our methods. Accurate study results are important for medical researchers, clinical providers, and patients, so that medical practice can be based on reliable, trustworthy information. However, research funds are limited, so complete validation of the EHR prior to using it for medical research is impractical. Our proposed statistical methods will result in more trustworthy results while saving researchers and their funders money. We will meet regularly with advisory committees of stakeholders (PCORnet leaders, funders, investigators, and patients) to ensure that our methods are grounded in reality and of value to patient care.
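
Illustrative note: the abstract's core idea, validating a subset of records and using it to correct an estimate computed on the full, mostly unvalidated dataset, can be sketched with a textbook difference estimator for a mean. This is not the investigators' method, which targets correlated errors in outcomes and covariates and is far more general. The function name, the simulated blood pressure values, and the error model below are all hypothetical, and the sketch assumes the validated records are a simple random sample.

```python
import random
from statistics import mean


def difference_estimate(recorded, validated):
    """Estimate the mean of the true values from partially validated data.

    recorded:  error-prone value for every record (e.g. as stored in the EHR).
    validated: dict mapping record index -> chart-reviewed true value,
               assumed to be a simple random subset of the records.
    """
    if not validated:
        raise ValueError("need at least one validated record")
    # Naive estimate from all records, errors included.
    naive = mean(recorded)
    # Average error (true minus recorded) observed in the validated subset.
    avg_error = mean(true - recorded[i] for i, true in validated.items())
    # Shift the naive estimate by the estimated average error.
    return naive + avg_error


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the simulation is repeatable
    # Hypothetical true systolic blood pressures for 5000 patients.
    truth = [rng.gauss(120, 15) for _ in range(5000)]
    # Hypothetical recording error: biased upward by about 5 units.
    recorded = [t + rng.gauss(5, 3) for t in truth]
    # Chart review is affordable for only 200 randomly chosen records.
    chosen = rng.sample(range(len(truth)), 200)
    validated = {i: truth[i] for i in chosen}

    print("true mean:     ", round(mean(truth), 2))
    print("naive mean:    ", round(mean(recorded), 2))
    print("corrected mean:", round(difference_estimate(recorded, validated), 2))
```

In this simulation the recording error is systematic, so the naive mean is biased while the corrected mean uses the 200 validated records to remove most of that bias. Choosing which records to validate, the project's second aim, is not addressed by this sketch.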
MeSH Terms:
  • Access to Information
  • Blood Pressure
  • * Electronic Health Records
  • Female
  • Humans
  • Medical Errors
  • Models, Statistical
  • Outcome Assessment, Health Care
  • Patient-Centered Care
  • Pediatric Obesity /*prevention & control
  • Reproducibility of Results
  • Risk
  • Risk Factors
  • Software
  • Southeastern United States
  • Statistics as Topic
Country: United States
State: Tennessee
Zip Code: 37203
UI: 20181639
Project Status: Ongoing
Record History: 2018: Project extended to 2021. Project start date corrected to 2017 per PCORI, 2/4/2020.