National Information Center on Health Services Research and Health Care Technology (NICHSR)

HSRProj (Health Services Research Projects in Progress)

Information about ongoing health services research and public health projects


Statistical methods for phenotype estimation and analysis using electronic health records
Investigator (PI): Hubbard, Rebecca
Performing Organization (PO): (Current): University of Pennsylvania, Perelman School of Medicine, Department of Biostatistics, Epidemiology and Informatics / (215) 898-0901
Supporting Agency (SA): Patient-Centered Outcomes Research Institute (PCORI)
Initial Year: 2016
Final Year: 2020
Record Source/Award ID: PCORI/ME-1511-32666
Funding: Total Award Amount: $1,059,241
Award Type: Contract
Award Information: PCORI: More information and project results (when completed)
Abstract: Background and significance: Electronic health records (EHR) provide extensive information on disease risk factors that can be studied to improve our understanding of health outcomes. However, medical assessments are performed at irregular intervals in response to patients' medical needs, which makes these data difficult to use for research. This project will develop new statistical methods that combine the unique set of measures available for each individual to estimate a latent phenotype. The latent phenotype consists of a patient's underlying, true disease profile, which may be only hinted at by the series of medical tests recorded in the EHR. By efficiently combining all available information for each individual, we will leverage the richness and complexity of EHR data, and we will be able to better characterize patients. To demonstrate the potential of our new statistical methods, we will use them to identify children and adolescents with type II diabetes. Using EHR data from eight children's hospital health systems participating in the PEDSnet federation, we will develop a pediatric diabetes latent phenotype. This phenotype can be used in subsequent research for identifying patient participants or for assessing risk of other health outcomes that may be increased in children with type II diabetes. We will work with clinician, patient, and parent partners from PEDSnet to identify downstream health consequences that are most important for further study and analyze associations between the newly developed diabetes latent phenotype and these outcomes. These analyses will illustrate the performance of the latent phenotype approach in a real-world context where information on risk factors and outcomes for type II diabetes is urgently needed. 
Study aims: The study aims 1) to develop statistical methods for estimating latent phenotypes, 2) to develop methods for incorporating latent phenotypes into analyses of health outcomes, accounting for uncertainty in phenotypes and other patient covariates, and 3) to estimate a type II diabetes phenotype for patients in the PEDSnet federation and its associations with downstream health outcomes. The long-term objective of this research is to provide better statistical methods for combining inconsistently collected measures derived from the EHR.
Study description: We will develop statistical methods and software for estimating latent phenotypes and their associations with health outcomes. Through statistical simulations, we will evaluate the predictive accuracy, bias, and efficiency of these methods relative to standard approaches. Using data from PEDSnet, we will estimate a latent type II diabetes phenotype. To assess the added value of the new methods, we will compare their performance with that of previously developed phenotypes.
MeSH Terms:
  • Adolescent
  • Child
  • Computer Simulation
  • Data Collection
  • Data Mining
  • Diabetes Mellitus, Type 2/*diagnosis/*physiopathology
  • *Electronic Health Records
  • Humans
  • Models, Statistical
  • Outcome Assessment (Health Care)
  • Phenotype
  • Reproducibility of Results
  • Risk Factors
  • Statistics as Topic
Country: United States
State: Pennsylvania
Zip Code: 19104
UI: 20164097
Project Status: Ongoing
Record History: 2017: Project extended to 2020.