LATROBE, PA — Saint Vincent College recently received a $150,000 National Science Foundation (NSF) grant with Dr. Mary Regina Boland, C’10, assistant professor of data science, as principal investigator. This funding is part of a $1 million grant with three other institutions: University of Pennsylvania, University of Iowa and University of Virginia.
The project is titled “Personal Determinants of Health-Enhanced Machine Learning Models for Early Prediction of Alzheimer’s Disease and Related Dementias.” Its results are expected to advance national health and promote science and technology development by providing algorithms, software and systems that can train machine learning models on electronic health records for accurate and early prediction of Alzheimer’s Disease and Related Dementias (ADRD).
ADRD are debilitating conditions characterized by progressive memory loss, cognitive impairment and personality changes, which can progress to dementia and death. They affect more than 5 million people over the age of 65.
The population of the United States is aging, as evidenced by the “Vintage 2024 Population Estimates” released in June by the US Census Bureau. The estimates show that older adults outnumber children in 11 states and in nearly half of US counties.
Simultaneously, the number of US citizens suffering from ADRD is growing. Boland warns that unless research provides solutions for early identification and treatment of ADRD, the burden on the nation’s healthcare system will be unparalleled.
“The purpose of this project is to learn patterns from those who have or had ADRD in the past to understand those who may be developing ADRD,” she said. “This could enable us to identify ADRD early and also identify risk factors that could be treated at an earlier stage. With earlier treatment, we are hoping the effects of this devastating disease will be ameliorated.”
Recent studies have shown that personal risk factors such as education, employment, lifestyle and family history significantly influence ADRD onset and progression.
These factors, however, are not recorded in a structured format within existing electronic health records. Instead, they are often embedded within the free text of clinical notes or discharge summaries, which is not easily searchable, computable or standardized. This creates a major technical barrier to integrating them into ADRD prediction models.
The research project aims to develop a computational platform that uses novel machine learning and natural language processing methods to automatically extract personal risk factors from clinical narratives within electronic health records and leverage them for accurate and early prediction of ADRD. Boland hopes the research will identify known ADRD risk factors and uncover new ones.
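To make the extraction step concrete, here is a minimal Python sketch of turning free-text notes into structured risk-factor fields. It is not the project's method: the project aims at novel machine learning and natural language processing, while this stand-in uses simple keyword rules. The category names, the patterns and the sample note are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical keyword patterns, one per risk-factor category.
# A real system would rely on trained NLP models, not hand-written rules.
RISK_FACTOR_PATTERNS = {
    "education": re.compile(r"\b(high school|college|degree|years of education)\b", re.I),
    "employment": re.compile(r"\b(retired|unemployed|employed|works as|worked as)\b", re.I),
    "lifestyle": re.compile(r"\b(smok\w*|alcohol|exercis\w*|sedentary)\b", re.I),
    "family_history": re.compile(r"\b(family history|mother had|father had)\b", re.I),
}


def extract_risk_factors(note):
    """Return the matched phrases per category found in one free-text note."""
    found = {}
    for category, pattern in RISK_FACTOR_PATTERNS.items():
        matches = [m.group(0).lower() for m in pattern.finditer(note)]
        if matches:
            found[category] = matches
    return found


def to_feature_row(note):
    """Convert a note into 0/1 indicators that a prediction model could consume."""
    extracted = extract_risk_factors(note)
    return {category: int(category in extracted) for category in RISK_FACTOR_PATTERNS}


if __name__ == "__main__":
    # Invented sample note, not real patient data.
    note = ("Patient is a retired teacher with a college degree. "
            "Family history of dementia; former smoker.")
    print(extract_risk_factors(note))
    print(to_feature_row(note))
```

The structured row produced by `to_feature_row` is the kind of searchable, computable representation that free-text notes lack, and it could be joined with other record fields before a prediction model is trained.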
“By learning new risk factors and understanding the patterns of these risk factors, we hope to both identify ADRD earlier in the disease and also identify potential avenues for treatment and other therapies,” she said.
Certain individuals develop ADRD at a younger age, Boland added, or progress more rapidly than others once they are diagnosed, and some do both. The reasons are unclear but may include exposure to various toxins. Patients with a greater lifetime exposure to a toxin will suffer greater effects from that toxin, so any toxins implicated in ADRD will have a greater effect on those with longer exposure.
“Because toxin exposures vary by where we live and work, this effect can result in differential exposures among individuals,” Boland said. “This results in diversity among the ADRD phenotype observed among various populations.”
The work will continue for several years with an estimated end date of Sept. 30, 2029.
Boland has been studying population-level differences in disease risk and exposure risk for over a decade and has investigated specific environmental exposures and less specific ones, such as seasonal variability. She has long been fascinated by hidden differences among individuals that can alter disease risk—sometimes termed latent disease—including hidden genetic or environmental differences.
Boland’s interest in the environmental causes of diseases stems from growing up near an area that was deemed a superfund site by the US Environmental Protection Agency (EPA). Thousands of contaminated sites—or superfund sites—exist in the country due to hazardous waste being dumped, left out in the open or otherwise improperly managed. These sites include manufacturing facilities, processing plants, landfills and mining sites, according to the EPA.
“I witnessed several individuals who I grew up with die at a very early age,” Boland said, “and in many cases, it was difficult to identify the cause until much later.”
Looking ahead, Boland hopes to engage students in this research project to help them build skills working with real-world datasets in health analytics and ADRD.
“We need to train the future generation of scientists, researchers and analysts to be able to study and conduct ethical research,” Boland said. “I have had the pleasure of interacting with several students who have helped with literature review, and I am hoping to include them in modeling in the future.”
This is Boland’s second NSF grant as principal investigator while at Saint Vincent College. The first, also awarded in association with three other institutions of higher education and covering the award period 2024-2027, is titled “Strengthening Collaborative Advancements Leveraging Equitable University Partnerships.” Fellow principal investigators for this grant are Fr. Michael Antonacci, O.S.B., C’07, S’14, associate professor of physics, and Dr. Stephen Jodis, dean of the Herbert W. Boyer School of Natural Science, Mathematics and Computing.
In January, Boland also published a textbook with Springer Nature titled “Health Analytics with R: Learning Data Science Using Examples from Healthcare and Direct-to-Consumer Genetics.” Since its release, the textbook has sold over 5,000 copies online and in print.