"Our models will cover the patient's entire journey: from a patient at home, up to a visit to a healthcare specialist in a memory clinic."
Holger Fröhlich, Head of AI & Data Science at Fraunhofer SCAI, explains his work on PREDICTOM and sheds light on the data models used in the project, as well as the ambitions for wider use of PREDICTOM's data platform.
Holger Fröhlich
What is your role in PREDICTOM?
I am a member of the Management Team and a co-lead of Work Package 2, which is responsible for data governance and the development of AI/ML models. Data governance was the focus of Work Package 2 during the first year of the project. We developed a common data model derived from the Observational Medical Outcomes Partnership (OMOP) model to ensure that all data in the PREDICTOM study is collected in a structured and standardized manner, enabling a high level of data quality and interoperability with existing datasets. Furthermore, we implemented and released an openly accessible and searchable metadata catalogue. Altogether, this ensures that PREDICTOM fulfills the criteria of data being findable, accessible, interoperable and reusable (FAIR).
Starting from the second year of the project, the focus of Work Package 2 will be on the development and validation of Artificial Intelligence/Machine Learning (AI/ML) models for dementia risk assessment and prognosis. These models will initially be trained on different types of historical data (real-world data, including digital biomarkers, and data from various clinical studies) covering different stages of a patient's journey (at home, visiting a general practitioner, visiting a healthcare specialist in a memory clinic).
Models will be validated using cross-validation techniques and investigated with recent eXplainable Artificial Intelligence (XAI) methods (e.g. SHAP). Several of our models will support the risk stratification of patients in Work Package 4, enabling prospective validation via the PREDICTOM clinical study. Our final models will be deployed to the PREDICTOM platform, which is developed in Work Package 3.
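To illustrate the cross-validation idea mentioned above, here is a minimal sketch in plain Python. It is not the PREDICTOM validation pipeline: the function names, the single numeric feature and the threshold "model" are all assumptions made for illustration, and a real analysis would use an established ML library and clinically meaningful metrics. It does not cover the SHAP-based explanation step.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices once and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Return one accuracy score per fold; each fold is held out exactly once."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        # Fit only on the training part, then score on the held-out part.
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits = [int(predict(model, X[i]) == y[i]) for i in held_out]
        scores.append(mean(hits))
    return scores


# Hypothetical stand-in model: a threshold halfway between the class means.
# It assumes both classes are present in every training split.
def fit_threshold(X_train, y_train):
    pos = [x for x, label in zip(X_train, y_train) if label == 1]
    neg = [x for x, label in zip(X_train, y_train) if label == 0]
    return (mean(pos) + mean(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0
```

Calling `cross_validate(X, y, fit_threshold, predict_threshold, k=5)` on a list of feature values `X` and binary labels `y` yields five accuracy values, one per held-out fold. Keeping `fit` and `predict` as parameters means the same loop can wrap any model.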
Why is Fraunhofer SCAI, as a partner in PREDICTOM, best suited to fulfil this role?
We have extensive experience with AI/ML and data governance in biomedicine. For instance, we work extensively with data from clinical studies and from real-world settings (including data from digital devices). We also develop tailor-made models for a wide range of applications in translational biomedical research. With this expertise we are able to contribute to various workstreams within PREDICTOM.
What is your personal motivation to participate in the project?
I co-initiated this project because I believe that we can and need to use AI/ML to identify people with an elevated risk of developing Alzheimer's disease (AD) in the future, thereby supporting earlier diagnosis and, at least in the USA, therapeutic intervention. More concretely, high-risk individuals should get preferential access to modern, biomarker-based diagnosis, which is only possible in specialized neurological centers. Altogether, PREDICTOM thus implements a stratification of individuals by disease risk.
Which data & models are used in PREDICTOM and how?
We obtain data through a wide range of instruments, from digital devices and online self-assessments to brain activity tests (EEG), body fluid samples and Magnetic Resonance Imaging (MRI). In addition, we use existing datasets to develop our models (e.g. UK Biobank, HUNT, PROTECT, ADNI). The data and models will thus cover the patient's entire journey, starting from a patient at home, through a visit to the general practitioner (GP), up to a visit to a healthcare specialist in a memory clinic.
What are PREDICTOM’s plans on data harmonization and data management and how are those realized?
The data will be mapped according to the Observational Medical Outcomes Partnership (OMOP) common data model. The OMOP common data model includes a standardization of medical terms with which all data must comply. The models developed within PREDICTOM and the data produced will consequently be formatted according to the OMOP model. Moreover, prospectively collected data will be stored in an SQL database that is compliant with our data model.
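As a rough illustration of what mapping data to a common data model can look like, the sketch below loads locally coded test results into a simplified table modeled on the OMOP `measurement` table, using SQLite from the Python standard library. This is an assumption-laden example, not the PREDICTOM schema: the table is a reduced subset of columns, the source codes are hypothetical, and the concept IDs are placeholders rather than real OMOP vocabulary entries.

```python
import sqlite3

# Hypothetical mapping from local source codes to standard concept IDs.
# These IDs are placeholders, not real OMOP vocabulary entries.
SOURCE_TO_CONCEPT = {
    "MMSE_TOTAL": 900001,
    "MOCA_TOTAL": 900002,
}


def create_measurement_table(conn):
    """Create a simplified subset of the OMOP `measurement` table."""
    conn.execute(
        """
        CREATE TABLE measurement (
            measurement_id INTEGER PRIMARY KEY,
            person_id INTEGER NOT NULL,
            measurement_concept_id INTEGER NOT NULL,
            measurement_date TEXT NOT NULL,
            value_as_number REAL,
            measurement_source_value TEXT
        )
        """
    )


def load_records(conn, records):
    """Insert records with a known mapping; return those without one."""
    unmapped = []
    for rec in records:
        concept_id = SOURCE_TO_CONCEPT.get(rec["code"])
        if concept_id is None:
            # Keep unmapped records for manual review instead of guessing.
            unmapped.append(rec)
            continue
        conn.execute(
            "INSERT INTO measurement (person_id, measurement_concept_id, "
            "measurement_date, value_as_number, measurement_source_value) "
            "VALUES (?, ?, ?, ?, ?)",
            (rec["person_id"], concept_id, rec["date"], rec["value"], rec["code"]),
        )
    conn.commit()
    return unmapped
```

Each input record is a dictionary with `person_id`, `code`, `date` and `value` keys. Records whose code has no standard concept are returned rather than inserted, which is the point of a controlled vocabulary: only terms that comply with it enter the database, while the original code is kept in `measurement_source_value` for traceability.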
What are your expectations from the data sharing used in PREDICTOM?
Sharing data is the prerequisite for making use of data and generating value from it: we need data to train the AI/ML models. Only once we have those models can the PREDICTOM platform generate value.
How does the use of this data platform benefit research and patients specifically?
Researchers who want to make use of the data platform developed in PREDICTOM will, after an embargo period, be able to use the collected high-quality data for their own research. In the long run, PREDICTOM's platform is expected to allow for earlier diagnosis of AD in at-risk populations. It should be noted, however, that this is only possible after regulatory approval of the platform.