Doctoral Thesis Defense of Emese Sükei

  • Title: “A Step Towards Advancing Digital Phenotyping In Mental Healthcare”
  • Advisor: Antonio Artés Rodríguez.


Smartphones and wrist-wearable devices have become pervasive in recent years. According to published statistics, as of 2022 nearly 84% of the world’s population owns a smartphone and almost 10% owns a wearable device. These devices continuously generate data from multiple sensors and apps, which together make up our digital phenotypes. Growing data availability, greater computing power and cheaper data storage have driven a steady surge of research into new algorithms and applications across many fields, and mental health is no exception.

Unobtrusive monitoring through patients’ own devices allows continuous collection of data on physical activity (e.g., step count and exercise patterns), health signals (e.g., sleep and heart rate) and social behaviour (e.g., the number of calls made and text messages sent). Although such varied data are more challenging to analyse, this variability offers a unique opportunity to describe a person’s lifestyle and behaviour in situ. To reach their full potential, however, these data sources must be translated into meaningful, actionable features related to mental health. Once processed, such clinically valuable markers can improve diagnostic processes, help tailor treatment choices, provide continuous insight into a patient’s condition that supports timely action (for example, on early signs of relapse), and inform the development of new intervention models.

In the mental health field, there is a great need for, and much to be gained from, a way to continuously assess the evolution of patients’ mental states, ideally in their everyday environment, to support monitoring and treatment by health care providers. A smartphone-based approach may be valuable for gathering long-term objective data that complement the commonly used self-ratings, both to predict clinical state changes and to support causal inference about those changes in patients (e.g., those with affective disorders).

Although passively collected data are objective, they are far from perfect, and their collection poses several challenges: some sensors generate vast volumes of data, while others drain the battery significantly. Furthermore, analysing raw passive data is complicated, and collecting certain types of data may interfere with the phenotype of interest. Nonetheless, machine learning is well suited to addressing these issues and to moving psychiatry into the era of personalised medicine.

This work aimed to extend research on mobile and wearable sensors for mental health monitoring and to address the issues mentioned above. We applied supervised and unsupervised machine learning methods to model and understand the evolution of mental illness. These models drew on patients’ digital phenotypes and on clinician assessments at follow-up visits, which provide the ground truth. The methods had to cope with high-dimensional, heterogeneous time series that were sampled both regularly and irregularly and were often affected by distortion and missing values. Robust solutions were therefore needed to overcome these limitations and, in particular, to handle missing data properly.

We sought to answer questions such as how to appropriately process and impute these mobile-sensed data streams to predict mental health outcomes, and which data types are necessary and relevant for the different predictive tasks. Throughout the projects presented here, we used probabilistic latent variable models for data imputation and feature extraction, namely mixture models (MMs) and hidden Markov models (HMMs).
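As an illustration of how a mixture model can impute missing sensor features, the sketch below fills each missing entry with its conditional expectation under a Gaussian mixture with diagonal covariances. This is a minimal sketch, not the thesis’s implementation. It assumes the mixture parameters have already been estimated (for example with EM). The function names, the two features (daily steps in thousands and hours of sleep) and all parameter values are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log-density of a univariate normal N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def impute_diag_gmm(x, weights, means, variances):
    """Fill None entries of x with their conditional mean under a diagonal GMM.

    weights[k]      : mixing proportion of component k (assumed already fitted)
    means[k][d]     : mean of feature d in component k
    variances[k][d] : variance of feature d in component k
    Returns (imputed vector, component responsibilities).
    """
    observed = [d for d, v in enumerate(x) if v is not None]
    # Log-responsibility of each component given only the observed features;
    # with diagonal covariances the missing dimensions simply drop out.
    log_r = []
    for k, w in enumerate(weights):
        lp = math.log(w)
        for d in observed:
            lp += gaussian_logpdf(x[d], means[k][d], variances[k][d])
        log_r.append(lp)
    # Normalise with the log-sum-exp trick for numerical stability.
    m = max(log_r)
    unnorm = [math.exp(lr - m) for lr in log_r]
    total = sum(unnorm)
    resp = [u / total for u in unnorm]
    # Within component k a missing feature is independent of the observed ones,
    # so its conditional mean is the responsibility-weighted component mean.
    imputed = [
        v if v is not None else sum(r * means[k][d] for k, r in enumerate(resp))
        for d, v in enumerate(x)
    ]
    return imputed, resp


# Hypothetical example: steps observed, sleep missing, two equally likely profiles.
weights = [0.5, 0.5]
means = [[2.0, 9.0], [10.0, 7.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
filled, resp = impute_diag_gmm([6.0, None], weights, means, variances)
print(filled, resp)  # -> [6.0, 8.0] [0.5, 0.5]
```

The responsibilities returned alongside the imputed vector can also serve as compact features. This is one way a single latent variable model can support both imputation and feature extraction.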

Depending on the properties of the data at hand, we either combined feature extraction methods with classical machine learning algorithms or used deep learning-based techniques for temporal modelling. With these approaches we predicted several mental health outcomes of psychiatric outpatients: emotional state, functioning scores on the World Health Organization Disability Assessment Schedule (WHODAS 2.0), and Generalised Anxiety Disorder-7 (GAD-7) scores. We mainly focused on one-size-fits-all models because the number of labelled samples per patient was limited; in the case of mood prediction, however, personalised models were feasible.
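To illustrate how an HMM can turn an irregular, partly missing sequence into fixed-length features for a classical classifier, the sketch below runs a scaled forward-backward pass on a discrete-emission HMM. It treats a missing observation as carrying no evidence and summarises each sequence by its expected fraction of time in each hidden state. This is a hedged sketch, not the method used in the thesis. The two hypothetical hidden states, the binary encoding of daily activity (0 = low, 1 = high, None = missing day), the function names and all parameter values are assumptions; in practice the parameters would be learned from data.

```python
def hmm_posteriors(obs, start, trans, emit):
    """Scaled forward-backward for a discrete HMM; None marks a missing observation.

    start[i]    : P(z_1 = i)
    trans[i][j] : P(z_{t+1} = j | z_t = i)
    emit[i][o]  : P(x_t = o | z_t = i)
    obs must be non-empty. Returns gamma[t][i] = P(z_t = i | observed x).
    """
    n, T = len(start), len(obs)

    def lik(t, i):
        # A missing observation carries no evidence: likelihood 1 for every state.
        return 1.0 if obs[t] is None else emit[i][obs[t]]

    # Forward pass, normalising alpha at each step to avoid underflow.
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            a = [start[i] * lik(0, i) for i in range(n)]
        else:
            prev = alpha[-1]
            a = [lik(t, j) * sum(prev[i] * trans[i][j] for i in range(n))
                 for j in range(n)]
        c = sum(a)
        scale.append(c)
        alpha.append([v / c for v in a])

    # Backward pass reusing the forward scaling constants.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            sum(trans[i][j] * lik(t + 1, j) * beta[t + 1][j] for j in range(n))
            / scale[t + 1]
            for i in range(n)
        ]

    # Posterior state probabilities, renormalised for floating-point safety.
    gamma = []
    for t in range(T):
        g = [alpha[t][i] * beta[t][i] for i in range(n)]
        s = sum(g)
        gamma.append([v / s for v in g])
    return gamma


def state_occupancy_features(obs, start, trans, emit):
    """Expected fraction of time spent in each hidden state (fixed-length vector)."""
    gamma = hmm_posteriors(obs, start, trans, emit)
    T = len(gamma)
    return [sum(g[i] for g in gamma) / T for i in range(len(gamma[0]))]


# Hypothetical example: one week of daily activity with two missing days.
start = [0.5, 0.5]
trans = [[0.8, 0.2], [0.2, 0.8]]
emit = [[0.9, 0.1], [0.2, 0.8]]
week = [0, 0, None, 1, 1, None, 1]
features = state_occupancy_features(week, start, trans, emit)
```

Because every sequence maps to a vector of the same length regardless of how many days are missing, these features can be passed directly to a standard classifier or regressor.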

Our findings show that machine learning-based methods can feasibly predict the listed mental health outcomes, either solely from passively sensed mobile data or from these data combined with socio-demographic information. By proposing solutions for missing and sparsely labelled data, we provide a sound basis for further exploration and allow future work to focus on developing more advanced models. However, these solutions require further clinical validation before they can be deployed in the clinical workflow. The results are promising and lay the foundations for future research and collaboration.

Our analysis was based on a relatively small number of patients, and its results should be interpreted in light of the following limitations. First, the data cannot establish causal links between digital phenotypes and mental health outcomes. Second, in many cases individuals had only a single score, which precludes training personalised models. Moreover, we did not explore specialised models for different patient groups or for individual patients.

Future studies should build on this thesis to further explore the potential of digital personalised medicine by integrating digital phenotyping and digital interventions in the mental health field. Predicting specific questionnaire outcomes could enable early intervention and relapse prevention. These are arguably the two most promising early-warning services that could be offered to individuals who would otherwise find it hard to sustain self-monitoring.