Doctoral Thesis Defense of Lorena Romero Medrano

  • Title: Change-Point Detection Methods for Behavioral Shift Recognition in Mental Healthcare
  • Author: Lorena Romero (GTS)


Human behavior analysis has been approached from different perspectives over time. In recent years, new technologies and advances in digitalization have emerged as an alternative tool for characterizing behavior and for detecting changes in it over time. In particular, the widespread use of smartphones and other electronic devices, which continuously collect data from the user, provides a representation of behavior in different areas of a person’s life, such as mobility, physical activity or social interactions. These devices also allow passive monitoring, that is, monitoring that does not require the user to interact directly with the device, so information is collected unobtrusively and without altering the user's daily routine. Among other advantages, this means that the user does not subjectively influence the information collected, which yields objective representations of their behavior. This approach to the characterization and analysis of behavior and its changes has many applications, notably in medicine. In this work, we focus specifically on the field of mental health, where characterizing and detecting behavioral changes early is important for preventing relapses in psychiatric patients and, in particular, for trying to prevent suicide attempts or psychiatric emergency admissions in patients with a history of suicidal behavior.
Our approach is based on the development and application of mathematical and probabilistic models that can help us detect these changes from passively collected data. However, despite the advantages mentioned above, working with data collected through electronic devices, specifically in a clinical scenario, is challenging because of the complex structure of the data. First, the data is irregularly sampled in time (samples may be stored every 5 minutes, when a specific activity starts, or daily). Second, each observation can be heterogeneous, meaning that it combines several sources of different statistical data types (continuous, discrete), or of the same type but with different marginal distributions. In addition, the existence of several data sources and the irregular sampling frequency of the data collection methods mean that each day is represented by a high-dimensional vector, which calls for scalable algorithms. Lastly, these data sequences contain many missing values with very diverse patterns, caused by, for example, missing permissions on the phone, disconnection periods or simply the temporal irregularity already mentioned. Preprocessing data with these characteristics takes considerable effort and time, which is not feasible for a goal as demanding as predicting and preventing suicide attempts, since the information must be processed in real time and every minute counts. Therefore, we need methods that are fast, efficient, accurate and adapted to the complexity of the data we are working with. For this reason, instead of focusing our efforts on data mining, which is generally conditioned on a specific initial hypothesis and hinders reproducibility, we work on methods that can handle data sequences with the aforementioned characteristics and that do so online, that is, by processing the samples as they are recorded.
In this thesis, we focus on the development of probabilistic models for behavioral change detection, proposing algorithms that can work on heterogeneous, multi-source, high-dimensional sequential data with missing values. In our scenario, we assume that the joint distribution of the data changes at a given moment, segmenting the sequence, and our goal is to detect this change with the least possible delay. We begin by describing the benefits of using digital phenotyping to characterize changes in human behavior, and we introduce an example of a specific e-health monitoring system with which we have worked. We present two works on data mining in medicine through digital phenotype modelling: the prediction of disability level in different domains of daily life, and the analysis of causal relationships between variables in order to detect negative effects caused by isolation during the COVID-19 pandemic in psychiatric patients. In the following, more technical chapters, we go a step further and change the focus: from fully adapting our data to existing methods to proposing algorithms that are specific to heterogeneous, multi-source, high-dimensional sequential data with missing values. We focus on the development of change-point detection (CPD) algorithms, present the benefits of using latent variable models to deal with high-dimensional data sets, and provide methods that are able to integrate data of different statistical types.
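To make the online detection setting concrete, the following is a minimal sketch of a generic two-sided CUSUM detector for a single continuous variable. It is not one of the latent variable models proposed in the thesis; it only illustrates processing samples one at a time, skipping missing values, and raising an alarm shortly after the distribution changes. The function names (first_change_alarm, make_stream), the parameter values and the synthetic stream are all hypothetical and chosen for illustration.

```python
import math


def is_missing(x):
    """Treat None and NaN as missing observations."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def first_change_alarm(stream, warmup=20, drift=0.5, threshold=8.0):
    """Return the index of the first alarm of a two-sided CUSUM, or None.

    Samples are consumed one at a time (online). Missing values are skipped.
    The first `warmup` observed values estimate the baseline mean and
    standard deviation; all parameter values here are illustrative.
    """
    baseline = []
    mean = std = None
    pos = neg = 0.0  # cumulative evidence for an upward / downward shift
    for t, x in enumerate(stream):
        if is_missing(x):
            continue
        if mean is None:
            # Still collecting the baseline segment.
            baseline.append(x)
            if len(baseline) == warmup:
                mean = sum(baseline) / warmup
                var = sum((v - mean) ** 2 for v in baseline) / (warmup - 1)
                std = math.sqrt(var) or 1.0  # fall back to 1.0 if constant
            continue
        z = (x - mean) / std
        pos = max(0.0, pos + z - drift)
        neg = max(0.0, neg - z - drift)
        if pos > threshold or neg > threshold:
            return t
    return None


def make_stream(n=100, change_at=60, shift=3.0, missing=(5, 6, 30, 31, 32)):
    """Deterministic synthetic stream (hypothetical data) with a mean shift."""
    stream = []
    for i in range(n):
        level = 10.0 + (shift if i >= change_at else 0.0)
        value = level + ((i % 5) - 2) * 0.5  # small periodic fluctuation
        stream.append(None if i in missing else value)
    return stream


alarm = first_change_alarm(make_stream())
print("first alarm at index:", alarm)
```

In this sketch the detection delay is the gap between the true change index (60 in the synthetic stream) and the returned alarm index. A higher threshold lowers the risk of false alarms at the cost of a longer delay.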
We also present a flexible CPD model that works on local observation models (LOMs), which are defined according to the statistical type, source or prior knowledge of the initial data and are generated from local discrete latent variable models. In this way, the information is transformed into homogeneous low-dimensional spaces, which retains the benefits of the previously proposed algorithms while treating all local representations on an equal footing, thus solving the initial problem of heterogeneity. In addition, we define and adapt different CPD factorization models that weight the contribution of each local representation to the global detection following different approaches. These models hold for every previously proposed local observation model and add explainability regarding the degree to which each local representation contributes to the joint detection. We evaluate and test the proposed models on synthetic data, showing improved detection precision, reduced detection delay, and robustness to missing data. Finally, we apply some of these methods to a real data set within a study of behavioral change characterization in psychiatric patients with a history of suicide-related events. We propose individualized models for change detection on data passively sensed via smartphones, and use suicide attempts and psychiatric emergency admissions as real labels with the aim of predicting them one week in advance.