## 2016

## Journal Articles

Valera, Isabel; Ruiz, Francisco; Perez-Cruz, Fernando. Infinite Factorial Unbounded-State Hidden Markov Model (Journal Article). IEEE Transactions on Pattern Analysis and Machine Intelligence, 38 (9), pp. 1816–1828, 2016, ISSN: 1939-3539.

```bibtex
@article{Valera2016b,
  title = {Infinite Factorial Unbounded-State Hidden Markov Model},
  author = {Valera, Isabel and Ruiz, Francisco J. R. and Perez-Cruz, Fernando},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/26571511 http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=7322279},
  doi = {10.1109/TPAMI.2015.2498931},
  issn = {1939-3539},
  year = {2016},
  date = {2016-09-01},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  volume = {38},
  number = {9},
  pages = {1816--1828},
  abstract = {There are many scenarios in artificial intelligence, signal processing or medicine, in which a temporal sequence consists of several unknown overlapping independent causes, and we are interested in accurately recovering those canonical causes. Factorial hidden Markov models (FHMMs) present the versatility to provide a good fit to these scenarios. However, in some scenarios, the number of causes or the number of states of the FHMM cannot be known or limited a priori. In this paper, we propose an infinite factorial unbounded-state hidden Markov model (IFUHMM), in which the number of parallel hidden Markov models (HMMs) and states in each HMM are potentially unbounded. We rely on a Bayesian nonparametric (BNP) prior over integer-valued matrices, in which the columns represent the Markov chains, the rows the time indexes, and the integers the state for each chain and time instant. First, we extend the existing infinite factorial binary-state HMM to allow for any number of states. Then, we modify this model to allow for an unbounded number of states and derive an MCMC-based inference algorithm that properly deals with the trade-off between the unbounded number of states and chains. We illustrate the performance of our proposed models in the power disaggregation problem.},
  keywords = {Bayes methods, Bayesian nonparametrics, CASI CAM CM, Computational modeling, GAMMA-L+ UC3M, Gibbs sampling, Hidden Markov models, Inference algorithms, Journal, Markov processes, Probability distribution, reversible jump Markov chain Monte Carlo, slice sampling, Time series, variational inference, Yttrium},
  pubstate = {published},
  tppubtype = {article}
}
```
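To make the integer-valued state matrix described in the abstract concrete, here is a minimal Python sketch (assuming NumPy) of a *finite* factorial HMM generative process: rows of `S` are time instants, columns are chains, and each entry is the state of that chain at that instant. This is not the IFUHMM. The number of chains and the number of states per chain are fixed rather than unbounded, no inference is performed, and the Dirichlet transition rows, the use of state 0 as an "off" state, the per-state emission levels and the Gaussian noise are illustrative assumptions loosely inspired by power disaggregation. The function name `sample_factorial_hmm` is hypothetical.

```python
import numpy as np


def sample_factorial_hmm(T, states_per_chain, noise_std=0.1, seed=0):
    """Draw a T x K integer state matrix S and observations y from a finite
    factorial HMM (hypothetical illustration, not the IFUHMM).

    K = len(states_per_chain) chains run in parallel; chain k has
    states_per_chain[k] states. Each observation is the sum of the emission
    levels of all chains at that time instant plus Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    K = len(states_per_chain)
    S = np.zeros((T, K), dtype=int)  # rows: time instants, columns: chains
    y = np.zeros(T)
    for k, Q in enumerate(states_per_chain):
        # Assumed prior: each row of the transition matrix is a Dirichlet draw.
        A = rng.dirichlet(np.ones(Q), size=Q)
        # Assumed emission levels (e.g. power drawn by a device); state 0 is "off".
        levels = np.concatenate(([0.0], rng.uniform(1.0, 5.0, size=Q - 1)))
        s = 0  # assume each chain is in the "off" state just before t = 0
        for t in range(T):
            s = rng.choice(Q, p=A[s])  # Markov transition from the previous state
            S[t, k] = s
            y[t] += levels[s]  # chains contribute additively to the observation
    y += rng.normal(0.0, noise_std, size=T)  # independent Gaussian noise
    return S, y
```

For example, `S, y = sample_factorial_hmm(200, [2, 3, 4])` simulates three chains with two, three and four states. In the IFUHMM, both the number of columns of `S` and the range of its entries are treated as potentially unbounded and inferred from data instead of being fixed as here.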

Valera, Isabel; Ruiz, Francisco; Perez-Cruz, Fernando. Infinite Factorial Unbounded-State Hidden Markov Model (Journal Article). IEEE Transactions on Pattern Analysis and Machine Intelligence, To appear (99), pp. 1, 2016, ISSN: 1939-3539.

```bibtex
@article{Valera2016,
  title = {Infinite Factorial Unbounded-State Hidden Markov Model},
  author = {Valera, Isabel and Ruiz, Francisco J. R. and Perez-Cruz, Fernando},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/26571511 http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=7322279},
  doi = {10.1109/TPAMI.2015.2498931},
  issn = {1939-3539},
  year = {2016},
  date = {2016-01-01},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  volume = {To appear},
  number = {99},
  pages = {1},
  abstract = {There are many scenarios in artificial intelligence, signal processing or medicine, in which a temporal sequence consists of several unknown overlapping independent causes, and we are interested in accurately recovering those canonical causes. Factorial hidden Markov models (FHMMs) present the versatility to provide a good fit to these scenarios. However, in some scenarios, the number of causes or the number of states of the FHMM cannot be known or limited a priori. In this paper, we propose an infinite factorial unbounded-state hidden Markov model (IFUHMM), in which the number of parallel hidden Markov models (HMMs) and states in each HMM are potentially unbounded. We rely on a Bayesian nonparametric (BNP) prior over integer-valued matrices, in which the columns represent the Markov chains, the rows the time indexes, and the integers the state for each chain and time instant. First, we extend the existing infinite factorial binary-state HMM to allow for any number of states. Then, we modify this model to allow for an unbounded number of states and derive an MCMC-based inference algorithm that properly deals with the trade-off between the unbounded number of states and chains. We illustrate the performance of our proposed models in the power disaggregation problem.},
  keywords = {Bayes methods, Bayesian nonparametrics, CASI CAM CM, Computational modeling, GAMMA-L+ UC3M, Gibbs sampling, Hidden Markov models, Inference algorithms, Markov processes, Probability distribution, reversible jump Markov chain Monte Carlo, slice sampling, Time series, variational inference, Yttrium},
  pubstate = {published},
  tppubtype = {article}
}
```

## 2015

## Journal Articles

Moreno, Pablo; Teh, Yee Whye; Perez-Cruz, Fernando; Artés-Rodríguez, Antonio. Bayesian Nonparametric Crowdsourcing (Journal Article). Journal of Machine Learning Research, 16 (August), pp. 1607–1627, 2015.

```bibtex
@article{Moreno2015b,
  title = {Bayesian Nonparametric Crowdsourcing},
  author = {Moreno, Pablo G. and Teh, Yee Whye and Perez-Cruz, Fernando and Artés-Rodríguez, Antonio},
  url = {http://www.jmlr.org/papers/volume16/moreno15a/moreno15a.pdf},
  year = {2015},
  date = {2015-08-01},
  journal = {Journal of Machine Learning Research},
  volume = {16},
  number = {August},
  pages = {1607--1627},
  abstract = {Crowdsourcing has been proven to be an effective and efficient tool to annotate large datasets. User annotations are often noisy, so methods to combine the annotations to produce reliable estimates of the ground truth are necessary. We claim that considering the existence of clusters of users in this combination step can improve the performance. This is especially important in early stages of crowdsourcing implementations, where the number of annotations is low. At this stage there is not enough information to accurately estimate the bias introduced by each annotator separately, so we have to resort to models that consider the statistical links among them. In addition, finding these clusters is interesting in itself, as knowing the behavior of the pool of annotators makes it possible to implement efficient active learning strategies. Based on this, we propose in this paper two new fully unsupervised models based on a Chinese Restaurant Process (CRP) prior and a hierarchical structure that allows inferring these groups jointly with the ground truth and the properties of the users. Efficient inference algorithms based on Gibbs sampling with auxiliary variables are proposed. Finally, we perform experiments, both on synthetic and real databases, to show the advantages of our models over state-of-the-art algorithms.},
  keywords = {Bayesian nonparametrics, Dirichlet process, Gibbs sampling, Hierarchical clustering, Journal, Multiple annotators},
  pubstate = {published},
  tppubtype = {article}
}
```
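The abstract builds its clustering of annotators on a Chinese Restaurant Process prior. As a hedged illustration of that prior only, the Python sketch below (assuming NumPy; `sample_crp` and the concentration parameter name `alpha` are our own choices) seats annotators one at a time: annotator i joins an existing cluster with probability proportional to the cluster's current size, or opens a new cluster with probability proportional to `alpha`. It deliberately leaves out the hierarchical annotator model, the ground-truth labels and the Gibbs sampler with auxiliary variables described in the paper.

```python
import numpy as np


def sample_crp(n, alpha, seed=0):
    """Sequentially assign n annotators to clusters under a CRP(alpha) prior.

    Returns (assignments, counts): assignments[i] is the cluster index of
    annotator i, and counts[c] is the number of annotators in cluster c.
    """
    rng = np.random.default_rng(seed)
    assignments = []
    counts = []  # counts[c] = annotators already seated in cluster c
    for _ in range(n):
        # Existing cluster c has weight counts[c]; a new cluster has weight alpha.
        weights = np.array(counts + [alpha], dtype=float)
        c = int(rng.choice(len(weights), p=weights / weights.sum()))
        if c == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[c] += 1  # join an existing cluster ("rich get richer")
        assignments.append(c)
    return assignments, counts
```

For instance, `sample_crp(100, alpha=1.0)` partitions 100 annotators without fixing the number of clusters in advance; larger `alpha` tends to produce more clusters. Sharing parameters within a cluster is what lets such a model pool information across annotators when each one has provided only a few labels.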

## 2010

## Journal Articles

Martino, Luca; Miguez, Joaquin. Generalized Rejection Sampling Schemes and Applications in Signal Processing (Journal Article). Signal Processing, 90 (11), pp. 2981–2995, 2010.

```bibtex
@article{Martino2010a,
  title = {Generalized Rejection Sampling Schemes and Applications in Signal Processing},
  author = {Martino, Luca and Miguez, Joaquin},
  url = {http://www.sciencedirect.com/science/article/pii/S0165168410001866},
  year = {2010},
  date = {2010-01-01},
  journal = {Signal Processing},
  volume = {90},
  number = {11},
  pages = {2981--2995},
  abstract = {Bayesian methods and their implementations by means of sophisticated Monte Carlo techniques, such as Markov chain Monte Carlo (MCMC) and particle filters, have become very popular in signal processing in recent years. However, in many problems of practical interest these techniques demand procedures for sampling from probability distributions with non-standard forms, hence we are often brought back to the consideration of fundamental simulation algorithms, such as rejection sampling (RS). Unfortunately, the use of RS techniques demands the calculation of tight upper bounds for the ratio of the target probability density function (pdf) over the proposal density from which candidate samples are drawn. Except for the class of log-concave target pdfs, for which an efficient algorithm exists, there are no general methods to analytically determine this bound, which has to be derived from scratch for each specific case. In this paper, we introduce new schemes for (a) obtaining upper bounds for likelihood functions and (b) adaptively computing proposal densities that approximate the target pdf closely. The former class of methods provides the tools to easily sample from a posteriori probability distributions (that appear very often in signal processing problems) by drawing candidates from the prior distribution. However, they are even more useful when they are exploited to derive the generalized adaptive RS (GARS) algorithm introduced in the second part of the paper. The proposed GARS method yields a sequence of proposal densities that converge towards the target pdf and enable a very efficient sampling of a broad class of probability distributions, possibly with multiple modes and non-standard forms. We provide some simple numerical examples to illustrate the use of the proposed techniques, including an example of target localization using range measurements, often encountered in sensor network applications.},
  keywords = {Adaptive rejection sampling, Gibbs sampling, Monte Carlo integration, Rejection sampling, sensor networks, Target localization},
  pubstate = {published},
  tppubtype = {article}
}
```
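The first class of methods in the abstract samples from a posterior by drawing candidates from the prior, which works whenever an upper bound on the likelihood is available. A minimal Python sketch of that basic scheme follows (assuming NumPy; `rejection_sample_posterior` and its parameters are hypothetical names). It is plain rejection sampling, not GARS: the bound is supplied by the caller rather than constructed, and the demo uses a conjugate Gaussian example so the output can be checked analytically, with an unnormalized likelihood whose trivial bound is 1 (log-bound 0).

```python
import numpy as np


def rejection_sample_posterior(sample_prior, log_lik, log_bound, n, rng):
    """Draw n samples from p(x | y) proportional to p(x) * L(x), proposing from the prior.

    Requires log_lik(x) <= log_bound for every x. A candidate x ~ p(x) is
    accepted with probability L(x) / exp(log_bound). Returns the samples and
    the empirical acceptance rate.
    """
    samples = []
    proposals = 0
    while len(samples) < n:
        x = sample_prior(rng)
        proposals += 1
        u = 1.0 - rng.uniform()  # u in (0, 1], so log(u) is finite
        # Accept when u <= L(x) / bound, evaluated in log space for stability.
        if np.log(u) <= log_lik(x) - log_bound:
            samples.append(x)
    return np.array(samples), n / proposals


# Demo (illustrative assumption): prior x ~ N(0, 1), one observation y ~ N(x, sigma^2).
rng = np.random.default_rng(0)
y_obs, sigma = 1.0, 0.5
xs, acc = rejection_sample_posterior(
    sample_prior=lambda r: r.normal(0.0, 1.0),
    # Unnormalized Gaussian log-likelihood: always <= 0, so log_bound = 0 is valid.
    log_lik=lambda x: -0.5 * ((y_obs - x) / sigma) ** 2,
    log_bound=0.0,
    n=5000,
    rng=rng,
)
print(f"posterior mean ~ {xs.mean():.3f}, acceptance rate ~ {acc:.3f}")
```

With prior N(0, 1), y = 1 and noise standard deviation 0.5, the exact posterior is N(0.8, 0.2) and the expected acceptance rate is sqrt(0.2) · exp(-0.4) ≈ 0.30, so the sample mean and `acc` should land near those values. A looser bound would still give exact samples but would lower the acceptance rate, which is why the paper's focus on tight bounds and adaptive proposals matters in practice.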