## 2012

## Journal Articles

Leiva-Murillo, Jose M.; Artés-Rodríguez, Antonio. Algorithms for Maximum-Likelihood Bandwidth Selection in Kernel Density Estimators. Pattern Recognition Letters, 33 (13), pp. 1717–1724, 2012, ISSN: 01678655.

Tags: Kernel density estimation, Multivariate density modeling, Pattern recognition

@article{Leiva-Murillo2012,
  title = {Algorithms for Maximum-Likelihood Bandwidth Selection in Kernel Density Estimators},
  author = {Leiva-Murillo, Jose M. and Artés-Rodríguez, Antonio},
  url = {http://www.tsc.uc3m.es/~antonio/papers/P45_2012_Algorithms for Maximum Likelihood Bandwidth Selection in Kernel Density Estimators.pdf http://www.sciencedirect.com/science/article/pii/S0167865512001948},
  issn = {01678655},
  year = {2012},
  date = {2012-01-01},
  journal = {Pattern Recognition Letters},
  volume = {33},
  number = {13},
  pages = {1717--1724},
  publisher = {Elsevier Science Inc.},
  abstract = {In machine learning and statistics, kernel density estimators are rarely used on multivariate data due to the difficulty of finding an appropriate kernel bandwidth to overcome overfitting. However, the recent advances on information-theoretic learning have revived the interest on these models. With this motivation, in this paper we revisit the classical statistical problem of data-driven bandwidth selection by cross-validation maximum likelihood for Gaussian kernels. We find a solution to the optimization problem under both the spherical and the general case where a full covariance matrix is considered for the kernel. The fixed-point algorithms proposed in this paper obtain the maximum likelihood bandwidth in few iterations, without performing an exhaustive bandwidth search, which is unfeasible in the multivariate case. The convergence of the methods proposed is proved. A set of classification experiments are performed to prove the usefulness of the obtained models in pattern recognition.},
  keywords = {Kernel density estimation, Multivariate density modeling, Pattern recognition},
  pubstate = {published},
  tppubtype = {article}
}

## 2011

## Journal Articles

Santiago-Mozos, Ricardo; Perez-Cruz, Fernando; Artés-Rodríguez, Antonio. Extended Input Space Support Vector Machine. IEEE transactions on neural networks / a publication of the IEEE Neural Networks Council, 22 (1), pp. 158–163, 2011, ISSN: 1941-0093.

Tags: Algorithms, Artificial Intelligence, Automated, Automated: standards, Computer Simulation, Computer Simulation: standards, Neural Networks (Computer), Pattern recognition, Problem Solving, Software Design, Software Validation

@article{Santiago-Mozos2011,
  title = {Extended Input Space Support Vector Machine},
  author = {Santiago-Mozos, Ricardo and Perez-Cruz, Fernando and Artés-Rodríguez, Antonio},
  url = {http://www.tsc.uc3m.es/~antonio/papers/P38_2011_Extended Input Space Support Vector Machine.pdf http://www.ncbi.nlm.nih.gov/pubmed/21095866},
  issn = {1941-0093},
  year = {2011},
  date = {2011-01-01},
  journal = {IEEE transactions on neural networks / a publication of the IEEE Neural Networks Council},
  volume = {22},
  number = {1},
  pages = {158--163},
  abstract = {In some applications, the probability of error of a given classifier is too high for its practical application, but we are allowed to gather more independent test samples from the same class to reduce the probability of error of the final decision. From the point of view of hypothesis testing, the solution is given by the Neyman-Pearson lemma. However, there is no equivalent result to the Neyman-Pearson lemma when the likelihoods are unknown, and we are given a training dataset. In this brief, we explore two alternatives. First, we combine the soft (probabilistic) outputs of a given classifier to produce a consensus labeling for K test samples. In the second approach, we build a new classifier that directly computes the label for K test samples. For this second approach, we need to define an extended input space training set and incorporate the known symmetries in the classifier. This latter approach gives more accurate results, as it only requires an accurate classification boundary, while the former needs an accurate posterior probability estimate for the whole input space. We illustrate our results with well-known databases.},
  keywords = {Algorithms, Artificial Intelligence, Automated, Automated: standards, Computer Simulation, Computer Simulation: standards, Neural Networks (Computer), Pattern recognition, Problem Solving, Software Design, Software Validation},
  pubstate = {published},
  tppubtype = {article}
}

## 2007

## Journal Articles

Leiva-Murillo, Jose M.; Artés-Rodríguez, Antonio. Maximization of Mutual Information for Supervised Linear Feature Extraction. IEEE Transactions on Neural Networks, 18 (5), pp. 1433–1441, 2007, ISSN: 1045-9227.

Tags: Algorithms, Artificial Intelligence, Automated, component-by-component gradient-ascent method, Computer Simulation, Data Mining, Entropy, Feature extraction, gradient methods, gradient-based entropy, Independent component analysis, Information Storage and Retrieval, information theory, Iron, learning (artificial intelligence), Linear discriminant analysis, Linear Models, Mutual information, Optimization methods, Pattern recognition, Reproducibility of Results, Sensitivity and Specificity, supervised linear feature extraction, Vectors

@article{Leiva-Murillo2007,
  title = {Maximization of Mutual Information for Supervised Linear Feature Extraction},
  author = {Leiva-Murillo, Jose M. and Artés-Rodríguez, Antonio},
  url = {http://ieeexplore.ieee.org/articleDetails.jsp?arnumber=4298118},
  issn = {1045-9227},
  year = {2007},
  date = {2007-01-01},
  journal = {IEEE Transactions on Neural Networks},
  volume = {18},
  number = {5},
  pages = {1433--1441},
  publisher = {IEEE},
  abstract = {In this paper, we present a novel scheme for linear feature extraction in classification. The method is based on the maximization of the mutual information (MI) between the features extracted and the classes. The sum of the MI corresponding to each of the features is taken as an heuristic that approximates the MI of the whole output vector. Then, a component-by-component gradient-ascent method is proposed for the maximization of the MI, similar to the gradient-based entropy optimization used in independent component analysis (ICA). The simulation results show that not only is the method competitive when compared to existing supervised feature extraction methods in all cases studied, but it also remarkably outperforms them when the data are characterized by strongly nonlinear boundaries between classes.},
  keywords = {Algorithms, Artificial Intelligence, Automated, component-by-component gradient-ascent method, Computer Simulation, Data Mining, Entropy, Feature extraction, gradient methods, gradient-based entropy, Independent component analysis, Information Storage and Retrieval, information theory, Iron, learning (artificial intelligence), Linear discriminant analysis, Linear Models, Mutual information, Optimization methods, Pattern recognition, Reproducibility of Results, Sensitivity and Specificity, supervised linear feature extraction, Vectors},
  pubstate = {published},
  tppubtype = {article}
}