Article accepted for publication in Neural Networks
Title: Regularizing transformers with deep probabilistic layers

Authors: Aurora Cobo Aguilera, Pablo M. Olmos, Antonio Artés-Rodríguez and Fernando Pérez-Cruz

Abstract: Language models (LMs) have grown non-stop in the last decade, from sequence-to-sequence architectures to attention-based Transformers. However, regularization has not been deeply studied in these architectures. In this work, we use a Gaussian Mixture Variational Autoencoder…