Machine Learning Latest Submitted Preprints | 2019-06-24
Machine Learning
Power and limitations of conformal martingales (1906.09256v1)
Vladimir Vovk
2019-06-21
This paper, accompanying my poster at ISIPTA 2019 (5 July 2019), poses the problem of investigating the power and limitations of conformal martingales as a means of detecting deviations from randomness. It starts from a brief review of conformal change detection, including CUSUM and Shiryaev-Roberts versions, and establishing simple validity results limiting the frequency of false alarms. It then gives several preliminary results in the direction of efficiency and discusses connections between randomness, exchangeability, and conformal martingales.
Max-Affine Regression: Provable, Tractable, and Near-Optimal Statistical Estimation (1906.09255v1)
Avishek Ghosh, Ashwin Pananjady, Adityanand Guntuboyina, Kannan Ramchandran
2019-06-21
Max-affine regression refers to a model where the unknown regression function is modeled as a maximum of unknown affine functions for a fixed . This generalizes linear regression and (real) phase retrieval, and is closely related to convex regression. Working within a non-asymptotic framework, we study this problem in the high-dimensional setting assuming that is a fixed constant, and focus on estimation of the unknown coefficients of the affine functions underlying the model. We analyze a natural alternating minimization (AM) algorithm for the non-convex least squares objective when the design is random. We show that the AM algorithm, when initialized suitably, converges with high probability and at a geometric rate to a small ball around the optimal coefficients. In order to initialize the algorithm, we propose and analyze a combination of a spectral method and a random search scheme in a low-dimensional space, which may be of independent interest. The final rate that we obtain is near-parametric and minimax optimal (up to a poly-logarithmic factor) as a function of the dimension, sample size, and noise variance. In that sense, our approach should be viewed as a direct and implementable method of enforcing regularization to alleviate the curse of dimensionality in problems of the convex regression type. As a by-product of our analysis, we also obtain guarantees on a classical algorithm for the phase retrieval problem under considerably weaker assumptions on the design distribution than was previously known. Numerical experiments illustrate the sharpness of our bounds in the various problem parameters.
Privacy Preserving QoE Modeling using Collaborative Learning (1906.09248v1)
Selim Ickin, Konstantinos Vandikas, Markus Fiedler
2019-06-21
Machine Learning based Quality of Experience (QoE) models potentially suffer from over-fitting due to limitations including low data volume, and limited participant profiles. This prevents models from becoming generic. Consequently, these trained models may under-perform when tested outside the experimented population. One reason for the limited datasets, which we refer in this paper as small QoE data lakes, is due to the fact that often these datasets potentially contain user sensitive information and are only collected throughout expensive user studies with special user consent. Thus, sharing of datasets amongst researchers is often not allowed. In recent years, privacy preserving machine learning models have become important and so have techniques that enable model training without sharing datasets but instead relying on secure communication protocols. Following this trend, in this paper, we present Round-Robin based Collaborative Machine Learning model training, where the model is trained in a sequential manner amongst the collaborated partner nodes. We benchmark this work using our customized Federated Learning mechanism as well as conventional Centralized and Isolated Learning methods.
Learning from weakly dependent data under Dobrushin's condition (1906.09247v1)
Yuval Dagan, Constantinos Daskalakis, Nishanth Dikkala, Siddhartha Jayanti
2019-06-21
Statistical learning theory has largely focused on learning and generalization given independent and identically distributed (i.i.d.) samples. Motivated by applications involving time-series data, there has been a growing literature on learning and generalization in settings where data is sampled from an ergodic process. This work has also developed complexity measures, which appropriately extend the notion of Rademacher complexity to bound the generalization error and learning rates of hypothesis classes in this setting. Rather than time-series data, our work is motivated by settings where data is sampled on a network or a spatial domain, and thus do not fit well within the framework of prior work. We provide learning and generalization bounds for data that are complexly dependent, yet their distribution satisfies the standard Dobrushin's condition. Indeed, we show that the standard complexity measures of Gaussian and Rademacher complexities and VC dimension are sufficient measures of complexity for the purposes of bounding the generalization error and learning rates of hypothesis classes in our setting. Moreover, our generalization bounds only degrade by constant factors compared to their i.i.d. analogs, and our learnability bounds degrade by log factors in the size of the training set.
Using Pre-Training Can Improve Model Robustness and Uncertainty (1901.09960v4)
Dan Hendrycks, Kimin Lee, Mantas Mazeika
2019-01-28
He et al. (2018) have called into question the utility of pre-training by showing that training from scratch can often yield similar performance to pre-training. We show that although pre-training may not improve performance on traditional classification metrics, it improves model robustness and uncertainty estimates. Through extensive experiments on adversarial examples, label corruption, class imbalance, out-of-distribution detection, and confidence calibration, we demonstrate large gains from pre-training and complementary effects with task-specific methods. We introduce adversarial pre-training and show approximately a 10% absolute improvement over the previous state-of-the-art in adversarial robustness. In some cases, using pre-training without task-specific methods also surpasses the state-of-the-art, highlighting the need for pre-training when evaluating future methods on robustness and uncertainty tasks.
Representative Task Self-selection for Flexible Clustered Lifelong Learning (1903.02173v2)
Gan Sun, Yang Cong, Qianqian Wang, Bineng Zhong, Yun Fu
2019-03-06
Consider the lifelong machine learning paradigm whose objective is to learn a sequence of tasks depending on previous experiences, e.g., knowledge library or deep network weights. However, the knowledge libraries or deep networks for most recent lifelong learning models are with prescribed size, and can degenerate the performance for both learned tasks and coming ones when facing with a new task environment (cluster). To address this challenge, we propose a novel incremental clustered lifelong learning framework with two knowledge libraries: feature learning library and model knowledge library, called Flexible Clustered Lifelong Learning (FCL3). Specifically, the feature learning library modeled by an autoencoder architecture maintains a set of representation common across all the observed tasks, and the model knowledge library can be self-selected by identifying and adding new representative models (clusters). When a new task arrives, our proposed FCL3model firstly transfers knowledge from these libraries to encode the new task, i.e.,effectively and selectively soft-assigning this new task to multiple representative models over feature learning library. Then, 1) the new task with a higher outlier probability will be judged as a new representative, and used to redefine both feature learning library and representative models over time; or 2) the new task with lower outlier probability will only refine the feature learning library. For model optimization, we cast this lifelong learning problem as an alternating direction minimization problem as a new task comes. Finally, we evaluate the proposed framework by analyzing several multi-task datasets, and the experimental results demonstrate that our FCL3 model can achieve better performance than most lifelong learning frameworks, even batch clustered multi-task learning models.
On Tree-based Methods for Similarity Learning (1906.09243v1)
Stéphan Clémençon, Robin Vogel
2019-06-21
In many situations, the choice of an adequate similarity measure or metric on the feature space dramatically determines the performance of machine learning methods. Building automatically such measures is the specific purpose of metric/similarity learning. In Vogel et al. (2018), similarity learning is formulated as a pairwise bipartite ranking problem: ideally, the larger the probability that two observations in the feature space belong to the same class (or share the same label), the higher the similarity measure between them. From this perspective, the ROC curve is an appropriate performance criterion and it is the goal of this article to extend recursive tree-based ROC optimization techniques in order to propose efficient similarity learning algorithms. The validity of such iterative partitioning procedures in the pairwise setting is established by means of results pertaining to the theory of U-processes and from a practical angle, it is discussed at length how to implement them by means of splitting rules specifically tailored to the similarity learning task. Beyond these theoretical/methodological contributions, numerical experiments are displayed and provide strong empirical evidence of the performance of the algorithmic approaches we propose.
Advanced Customer Activity Prediction based on Deep Hierarchic Encoder-Decoders (1904.07687v4)
Andrei Damian, Laurentiu Piciu, Sergiu Turlea, Nicolae Tapus
2019-04-11
Product recommender systems and customer profiling techniques have always been a priority in online retail. Recent machine learning research advances and also wide availability of massive parallel numerical computing has enabled various approaches and directions of recommender systems advancement. Worth to mention is the fact that in past years multiple traditional "offline" retail business are gearing more and more towards employing inferential and even predictive analytics both to stock-related problems such as predictive replenishment but also to enrich customer interaction experience. One of the most important areas of recommender systems research and development is that of Deep Learning based models which employ representational learning to model consumer behavioral patterns. Current state of the art in Deep Learning based recommender systems uses multiple approaches ranging from already classical methods such as the ones based on learning product representation vector, to recurrent analysis of customer transactional time-series and up to generative models based on adversarial training. Each of these methods has multiple advantages and inherent weaknesses such as inability of understanding the actual user-journey, ability to propose only single product recommendation or top-k product recommendations without prediction of actual next-best-offer. In our work we will present a new and innovative architectural approach of applying state-of-the-art hierarchical multi-module encoder-decoder architecture in order to solve several of current state-of-the-art recommender systems issues. Our approach will also produce by-products such as product need-based segmentation and customer behavioral segmentation - all in an end-to-end trainable approach. Finally, we will present a couple methods that solve known retail & distribution pain-points based on the proposed architecture.
Shaping Belief States with Generative Environment Models for RL (1906.09237v1)
Karol Gregor, Danilo Jimenez Rezende, Frederic Besse, Yan Wu, Hamza Merzic, Aaron van den Oord
2019-06-21
When agents interact with a complex environment, they must form and maintain beliefs about the relevant aspects of that environment. We propose a way to efficiently train expressive generative models in complex environments. We show that a predictive algorithm with an expressive generative model can form stable belief-states in visually rich and dynamic 3D environments. More precisely, we show that the learned representation captures the layout of the environment as well as the position and orientation of the agent. Our experiments show that the model substantially improves data-efficiency on a number of reinforcement learning (RL) tasks compared with strong model-free baseline agents. We find that predicting multiple steps into the future (overshooting), in combination with an expressive generative model, is critical for stable representations to emerge. In practice, using expressive generative models in RL is computationally expensive and we propose a scheme to reduce this computational burden, allowing us to build agents that are competitive with model-free baselines.
Theory of the Frequency Principle for General Deep Neural Networks (1906.09235v1)
Tao Luo, Zheng Ma, Zhi-Qin John Xu, Yaoyu Zhang
2019-06-21
Along with fruitful applications of Deep Neural Networks (DNNs) to realistic problems, recently, some empirical studies of DNNs reported a universal phenomenon of Frequency Principle (F-Principle): a DNN tends to learn a target function from low to high frequencies during the training. The F-Principle has been very useful in providing both qualitative and quantitative understandings of DNNs. In this paper, we rigorously investigate the F-Principle for the training dynamics of a general DNN at three stages: initial stage, intermediate stage, and final stage. For each stage, a theorem is provided in terms of proper quantities characterizing the F-Principle. Our results are general in the sense that they work for multilayer networks with general activation functions, population densities of data, and a large class of loss functions. Our work lays a theoretical foundation of the F-Principle for a better understanding of the training process of DNNs.