Machine Learning Latest Submitted Preprints | 2019-07-16
Machine Learning
Batch-Shaped Channel Gated Networks (1907.06627v1)
Babak Ehteshami Bejnordi, Tijmen Blankevoort, Max Welling
2019-07-15
We present a method for gating deep-learning architectures on a fine-grained level. Individual convolutional maps are turned on/off conditionally on features in the network. This method allows us to train neural networks with a large capacity, but lower inference time than the full network. To achieve this, we introduce a new residual block architecture that gates convolutional channels in a fine-grained manner. We also introduce a generally applicable tool "batch-shaping" that matches the marginal aggregate posteriors of features in a neural network to a pre-specified prior distribution. We use this novel technique to force gates to be more conditional on the data. We present results on CIFAR-10 and ImageNet datasets for image classification and Cityscapes for semantic segmentation. Our results show that our method can slim down large architectures conditionally, such that the average computational cost on the data is on par with a smaller architecture, but with higher accuracy. In particular, our ResNet34 gated network achieves a performance of 72.55% top-1 accuracy compared to the 69.76% accuracy of the baseline ResNet18 model, for similar complexity. We also show that the resulting networks automatically learn to use more features for difficult examples and fewer features for simple examples.
Revealing posturographic features associated with the risk of falling in patients with Parkinsonian syndromes via machine learning (1907.06614v1)
Ioannis Bargiotas, Argyris Kalogeratos, Myrto Limnios, Pierre-Paul Vidal, Damien Ricard, Nicolas Vayatis
2019-07-15
Falling in Parkinsonian syndromes (PS) is associated with postural instability and consists a common cause of disability among PS patients. Current posturographic practices record the body's center-of-pressure displacement (statokinesigram) while the patient stands on a force platform. Statokinesigrams, after appropriate signal processing, can offer numerous posturographic features, which however challenges the efforts for valid statistics via standard univariate approaches. In this work, we present the ts-AUC, a non-parametric multivariate two-sample test, which we employ to analyze statokinesigram differences among PS patients that are fallers (PSf) and non-fallers (PSNF). We included 123 PS patients who were classified into PSF or PSNF based on clinical assessment and underwent simple Romberg Test (eyes open/eyes closed). We analyzed posturographic features using both multiple testing with p-value adjustment and the ts-AUC. While the ts-AUC showed significant difference between groups (p-value = 0.01), multiple testing did not show any such difference. Interestingly, significant difference between the two groups was found only using the open-eyes protocol. PSF showed significantly increased antero-posterior movements as well as increased posturographic area, compared to PSNF. Our study demonstrates the superiority of the ts-AUC test compared to standard statistical tools in distinguishing PSF and PSNF in the multidimensional feature space. This result highlights more generally the fact that machine learning-based statistical tests can be seen as a natural extension of classical statistical approaches and should be considered, especially when dealing with multifactorial assessments.
Agglomerative Attention (1907.06607v1)
Matthew Spellings
2019-07-15
Neural networks using transformer-based architectures have recently demonstrated great power and flexibility in modeling sequences of many types. One of the core components of transformer networks is the attention layer, which allows contextual information to be exchanged among sequence elements. While many of the prevalent network structures thus far have utilized full attention -- which operates on all pairs of sequence elements -- the quadratic scaling of this attention mechanism significantly constrains the size of models that can be trained. In this work, we present an attention model that has only linear requirements in memory and computation time. We show that, despite the simpler attention model, networks using this attention mechanism can attain comparable performance to full attention networks on language modeling tasks.
Medical Concept Representation Learning from Claims Data and Application to Health Plan Payment Risk Adjustment (1907.06600v1)
Qiu-Yue Zhong, Andrew H. Fairless, Jasmine M. McCammon, Farbod Rahmanian
2019-07-15
Risk adjustment has become an increasingly important tool in healthcare. It has been extensively applied to payment adjustment for health plans to reflect the expected cost of providing coverage for members. Risk adjustment models are typically estimated using linear regression, which does not fully exploit the information in claims data. Moreover, the development of such linear regression models requires substantial domain expert knowledge and computational effort for data preprocessing. In this paper, we propose a novel approach for risk adjustment that uses semantic embeddings to represent patient medical histories. Embeddings efficiently represent medical concepts learned from diagnostic, procedure, and prescription codes in patients' medical histories. This approach substantially reduces the need for feature engineering. Our results show that models using embeddings had better performance than a commercial risk adjustment model on the task of prospective risk score prediction.
Experimental machine learning quantum homodyne tomography (1907.06589v1)
E. S. Tiunov, V. V. Vyborova, A. E. Ulanov, A. I. Lvovsky, A. K. Fedorov
2019-07-15
Complete characterization of states and processes that occur within quantum devices is crucial for understanding and testing their potential to outperform classical technologies for communications and computing. However, this task becomes unwieldy for large and complex quantum systems. Here we realize and experimentally demonstrate a method for complete characterization of a harmonic oscillator based on an artificial neural network known as the restricted Boltzmann machine. We apply the method to experimental balanced homodyne tomography and show it to allow full estimation of quantum states based on a smaller amount of experimental data. Although our experiment is in the optical domain, our method provides a way of exploring quantum resources in a broad class of physical systems, such as superconducting circuits, atomic and molecular ensembles, and optomechanical systems.
Uncertainty-Aware Principal Component Analysis (1905.01127v2)
Jochen Görtler, Thilo Spinner, Dirk Streeb, Daniel Weiskopf, Oliver Deussen
2019-05-03
We present a technique to perform dimensionality reduction on data that is subject to uncertainty. Our method is a generalization of traditional principal component analysis (PCA) to multivariate probability distributions. In comparison to non-linear methods, linear dimensionality reduction techniques have the advantage that the characteristics of such probability distributions remain intact after projection. We derive a representation of the PCA sample covariance matrix that respects potential uncertainty in each of the inputs, building the mathematical foundation of our new method uncertainty-aware PCA. In addition to the accuracy and performance gained by our approach over sampling-based strategies, our formulation allows us to perform sensitivity analysis with regard to the uncertainty in the data. For this, we propose factor traces as a novel visualization that enables to better understand the influence of uncertainty on the chosen principal components. We provide multiple examples of our technique using real-world datasets. As a special case, we show how to propagate multivariate normal distributions through PCA in closed-form. Furthermore, we discuss extensions and limitations of our approach.
Efficient Video Generation on Complex Datasets (1907.06571v1)
Aidan Clark, Jeff Donahue, Karen Simonyan
2019-07-15
Generative models of natural images have progressed towards high fidelity samples by the strong leveraging of scale. We attempt to carry this success to the field of video modeling by showing that large Generative Adversarial Networks trained on the complex Kinetics-600 dataset are able to produce video samples of substantially higher complexity than previous work. Our proposed network, Dual Video Discriminator GAN (DVD-GAN), scales to longer and higher resolution videos by leveraging a computationally efficient decomposition of its discriminator. We evaluate on the related tasks of video synthesis and video prediction, and achieve new state of the art Frechet Inception Distance on prediction for Kinetics-600, as well as state of the art Inception Score for synthesis on the UCF-101 dataset, alongside establishing a number of strong baselines on Kinetics-600.
Budgeted Multi-Objective Optimization with a Focus on the Central Part of the Pareto Front -- Extended Version (1809.10482v4)
David Gaudrie, Rodolphe Le Riche, Victor Picheny, Benoit Enaux, Vincent Herbert
2018-09-27
Optimizing nonlinear systems involving expensive computer experiments with regard to conflicting objectives is a common challenge. When the number of experiments is severely restricted and/or when the number of objectives increases, uncovering the whole set of Pareto optimal solutions is out of reach, even for surrogate-based approaches: the proposed solutions are sub-optimal or do not cover the front well. As non-compromising optimal solutions have usually little point in applications, this work restricts the search to solutions that are close to the Pareto front center. The article starts by characterizing this center, which is defined for any type of front. Next, a Bayesian multi-objective optimization method for directing the search towards it is proposed. Targeting a subset of the Pareto front allows an improved optimality of the solutions and a better coverage of this zone, which is our main concern. A criterion for detecting convergence to the center is described. If the criterion is triggered, a widened central part of the Pareto front is targeted such that sufficiently accurate convergence to it is forecasted within the remaining budget. Numerical experiments show how the resulting algorithm, C-EHI, better locates the central part of the Pareto front when compared to state-of-the-art Bayesian algorithms.
Improved Hybrid Layered Image Compression using Deep Learning and Traditional Codecs (1907.06566v1)
Haisheng Fu, Feng Liang, Bo Lei, Nai Bian, Qian zhang, Mohammad Akbari, Jie Liang, Chengjie Tu
2019-07-15
Recently deep learning-based methods have been applied in image compression and achieved many promising results. In this paper, we propose an improved hybrid layered image compression framework by combining deep learning and the traditional image codecs. At the encoder, we first use a convolutional neural network (CNN) to obtain a compact representation of the input image, which is losslessly encoded by the FLIF codec as the base layer of the bit stream. A coarse reconstruction of the input is obtained by another CNN from the reconstructed compact representation. The residual between the input and the coarse reconstruction is then obtained and encoded by the H.265/HEVC-based BPG codec as the enhancement layer of the bit stream. Experimental results using the Kodak and Tecnick datasets show that the proposed scheme outperforms the state-of-the-art deep learning-based layered coding scheme and traditional codecs including BPG in both PSNR and MS-SSIM metrics across a wide range of bit rates, when the images are coded in the RGB444 domain.
Recovery Guarantees for Compressible Signals with Adversarial Noise (1907.06565v1)
Jasjeet Dhaliwal, Kyle Hambrook
2019-07-15
We provide recovery guarantees for compressible signals that have been corrupted with noise and extend the framework introduced in [1] to defend neural networks against -norm and -norm attacks. Concretely, for a signal that is approximately sparse in some transform domain and has been perturbed with noise, we provide guarantees for accurately recovering the signal in the transform domain. We can then use the recovered signal to reconstruct the signal in its original domain while largely removing the noise. Our results are general as they can be directly applied to most unitary transforms used in practice and hold for both -norm bounded noise and -norm bounded noise. In the case of -norm bounded noise, we prove recovery guarantees for Iterative Hard Thresholding (IHT) and Basis Pursuit (BP). For the case of -norm bounded noise, we provide recovery guarantees for BP. These guarantees theoretically bolster the defense framework introduced in [1] for defending neural networks against adversarial inputs. Finally, we experimentally demonstrate this defense framework using both IHT and BP against the One Pixel Attack [21], Carlini-Wagner and attacks [3], Jacobian Saliency Based attack [18], and the DeepFool attack [17] on CIFAR-10 [12], MNIST [13], and Fashion-MNIST [27] datasets. This expands beyond the experimental demonstrations of [1].