Machine Learning Latest Submitted Preprints | 2019-07-03


Machine Learning


Efficient Algorithms for Smooth Minimax Optimization (1907.01543v1)

Kiran Koshy Thekumparampil, Prateek Jain, Praneeth Netrapalli, Sewoong Oh

2019-07-02

This paper studies first-order methods for solving smooth minimax optimization problems min_x max_y g(x, y), where g is smooth and g(x, ·) is concave for each x. In terms of g(·, y), we consider two settings -- strongly convex and nonconvex -- and improve upon the best known rates in both. For strongly-convex g(·, y), we propose a new algorithm combining Mirror-Prox and Nesterov's AGD, and show that it finds the global optimum at a rate improving over the current state of the art. We use this result along with an inexact proximal point method to obtain an improved rate for finding stationary points in the nonconvex setting, where g(·, y) can be nonconvex. Finally, we instantiate our result for finite nonconvex minimax problems, i.e., min_x max_{1<=i<=m} f_i(x) with nonconvex f_i, to obtain an improved bound on the total number of gradient evaluations needed to find a stationary point.
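The paper's method combines Mirror-Prox with Nesterov's AGD; as a much simpler baseline for the same problem class, here is plain simultaneous gradient descent-ascent on a toy smooth minimax problem (the quadratic g, step size, and iteration count are illustrative assumptions, not the paper's algorithm):

```python
# Toy minimax problem: g(x, y) = x^2 + 2xy - y^2, strongly convex in x,
# strongly concave in y, with its unique saddle point at (0, 0).

def gda(x, y, lr=0.1, steps=200):
    """Simultaneous gradient DESCENT on x and ASCENT on y."""
    for _ in range(steps):
        gx = 2 * x + 2 * y   # dg/dx
        gy = 2 * x - 2 * y   # dg/dy
        x, y = x - lr * gx, y + lr * gy
    return x, y

x_star, y_star = gda(1.0, -1.0)  # converges toward the saddle point (0, 0)
```

Accelerated schemes like the one proposed in the paper reach a given accuracy in far fewer iterations than this baseline on the strongly-convex-concave class.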

"Machine LLRning": Learning to Softly Demodulate (1907.01512v1)

Ori Shental, Jakob Hoydis

2019-07-02

Soft demodulation, or demapping, of received symbols back into their conveyed soft bits, or bit log-likelihood ratios (LLRs), is at the very heart of any modern receiver. In this paper, a trainable universal neural network-based demodulator architecture, dubbed "LLRnet", is introduced. LLRnet provides improved performance with significantly reduced overall computational complexity. For instance, for the commonly used quadrature amplitude modulation (QAM), LLRnet produces LLR estimates approaching optimal log maximum a-posteriori inference with an order of magnitude fewer operations than the straightforward exact implementation. Link-level simulation examples for the application of LLRnet to 5G-NR and DVB-S.2 are provided. LLRnet is (yet another) powerful example of the usefulness of applying machine learning to physical layer design.
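For context, this is the kind of exact demapping computation that LLRnet learns to approximate: a standard max-log LLR demapper, sketched here for Gray-mapped 4-QAM over an AWGN channel (the sign convention, LLR_i ≈ [d_min(b_i=1) − d_min(b_i=0)] / σ², is an assumption of this sketch, not taken from the paper):

```python
import numpy as np

# Gray-mapped 4-QAM: bits (b0, b1) -> ((1 - 2*b0) + 1j*(1 - 2*b1)) / sqrt(2)
BITS = [(b0, b1) for b0 in (0, 1) for b1 in (0, 1)]
CONST = {b: ((1 - 2 * b[0]) + 1j * (1 - 2 * b[1])) / np.sqrt(2) for b in BITS}

def maxlog_llr(y, noise_var):
    """Per-bit max-log LLRs for one received complex sample y.

    For each bit position, take the squared distance from y to the
    nearest constellation point with that bit 0 and with that bit 1;
    their difference (scaled by the noise variance) is the max-log LLR.
    Positive LLR -> bit more likely 0 under this convention.
    """
    llrs = []
    for i in range(2):
        d0 = min(abs(y - s) ** 2 for b, s in CONST.items() if b[i] == 0)
        d1 = min(abs(y - s) ** 2 for b, s in CONST.items() if b[i] == 1)
        llrs.append((d1 - d0) / noise_var)
    return llrs
```

For a noiseless received symbol, the LLR signs recover the transmitted bits; for larger constellations the per-bit minimization over all points is what makes exact demapping expensive, which is the cost LLRnet cuts.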

Best k-layer neural network approximations (1907.01507v1)

Lek-Heng Lim, Mateusz Michalek, Yang Qi

2019-07-02

We investigate the geometry of the empirical risk minimization problem for k-layer neural networks. We provide examples showing that, for certain classical activation functions, there exists a positive-measure subset of target functions that do not have best approximations by neural networks with a fixed number of layers. In addition, we study in detail the properties of shallow networks, classifying the cases in which a best shallow neural network approximation always exists or fails to exist for the ReLU activation. We also determine the dimensions of shallow ReLU-activated networks.

Seismic data denoising and deblending using deep learning (1907.01497v1)

Alan Richardson, Caelen Feller

2019-07-02

An important step of seismic data processing is removing noise, including interference due to simultaneous and blended sources, from the recorded data. Traditional methods are time-consuming to apply, as they often require manually choosing parameters to obtain good results. We use deep learning, with a U-net model incorporating a ResNet architecture pretrained on ImageNet and further trained on synthetic seismic data, to perform this task. The method is applied to common offset gathers, with adjacent offset gathers of the gather being denoised provided as additional input channels. We show that this approach removes noise from several datasets recorded in different parts of the world with moderate success. We find that providing three adjacent offset gathers on either side of the gather being denoised is most effective. As this method does not require parameters to be chosen, it is more automated than traditional methods.
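The input construction described above (the target gather plus three adjacent offset gathers on either side, stacked as channels) can be sketched roughly as follows; array names, shapes, and the edge-clamping behavior are illustrative assumptions, not the authors' code:

```python
import numpy as np

def make_input(gathers, idx, n_adjacent=3):
    """Stack gathers[idx-n .. idx+n] into a (channels, time, trace) array.

    Indices are clamped at the ends of the offset axis so the channel
    count is always 2 * n_adjacent + 1, with the gather to denoise in
    the middle channel.
    """
    n_gathers = gathers.shape[0]
    picks = [min(max(i, 0), n_gathers - 1)
             for i in range(idx - n_adjacent, idx + n_adjacent + 1)]
    return gathers[picks]

gathers = np.random.randn(10, 64, 32)  # (offset, time, trace) dummy data
x = make_input(gathers, idx=5)         # 7-channel network input
```

The network would then map this multi-channel input to a single denoised output gather.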

An innovative adaptive kriging approach for efficient binary classification of mechanical problems (1907.01490v1)

Jan N. Fuhg, Amelie Fau

2019-07-02

Kriging is an efficient machine-learning tool which makes it possible to obtain an approximate response of an investigated phenomenon over the whole parametric space. Adaptive schemes provide the ability to guide the experiment, yielding new sample point positions to enrich the metamodel. Herein, a novel adaptive scheme called Monte Carlo-intersite Voronoi (MiVor) is proposed to efficiently identify binary decision regions on the basis of a regression surrogate model. The performance of the innovative approach is tested on analytical functions as well as some mechanical problems, and is furthermore compared to two regression-based adaptive schemes. For smooth problems, all three methods have comparable performance. For highly fluctuating response surfaces, as encountered e.g. in dynamics or damage problems, the innovative MiVor algorithm performs very well and provides accurate binary classification with only a few observation points.
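To give a flavor of Monte Carlo-based adaptive sampling (this is a generic maximin-distance criterion, not the MiVor Voronoi-based criterion itself, whose details are in the paper): draw random candidate points and pick the one farthest from all existing observation points, i.e. in the least-explored region.

```python
import numpy as np

def next_sample(observed, n_candidates=1000, dim=2, rng=None):
    """Pick the candidate point maximizing its distance to the nearest
    existing observation (a simple exploration criterion)."""
    rng = rng or np.random.default_rng(0)
    candidates = rng.uniform(0.0, 1.0, size=(n_candidates, dim))
    # Pairwise distances: (n_candidates, n_observed)
    dists = np.linalg.norm(candidates[:, None, :] - observed[None, :, :],
                           axis=-1)
    return candidates[dists.min(axis=1).argmax()]

observed = np.array([[0.1, 0.1], [0.9, 0.9]])
p = next_sample(observed)  # lands far from both observed points
```

An adaptive kriging loop would alternate this kind of point selection with refitting the surrogate model on the enlarged observation set; MiVor additionally targets the boundary between the binary decision regions.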

Obj-GloVe: Scene-Based Contextual Object Embedding (1907.01478v1)

Canwen Xu, Zhenzhong Chen, Chenliang Li

2019-07-02

Recently, with the prevalence of large-scale image datasets, co-occurrence information among classes has become rich, calling for a new way to exploit it to facilitate inference. In this paper, we propose Obj-GloVe, a generic scene-based contextual embedding for common visual objects, where we adopt the word embedding method GloVe to exploit the co-occurrence between entities. We train the embedding on the pre-processed Open Images V4 dataset and provide extensive visualization and analysis: reducing dimensionality, projecting the vectors along specific semantic axes, and showcasing the nearest neighbors of the most common objects. Furthermore, we reveal the potential applications of Obj-GloVe to object detection and text-to-image synthesis, then verify its effectiveness on these two applications respectively.
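The raw statistic a GloVe-style objective consumes here is how often two object classes appear in the same scene. A minimal sketch of building those counts (the toy scenes are made up for illustration; the paper uses annotations from Open Images V4):

```python
from collections import Counter
from itertools import combinations

# Each scene is the set of object classes annotated in one image.
scenes = [
    ["person", "dog", "leash"],
    ["person", "dog", "ball"],
    ["car", "road", "person"],
]

# Symmetric co-occurrence counts over unordered class pairs.
cooc = Counter()
for objects in scenes:
    for a, b in combinations(sorted(set(objects)), 2):
        cooc[(a, b)] += 1
```

GloVe then fits embedding vectors so that their dot products approximate the log of these co-occurrence counts, which is what lets frequently co-occurring objects end up close in the embedding space.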

Generalizing from a few environments in safety-critical reinforcement learning (1907.01475v1)

Zachary Kenton, Angelos Filos, Owain Evans, Yarin Gal

2019-07-02

Before deploying autonomous agents in the real world, we need to be confident they will perform safely in novel situations. Ideally, we would expose agents to a very wide range of situations during training, allowing them to learn about every possible danger, but this is often impractical. This paper investigates safety and generalization from a limited number of training environments in deep reinforcement learning (RL). We find RL algorithms can fail dangerously on unseen test environments even when performing perfectly on training environments. Firstly, in a gridworld setting, we show that catastrophes can be significantly reduced with simple modifications, including ensemble model averaging and the use of a blocking classifier. In the more challenging CoinRun environment we find similar methods do not significantly reduce catastrophes. However, we do find that the uncertainty information from the ensemble is useful for predicting whether a catastrophe will occur within a few steps and hence whether human intervention should be requested.
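The intervention idea in the last sentence can be sketched very simply: when the members of an ensemble disagree strongly about a state, treat that disagreement as uncertainty and ask for human help. The threshold and the value estimates below are illustrative assumptions, not figures from the paper.

```python
import statistics

def request_intervention(ensemble_values, threshold=0.5):
    """Flag a state for human intervention when the ensemble's
    disagreement (sample standard deviation) exceeds a threshold."""
    return statistics.stdev(ensemble_values) > threshold

confident = [1.0, 1.05, 0.95, 1.02]   # members agree -> act autonomously
uncertain = [1.0, -0.8, 2.1, 0.1]     # members disagree -> ask a human
```

In the paper's setting, the useful signal is precisely that this disagreement rises in the steps preceding a catastrophe, even when it cannot prevent the catastrophe outright.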

Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling (1806.10175v4)

Shanshan Wu, Alexandros G. Dimakis, Sujay Sanghavi, Felix X. Yu, Daniel Holtmann-Rice, Dmitry Storcheus, Afshin Rostamizadeh, Sanjiv Kumar

2018-06-26

Linear encoding of sparse vectors is widely popular, but is commonly data-independent -- missing any possible extra (but a priori unknown) structure beyond sparsity. In this paper we present a new method to learn linear encoders that adapt to data, while still performing well with the widely used ℓ1 decoder. The convex ℓ1 decoder prevents gradient propagation as needed in standard gradient-based training. Our method is based on the insight that unrolling the convex decoder into projected subgradient steps can address this issue. Our method can be seen as a data-driven way to learn a compressed sensing measurement matrix. We compare the empirical performance of 10 algorithms over 6 sparse datasets (3 synthetic and 3 real). Our experiments show that there is indeed additional structure beyond sparsity in the real datasets; our method is able to discover it and exploit it to create excellent reconstructions with fewer measurements (by a factor of 1.1-3x) compared to the previous state-of-the-art methods. We illustrate an application of our method in learning label embeddings for extreme multi-label classification, and empirically show that our method is able to match or outperform the precision scores of SLEEC, one of the state-of-the-art embedding-based approaches.
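The unrolling mechanism can be sketched as follows: replace the ℓ1 decoder by a fixed number of projected subgradient steps, each of which is differentiable, so gradients can flow back to the measurement matrix. The step size, step count, and initialization below are illustrative assumptions; the learned part (training A by backpropagating through these steps) is omitted.

```python
import numpy as np

def unrolled_l1_decode(A, y, T=50, step=0.01):
    """T projected subgradient steps for min ||z||_1 s.t. A z = y.

    Assumes A has full row rank so the projection onto the affine
    constraint set {z : A z = y} is well defined.
    """
    pinv = A.T @ np.linalg.inv(A @ A.T)   # helper for the projection
    z = pinv @ y                          # feasible starting point
    for _ in range(T):
        z = z - step * np.sign(z)         # subgradient of ||z||_1
        z = z - pinv @ (A @ z - y)        # project back onto A z = y
    return z

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))         # measurement matrix (toy)
x = np.zeros(50)
x[[3, 17, 41]] = [1.0, -2.0, 0.5]         # sparse ground truth
z = unrolled_l1_decode(A, A @ x)          # decode from 20 measurements
```

Because every operation above is a matrix product, a sign, or a subtraction, the whole decoder is a differentiable computation graph in A, which is the insight the paper exploits.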

Augmenting Self-attention with Persistent Memory (1907.01470v1)

Sainbayar Sukhbaatar, Edouard Grave, Guillaume Lample, Herve Jegou, Armand Joulin

2019-07-02

Transformer networks have led to important progress in language modeling and machine translation. These models include two consecutive modules, a feed-forward layer and a self-attention layer. The latter allows the network to capture long-term dependencies and is often regarded as the key ingredient in the success of Transformers. Building upon this intuition, we propose a new model that consists solely of attention layers. More precisely, we augment the self-attention layers with persistent memory vectors that play a similar role to the feed-forward layer. Thanks to these vectors, we can remove the feed-forward layer without degrading the performance of a transformer. Our evaluation shows the benefits brought by our model on standard character- and word-level language modeling benchmarks.
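A minimal single-head sketch of the idea: concatenate a set of learned "persistent" key/value vectors to the keys and values computed from the input, so attention can read from them much as a feed-forward layer reads from its weights. Dimensions and the single-head, unbatched setup are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_memory(q, k, v, mem_k, mem_v):
    """q, k, v: (t, d) from the input; mem_k, mem_v: (m, d) persistent
    vectors, learned parameters shared across all positions."""
    keys = np.concatenate([k, mem_k], axis=0)      # (t + m, d)
    vals = np.concatenate([v, mem_v], axis=0)      # (t + m, d)
    scores = q @ keys.T / np.sqrt(q.shape[-1])     # (t, t + m)
    return softmax(scores) @ vals                  # (t, d)

t, m, d = 4, 8, 16
rng = np.random.default_rng(0)
out = attention_with_memory(rng.standard_normal((t, d)),
                            rng.standard_normal((t, d)),
                            rng.standard_normal((t, d)),
                            rng.standard_normal((m, d)),
                            rng.standard_normal((m, d)))
```

Since the persistent slots are attended to alongside the ordinary positions, the feed-forward sublayer becomes redundant and can be dropped, which is the paper's central claim.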

Reproducibility in Machine Learning for Health (1907.01463v1)

Matthew B. A. McDermott, Shirly Wang, Nikki Marinsek, Rajesh Ranganath, Marzyeh Ghassemi, Luca Foschini

2019-07-02

Machine learning algorithms designed to characterize, monitor, and intervene on human health (ML4H) are expected to perform safely and reliably when operating at scale, potentially outside strict human supervision. This requirement warrants stricter attention to issues of reproducibility than in other fields of machine learning. In this work, we conduct a systematic evaluation of over 100 recently published ML4H research papers along several dimensions related to reproducibility. We find that the field of ML4H compares poorly to more established machine learning fields, particularly concerning data and code accessibility. Finally, drawing from successes in other fields of science, we propose recommendations to data providers, academic publishers, and the ML4H research community in order to promote reproducible research moving forward.