Artificial Intelligence Preprint | 2019-07-08

in #artificial5 years ago

Artificial Intelligence


Fully Distributed Bayesian Optimization with Stochastic Policies (1902.09992v2)

Javier Garcia-Barcos, Ruben Martinez-Cantin

2019-02-26

Bayesian optimization has become a popular method for high-throughput computing, like the design of computer experiments or hyperparameter tuning of expensive models, where sample efficiency is mandatory. In these applications, distributed and scalable architectures are a necessity. However, Bayesian optimization is mostly sequential. Even parallel variants require certain computations between samples, limiting the parallelization bandwidth. Thompson sampling has been previously applied for distributed Bayesian optimization. But, when compared with other acquisition functions in the sequential setting, Thompson sampling is known to perform suboptimally. In this paper, we present a new method for fully distributed Bayesian optimization, which can be combined with any acquisition function. Our approach considers Bayesian optimization as a partially observable Markov decision process. In this context, stochastic policies, such as the Boltzmann policy, have some interesting properties which can also be studied for Bayesian optimization. Furthermore, the Boltzmann policy trivially allows a distributed Bayesian optimization implementation with high level of parallelism and scalability. We present results in several benchmarks and applications that shows the performance of our method.

Compositional Fairness Constraints for Graph Embeddings (1905.10674v3)

Avishek Joey Bose, William L. Hamilton

2019-05-25

Learning high-quality node embeddings is a key building block for machine learning models that operate on graph data, such as social networks and recommender systems. However, existing graph embedding techniques are unable to cope with fairness constraints, e.g., ensuring that the learned representations do not correlate with certain attributes, such as age or gender. Here, we introduce an adversarial framework to enforce fairness constraints on graph embeddings. Our approach is compositional---meaning that it can flexibly accommodate different combinations of fairness constraints during inference. For instance, in the context of social recommendations, our framework would allow one user to request that their recommendations are invariant to both their age and gender, while also allowing another user to request invariance to just their age. Experiments on standard knowledge graph and recommender system benchmarks highlight the utility of our proposed framework.

On Inductive Biases in Deep Reinforcement Learning (1907.02908v1)

Matteo Hessel, Hado van Hasselt, Joseph Modayil, David Silver

2019-07-05

Many deep reinforcement learning algorithms contain inductive biases that sculpt the agent's objective and its interface to the environment. These inductive biases can take many forms, including domain knowledge and pretuned hyper-parameters. In general, there is a trade-off between generality and performance when algorithms use such biases. Stronger biases can lead to faster learning, but weaker biases can potentially lead to more general algorithms. This trade-off is important because inductive biases are not free; substantial effort may be required to obtain relevant domain knowledge or to tune hyper-parameters effectively. In this paper, we re-examine several domain-specific components that bias the objective and the environmental interface of common deep reinforcement learning agents. We investigated whether the performance deteriorates when these components are replaced with adaptive solutions from the literature. In our experiments, performance sometimes decreased with the adaptive components, as one might expect when comparing to components crafted for the domain, but sometimes the adaptive components performed better. We investigated the main benefit of having fewer domain-specific components, by comparing the learning performance of the two systems on a different set of continuous control problems, without additional tuning of either system. As hypothesized, the system with adaptive components performed better on many of the new tasks.

Invariant Risk Minimization (1907.02893v1)

Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, David Lopez-Paz

2019-07-05

We introduce Invariant Risk Minimization (IRM), a learning paradigm to estimate invariant correlations across multiple training distributions. To achieve this goal, IRM learns a data representation such that the optimal classifier, on top of that data representation, matches for all training distributions. Through theory and experiments, we show how the invariances learned by IRM relate to the causal structures governing the data and enable out-of-distribution generalization.

Attentive Multi-Task Deep Reinforcement Learning (1907.02874v1)

Timo Bram, Gino Brunner, Oliver Richter, Roger Wattenhofer

2019-07-05

Sharing knowledge between tasks is vital for efficient learning in a multi-task setting. However, most research so far has focused on the easier case where knowledge transfer is not harmful, i.e., where knowledge from one task cannot negatively impact the performance on another task. In contrast, we present an approach to multi-task deep reinforcement learning based on attention that does not require any a-priori assumptions about the relationships between tasks. Our attention network automatically groups task knowledge into sub-networks on a state level granularity. It thereby achieves positive knowledge transfer if possible, and avoids negative transfer in cases where tasks interfere. We test our algorithm against two state-of-the-art multi-task/transfer learning approaches and show comparable or superior performance while requiring fewer network parameters.

Learning compositionally through attentive guidance (1805.09657v4)

Dieuwke Hupkes, Anand Singh, Kris Korrel, German Kruszewski, Elia Bruni

2018-05-20

While neural network models have been successfully applied to domains that require substantial generalisation skills, recent studies have implied that they struggle when solving the task they are trained on requires inferring its underlying compositional structure. In this paper, we introduce Attentive Guidance, a mechanism to direct a sequence to sequence model equipped with attention to find more compositional solutions. We test it on two tasks, devised precisely to assess the compositional capabilities of neural models, and we show that vanilla sequence to sequence models with attention overfit the training distribution, while the guided versions come up with compositional solutions that fit the training and testing distributions almost equally well. Moreover, the learned solutions generalise even in cases where the training and testing distributions strongly diverge. In this way, we demonstrate that sequence to sequence models are capable of finding compositional solutions without requiring extra components. These results helps to disentangle the causes for the lack of systematic compositionality in neural networks, which can in turn fuel future work.

Whale swarm algorithm with the mechanism of identifying and escaping from extreme points for multimodal function optimization (1804.02851v3)

Bing Zeng, Xinyu Li, Liang Gao, Yuyan Zhang, Haozhen Dong

2018-04-09

Most real-world optimization problems often come with multiple global optima or local optima. Therefore, increasing niching metaheuristic algorithms, which devote to finding multiple optima in a single run, are developed to solve these multimodal optimization problems. However, there are two difficulties urgently to be solved for most existing niching metaheuristic algorithms: how to set the optimal values of niching parameters for different optimization problems, and how to jump out of the local optima efficiently. These two difficulties limited their practicality largely. Based on Whale Swarm Algorithm (WSA) we proposed previously, this paper presents a new multimodal optimizer named WSA with Iterative Counter (WSA-IC) to address these two difficulties. In the one hand, WSA-IC improves the iteration rule of the original WSA for multimodal optimization, which removes the need of specifying different values of attenuation coefficient for different problems to form multiple subpopulations, without introducing any niching parameter. In the other hand, WSA-IC enables the identification of extreme point during iterations relying on two new parameters (i.e., stability threshold Ts and fitness threshold Tf), to jump out of the located extreme point. Moreover, the convergence of WSA-IC is proved. Finally, the proposed WSA-IC is compared with several niching metaheuristic algorithms on CEC2015 niching benchmark test functions and five additional classical multimodal functions with high dimensions. The experimental results demonstrate that WSA-IC statistically outperforms other niching metaheuristic algorithms on most test functions.

A Road-map Towards Explainable Question Answering A Solution for Information Pollution (1907.02606v1)

Saeedeh Shekarpour, Faisal Alshargi

2019-07-04

The increasing rate of information pollution on the Web requires novel solutions to tackle that. Question Answering (QA) interfaces are simplified and user-friendly interfaces to access information on the Web. However, similar to other AI applications, they are black boxes which do not manifest the details of the learning or reasoning steps for augmenting an answer. The Explainable Question Answering (XQA) system can alleviate the pain of information pollution where it provides transparency to the underlying computational model and exposes an interface enabling the end-user to access and validate provenance, validity, context, circulation, interpretation, and feedbacks of information. This position paper sheds light on the core concepts, expectations, and challenges in favor of the following questions (i) What is an XQA system?, (ii) Why do we need XQA?, (iii) When do we need XQA? (iv) How to represent the explanations? (iv) How to evaluate XQA systems?

Explaining Predictions from Tree-based Boosting Ensembles (1907.02582v1)

Ana Lucic, Hinda Haned, Maarten de Rijke

2019-07-04

Understanding how "black-box" models arrive at their predictions has sparked significant interest from both within and outside the AI community. Our work focuses on doing this by generating local explanations about individual predictions for tree-based ensembles, specifically Gradient Boosting Decision Trees (GBDTs). Given a correctly predicted instance in the training set, we wish to generate a counterfactual explanation for this instance, that is, the minimal perturbation of this instance such that the prediction flips to the opposite class. Most existing methods for counterfactual explanations are (1) model-agnostic, so they do not take into account the structure of the original model, and/or (2) involve building a surrogate model on top of the original model, which is not guaranteed to represent the original model accurately. There exists a method specifically for random forests; we wish to extend this method for GBDTs. This involves accounting for (1) the sequential dependency between trees and (2) training on the negative gradients instead of the original labels.

X-BERT: eXtreme Multi-label Text Classification with BERT (1905.02331v2)

Wei-Cheng Chang, Hsiang-Fu Yu, Kai Zhong, Yiming Yang, Inderjit Dhillon

2019-05-07

Extreme multi-label text classification (XMC) aims to tag each input text with the most relevant labels from an extremely large label set, such as those that arise in product categorization and e-commerce recommendation. Recently, pretrained language representation models such as BERT achieve remarkable state-of-the-art performance across a wide range of NLP tasks including sentence classification among small label sets (typically fewer than thousands). Indeed, there are several challenges in applying BERT to the XMC problem. The main challenges are: (i) the difficulty of capturing dependencies and correlations among labels, whose features may come from heterogeneous sources, and (ii) the tractability to scale to the extreme label setting as the model size can be very large and scale linearly with the size of the output space. To overcome these challenges, we propose X-BERT, the first feasible attempt to finetune BERT models for a scalable solution to the XMC problem. Specifically, X-BERT leverages both the label and document text to build label representations, which induces semantic label clusters in order to better model label dependencies. At the heart of X-BERT is finetuning BERT models to capture the contextual relations between input text and the induced label clusters. Finally, an ensemble of the different BERT models trained on heterogeneous label clusters leads to our best final model. Empirically, on a Wiki dataset with around 0.5 million labels, X-BERT achieves new state-of-the-art results where the precision@1 reaches 67:80%, a substantial improvement over 32.58%/60.91% of deep learning baseline fastText and competing XMC approach Parabel, respectively. This amounts to a 11.31% relative improvement over Parabel, which is indeed significant since the recent approach SLICE only has 5.53% relative improvement.