Artificial Intelligence, AI in 2018 and beyond
These are my opinions on where deep neural networks and machine learning are headed within the larger field of artificial intelligence, and how we can get more and more sophisticated machines that can help us in our daily routines.
Please note that these are not predictions or forecasts, but rather a detailed analysis of the trajectory of the field, its trends, and the technical needs we must address to achieve useful artificial intelligence.
Not all machine learning targets artificial intelligence, and there are low-hanging fruits, which we will also examine here.
Goals
The goal of the field is to achieve human and super-human abilities in machines that can help us in our everyday lives. Autonomous vehicles, smart homes, artificial assistants, and security cameras are a first target. Home cooking and cleaning robots are a second target, together with surveillance drones and robots. Another is assistants on mobile devices, or always-on assistants. Yet another is full-time companion assistants that can hear and see what we experience in our life. One ultimate goal is a fully autonomous synthetic entity that can behave at or beyond human-level performance in everyday tasks.
See more about these goals here, and here, and here.
Software
Software is defined here as neural network architectures trained with an optimization algorithm to solve a specific task.
Today neural networks are the de facto tool for supervised learning, that is, learning to categorize from a large labeled dataset.
But this is not artificial intelligence, which requires acting in the real world, often learning without supervision and from experiences never seen before, and often combining previous knowledge from disparate circumstances to solve the current challenge.
How do we get from the current neural networks to AI?
Neural network architectures: when the field boomed a few years back, we often said it had the advantage of learning the parameters of an algorithm automatically from data, and as such was superior to hand-crafted features. But we conveniently forgot to mention one little detail… the neural network architecture that is at the foundation of training to solve a specific task is not learned from data! In fact, it is still designed by hand, hand-crafted from experience, and this is currently one of the major limitations of the field. There is research in this direction: here and here (for example), but much more is needed. Neural network architectures are the fundamental core of learning algorithms. Even if our learning algorithms are capable of mastering a new task, they will not be able to do so if the neural network architecture is not the right one. The problem with learning a neural network architecture from data is that it currently takes too long to experiment with multiple architectures on a large dataset. One has to train multiple architectures from scratch and see which one works best. Well, this is exactly the time-consuming trial-and-error procedure we are using today! We ought to overcome this limitation and put more brain-power into this very important issue.
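To make the cost of that trial-and-error loop concrete, here is a minimal sketch of a random search over architectures. Everything in it is an assumption made for illustration: the search space, the function names, and especially `proxy_score`, which is a toy stand-in for the expensive step of training each candidate from scratch.

```python
import random

# Hypothetical search space: each architecture is a (depth, width) pair.
SEARCH_SPACE = {"depth": [2, 4, 8, 16], "width": [32, 64, 128, 256]}


def sample_architecture(rng):
    """Pick one candidate architecture at random from the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def proxy_score(arch):
    """Toy stand-in for a full training run (an assumption, not a real metric).

    In practice this call is the expensive part: training the candidate
    from scratch on the whole dataset and measuring validation accuracy.
    """
    capacity = arch["depth"] * arch["width"]
    # Toy trade-off: reward capacity, penalize very large models.
    return capacity / (1.0 + (capacity / 1024.0) ** 2)


def random_search(n_trials, seed=0):
    """Evaluate n_trials random candidates and return the best (score, arch)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        arch = sample_architecture(rng)
        score = proxy_score(arch)
        if best is None or score > best[0]:
            best = (score, arch)
    return best
```

Calling `random_search(20)` returns the best of twenty candidates. With a real dataset, each of those twenty evaluations would be a complete training run, which is why the procedure is so slow.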
Unsupervised learning: we cannot always be there for our neural networks, guiding them at every step of their lives and through every experience. We cannot afford to correct them at every instance and provide feedback on their performance. We have our lives to live! But that is exactly what we do today with supervised neural networks: we offer help at every instance to make them perform correctly. Humans, instead, learn from just a handful of examples, and can self-correct and learn more complex data in a continuous fashion. We have talked about unsupervised learning extensively here.
Predictive neural networks: a major limitation of current neural networks is that they do not possess one of the most important features of human brains, their predictive power. One major theory of how the human brain works is that it constantly makes predictions: predictive coding. If you think about it, we experience it every day. Suppose you lift an object that you thought was light but that turns out to be heavy. It surprises you because, as you approached to pick it up, you had predicted how it was going to affect you and your body, or your environment overall.
Prediction allows us not only to understand the world, but also to know when we do not, and when we should learn. In fact, we save information about things that we do not know and that surprise us, so next time they will not! And cognitive abilities are clearly linked to the attention mechanism in our brain: our innate ability to forgo 99.9% of our sensory inputs and focus only on the data that matters most for our survival, such as where the threat is and where to run to avoid it. Or, in the modern world, where my cell phone is as I walk out the door in a rush.
Building predictive neural networks is at the core of interacting with the real world, and acting in a complex environment. As such this is the core network for any work in reinforcement learning. See more below.
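The idea that surprise tells us when to learn can be sketched in a few lines. This is a minimal illustration, not a predictive neural network: the class name, the threshold, and the simple update rule are all assumptions chosen to keep the example small.

```python
class SurprisePredictor:
    """Predicts a value for each context and learns only when surprised."""

    def __init__(self, threshold=1.0, learning_rate=0.5):
        self.threshold = threshold          # error above this counts as surprise
        self.learning_rate = learning_rate  # how far to move toward the observation
        self.expected = {}                  # context -> predicted value
        self.surprises = []                 # log of (context, predicted, observed)

    def observe(self, context, observed):
        """Compare the prediction with what happened; update only if surprised."""
        predicted = self.expected.get(context, 0.0)
        error = observed - predicted
        surprised = abs(error) > self.threshold
        if surprised:
            # Save what surprised us and move the prediction toward it.
            self.surprises.append((context, predicted, observed))
            self.expected[context] = predicted + self.learning_rate * error
        return surprised
```

Take the heavy-object example: a predictor that expects a box to weigh nothing and then repeatedly observes 8 is surprised the first few times, and the prediction error shrinks after each one until the box no longer surprises it.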
We have talked extensively about the topic of predictive neural networks, and we were one of the pioneering groups to study and create them. For more details on predictive neural networks, see here, and here, and here.
Limitations of current neural networks: we have talked before about the limitations of neural networks as they are today. They cannot predict or reason on content, and they have temporal instabilities. We need a new kind of neural network, which you can read about here.
Neural Network Capsules are one approach to solving the limitations of current neural networks. We reviewed them here. We argue here that Capsules have to be extended with a few additional features:
operation on video frames: this is easy, as all we need to do is make capsule routing look at multiple data-points in the recent past. This is equivalent to an associative memory on the most recent important data points. Notice that these are not simply the representations of the most recent frames, but rather the most recent distinct representations. Distinct representations with different content can be obtained, for example, by saving only representations that differ by more than a pre-defined value. This important detail allows us to save only relevant information about the most recent history, and not a useless series of correlated data-points.
predictive neural network abilities: this is already part of dynamic routing, which forces layers to predict the next layer's representations. This is a very powerful self-learning technique that, in our opinion, beats all other kinds of unsupervised representation learning we have developed so far as a community. Capsules now need to be able to predict long-term spatiotemporal relationships, and this is not currently implemented.
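The memory described in the first item above, which keeps only representations that differ by more than a pre-defined value, can be sketched as follows. The class name, the Euclidean distance, and the choice to compare against every stored entry (rather than only the latest one) are assumptions; the text does not specify them.

```python
import math
from collections import deque


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class NoveltyMemory:
    """Keeps the most recent representations that differ from those stored."""

    def __init__(self, capacity=4, min_distance=0.5):
        self.min_distance = min_distance
        self.items = deque(maxlen=capacity)  # oldest entry is dropped when full

    def maybe_store(self, representation):
        """Store the vector only if it is far enough from every stored one."""
        for stored in self.items:
            if distance(stored, representation) <= self.min_distance:
                return False  # too similar: a correlated frame, skip it
        self.items.append(list(representation))
        return True
```

Feeding it a sequence of nearly identical frames stores only the first one. A new entry is added only when the content changes enough, so the memory holds recent history without filling up with correlated data-points.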
Continuous learning: this is important because neural networks need to keep learning new data-points throughout their life. Current neural networks are not able to learn new data without being re-trained from scratch every time. Neural networks need to be able to self-assess both when they need new training and when they already know something. This is also needed to perform in real life and in reinforcement learning tasks, where we want to teach machines to do new tasks without forgetting older ones.
For more detail, see this excellent blog post by Vincenzo Lomonaco.
Transfer learning: how do we have these algorithms learn on their own by watching videos, just like we do when we want to learn how to cook something new? That is an ability that requires all the components we listed above, and it is also important for reinforcement learning. With it, you could really train your machine to do what you want by just giving an example, the same way we humans do every day!
Reinforcement learning: the holy grail of deep neural network research is teaching machines how to learn to act in an environment, the real world! This requires self-learning, continuous learning, predictive power, and a lot more that we do not yet know. There is much work in the field of reinforcement learning, but in the author's view it is really only scratching the surface of the problem, still millions of miles away from solving it. We already talked about this here.
Reinforcement learning is often referred to as the “cherry on the cake”, meaning that it is just minor training on top of a plastic synthetic brain. But how can we get a “generic” brain that then solves all problems easily? It is a chicken-and-egg problem! Today, to solve reinforcement learning problems one by one, we use standard neural networks:
a deep neural network that takes large data inputs, like video or audio, and compresses them into representations
a sequence-learning neural network, such as an RNN, to learn tasks
Both of these components are the obvious solutions to the problem, and they are currently clearly wrong, but that is what everyone uses because they are some of the available building blocks. As such, results are unimpressive: yes, we can learn to play video games from scratch and master fully observable games like chess and Go, but I do not need to tell you that this is nothing compared to solving problems in a complex world. Imagine an AI that can play Horizon Zero Dawn better than humans… I want to see that!
But this is what we want: machines that can operate like us.
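For reference, the two standard building blocks listed above fit together roughly as in this sketch. It uses untrained random weights and plain Python lists, so it only shows the data flow, not a working agent. The layer sizes, function names, and the Elman-style recurrence are assumptions made for illustration.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a real system would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def encode(frame, w_enc):
    """Stage 1: compress a large input into a small representation (linear + ReLU)."""
    return [max(0.0, z) for z in matvec(w_enc, frame)]


def rnn_step(code, hidden, w_in, w_rec):
    """Stage 2: one Elman-style recurrent update over the encoded input."""
    pre = [a + b for a, b in zip(matvec(w_in, code), matvec(w_rec, hidden))]
    return [math.tanh(z) for z in pre]


def run_episode(frames, code_size=4, hidden_size=3, seed=0):
    """Encode each frame, then fold the codes into a recurrent state."""
    rng = random.Random(seed)
    w_enc = random_matrix(code_size, len(frames[0]), rng)
    w_in = random_matrix(hidden_size, code_size, rng)
    w_rec = random_matrix(hidden_size, hidden_size, rng)
    hidden = [0.0] * hidden_size
    for frame in frames:  # strictly sequential: step t needs the state from t-1
        hidden = rnn_step(encode(frame, w_enc), hidden, w_in, w_rec)
    return hidden
```

Note that nothing in this pipeline predicts the next input or stores distinct past experiences. Everything the agent knows about the past has to fit in the single `hidden` vector.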
Our proposal for reinforcement learning work is detailed here. It uses a predictive neural network that can operate continuously and an associative memory to store recent experiences.
No more recurrent neural networks: recurrent neural networks (RNNs) have their days numbered. RNNs are particularly hard to parallelize for training, and they are slow even on special custom machines because of their very high memory bandwidth usage. As such, they are memory-bandwidth-bound rather than computation-bound; see here for more details. Attention-based neural networks are more efficient and faster to train and deploy, and they suffer much less from scalability problems in training and deployment. Attention in neural networks has the potential to really revolutionize a lot of architectures, yet it has not been as recognized as it should be. The combination of associative memories and attention is at the heart of the next wave of neural network advancements.
Attention has already been shown to learn sequences as well as RNNs, with up to 100x less computation! Who can ignore that?
We recognize that attention-based neural networks are going to slowly supplant RNN-based speech recognition, and will also find their way into reinforcement learning architectures and AI in general.
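To show why attention pairs naturally with an associative memory, here is a minimal sketch of one common formulation, scaled dot-product attention, for a single query. The text above does not say which attention variant is meant, so this choice is an assumption, as are the function names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    keys and values play the role of an associative memory: the query is
    compared with every key, and the output is a weighted mix of the values.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    size = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(size)]
    return output, weights
```

Each query is processed independently of the others, so all positions in a sequence can be computed in parallel. An RNN, by contrast, must finish step t-1 before it can start step t.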
Localization of information in categorization neural networks: we have talked extensively here about how we can localize and detect key-points in images and video. This is practically a solved problem that will be embedded in future neural network architectures.
Best,