Will artificial intelligence teach machines to learn like humans?

A team of scientists has created a new artificial intelligence model that mimics human learning capabilities and applies them to a large number of handwritten characters. The algorithm enables a machine to distinguish between and draw simple visual shapes, such as letters from different alphabets, from a small number of examples.

The model, known as Bayesian program learning (or BPL), was introduced in a study[1] published in Science in December 2015. The study was co-authored by Brenden Lake, a data science fellow at NYU, Ruslan Salakhutdinov, an assistant professor of Computer Science and Statistics at the University of Toronto, and Joshua Tenenbaum, a professor in the department of Brain and Cognitive Sciences at MIT.

The model has the potential to shorten the time a machine requires to ‘learn’ things (e.g., new languages, images, and symbols). This could prove useful for fields such as national security imagery analytics. In fact, the research received funding from defense agencies including the Air Force Office of Scientific Research and the Intelligence Advanced Research Projects Activity.

The study[1] highlights the difference between human and machine learning abilities. While humans can learn a concept from a handful of examples, even the most advanced machines need vast amounts of data to perform the same task. Most recent progress in the field of AI comes from deep-learning networks, which are loosely based on the human nervous system. These are neural networks composed of several stacked layers; a network with more than three layers is referred to as ‘deep’ (hence deep learning).

The layers are made up of nodes, which are analogous to neurons in the human brain (hence the name neural networks). Each layer operates on the output of the layer before it. For example, if a machine is trained to recognise a letter of the alphabet, a node in the bottom layer looks at a small part of the symbol and extracts useful features from it, then passes that information up to the next layer. Interestingly, the features extracted by the early layers are general rather than specific to the task at hand, in this case recognising a letter. Features become task-specific only in the final layers. The best results are therefore achieved by networks with many layers, which in turn require large amounts of training data.
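The layered computation described above can be sketched as a minimal forward pass. This is purely illustrative and not the paper's code; the layer sizes and the 26-letter classification task are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# A 28x28 symbol flattened to 784 inputs passes through three stacked
# layers. Each layer's nodes operate on the previous layer's output;
# early layers extract generic features, and only the final layer is
# specific to the task (here, a score for each of 26 letter classes).
x = rng.random(784)                                   # flattened input image
W1, b1 = 0.01 * rng.standard_normal((128, 784)), np.zeros(128)
W2, b2 = 0.01 * rng.standard_normal((64, 128)), np.zeros(64)
W3, b3 = 0.01 * rng.standard_normal((26, 64)), np.zeros(26)

h1 = relu(W1 @ x + b1)     # generic low-level features
h2 = relu(W2 @ h1 + b2)    # more abstract intermediate features
scores = W3 @ h2 + b3      # task-specific output: one score per letter

predicted = int(np.argmax(scores))
print(scores.shape, predicted)
```

With untrained random weights the prediction is meaningless, of course; training adjusts the weights so that the final layer's scores become informative.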

BPL enables machines to distinguish and draw simple shapes given a set of examples. Concept learning—first introduced in the work of cognitive psychologist Jerome Bruner[2]—refers to a learning task in which a human or machine learner is shown a set of example objects and taught to classify them. The learner then generalises what has been studied and applies it to future examples. The more difficult the concept is, the harder it is to generalise and therefore learn.

A simple case of human concept learning is a child learning the letters of the alphabet. Only a few examples are required for a child to distinguish the letter A from the letter B, regardless of the size and colour of the letters, and to produce new examples of those letters.

The model represents concepts as simple computer programs that machines can work with. Any concept (in this case, any letter) is represented by a program, which generates examples of that letter compositionally (by parts) every time the program is run. Unlike standard computer programs, these programs produce different outputs on every execution, which is analogous to the way people draw the same letter differently each time. The researchers also claim that BPL “learns to learn”, or constructs new programs by reusing the pieces of existing ones (e.g., using its knowledge of Sanskrit to produce new letters in Tibetan).
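The idea of a concept as a generative, compositional program can be sketched in a few lines. This is a hypothetical toy, not the paper's model: the stroke primitives and jitter parameter are invented for illustration, and there is no learning here, only the "different output on every run" behaviour.

```python
import random

# A letter defined compositionally as a list of strokes (toy primitives,
# not the paper's actual stroke model).
LETTER_A = [("line", (0, 0), (2, 4)),   # left diagonal
            ("line", (4, 0), (2, 4)),   # right diagonal
            ("line", (1, 2), (3, 2))]   # crossbar

def draw(concept, jitter=0.2, rng=None):
    """Run the 'program' for a concept: redraw each stroke with small
    random perturbations, so no two executions are identical, much as
    no two handwritten copies of a letter are identical."""
    rng = rng or random.Random()
    strokes = []
    for kind, start, end in concept:
        wobble = lambda p: tuple(c + rng.uniform(-jitter, jitter) for c in p)
        strokes.append((kind, wobble(start), wobble(end)))
    return strokes

sample1 = draw(LETTER_A, rng=random.Random(1))
sample2 = draw(LETTER_A, rng=random.Random(2))
print(sample1 != sample2)  # two runs produce different tokens of the same letter
```

Reusing pieces across concepts (the “learning to learn” claim) would amount to sharing stroke definitions between such programs, e.g. the crossbar stroke appearing in the programs for both ‘A’ and ‘H’.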

One of the current leading approaches for object recognition reported in the paper is a deep convolutional network, or ConvNet, based on Alex Krizhevsky’s innovative research.[3] The ConvNet is a neural network trained to classify 1.2 million high-resolution images from ImageNet, a dataset of over 15 million labeled images belonging to 22,000 categories. The researchers achieved a top-1 error rate of 37.5% on the test data, which was considerably better than any previous work. However, the network is not speed-optimised and, again, requires huge datasets for its training.

The researchers claim that BPL outperforms recent deep learning approaches.[1] Its performance was assessed on five concept-learning tasks, alongside people and alternative models. The model passed “visual Turing tests”, in which human judges tried to identify whether characters were drawn by humans or machines after seeing paired examples. In the most basic task, judges compared nine characters drawn by humans with nine new examples drawn by BPL. On average, only 52% of judges were able to identify the computer-drawn characters, barely above the 50% expected by chance in such a comparison. However, it is questionable whether this test is an adequate measure of BPL’s success. Xiaowei Zhao, an assistant professor of Psychology at Emmanuel College, highlighted the main difference between the visual Turing test and the original Turing test. In the latter, “the judge should be an intelligent agent evaluating a task that he or she has expertise with. However, in the ‘Visual Turing tests’ presented in the paper, human judges were ‘naive’, as the characters were newly presented to them”.[4]

Figure 1: Examples of Omniglot characters produced by humans and machines (i.e., Bayesian program learning, BPL) that were assessed via a visual Turing test.

Researchers evaluated the performance of BPL and other computational approaches on a simple task (see Figure 1) after seeing just one example of a new concept; this is known as one-shot learning. After being shown a character, the machine and human participants had to select another example of that same character from a set of 20 distinct ones. The characters were taken from Omniglot, a dataset of 1,623 handwritten characters from 50 writing systems. The BPL model scored an error rate of 3.3%, compared with 4.5% for humans, and made less than half as many errors as the competing approaches (varieties of deep ConvNets based on Krizhevsky’s research and alternative versions of BPL).[1] These results are shown in the graph in Figure 2.


Figure 2: Results of an experiment comparing one-shot classification abilities of a variety of machine-learning models and the associated error rates.
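The 20-way one-shot evaluation protocol itself is easy to sketch. The code below is an illustration only: it uses a nearest-neighbour-in-pixel-space stand-in rather than BPL or a ConvNet, and the toy 20x20 "characters" and noise level are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_shot_trial(model, characters, rng):
    """One 20-way one-shot trial: show a new example (here, a noisy copy)
    of a target character, then ask the model to pick the matching
    character out of the 20 candidates."""
    target = int(rng.integers(len(characters)))
    probe = characters[target] + rng.normal(0, 0.1, characters[target].shape)
    return model(probe, characters) == target

def nearest_pixel(probe, candidates):
    # Stand-in "model": pick the candidate closest in raw pixel space.
    dists = [np.linalg.norm(probe - c) for c in candidates]
    return int(np.argmin(dists))

characters = [rng.random((20, 20)) for _ in range(20)]  # 20 toy "characters"
trials = [one_shot_trial(nearest_pixel, characters, rng) for _ in range(100)]
error_rate = 1 - sum(trials) / len(trials)
print(f"error rate: {error_rate:.1%}")
```

In the actual study the probe is a genuinely new handwritten token of the character, drawn by a different person, which is what makes the 3.3% versus 4.5% comparison meaningful.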

Researchers also tested the ability of BPL to generate new concepts. In the first task, humans, BPL, and other computational approaches created a new character that appeared stylistically consistent with a foreign alphabet. BPL performed well: only 51% of judges could identify the machine-created symbols, barely above the 50% chance level. Once again, however, success was measured using a visual Turing test.

BPL uses human assumptions about character strokes and the relations between them as a basis for interpreting and classifying data. Comparing BPL to deep networks, which don’t have any preset assumptions about what a stroke is, therefore seems unfair. Baxter Eaves, a Postdoctoral Associate at Rutgers University, agreed that “It doesn’t make sense to compare BPL with neural networks. I would expect runtime comparisons and experiments exploring the generalisability of BPL. But BPL is not a general algorithm. It requires the implementer to describe what the parts of the domain are built of (e.g., that characters are built of strokes that are expected to behave in a certain way), which is most often very difficult to do.”[5]

BPL did achieve its goal of learning from just a few examples, but only in a specific task. “An ideal algorithm requires very little data to learn, and yet still learns from scratch, much as we imagine humans do. Currently, deep neural networks are capable of recognising similarities, but we hope that one day they will do so more efficiently, without having to learn from thousands of examples. The version of BPL introduced in the paper achieved the goal of efficient handwriting analysis from a small number of examples. Researchers transferred their own understanding of handwriting to the BPL algorithm; however, this makes the algorithm less general. It is a trade-off between efficiency and generality: transferring data from humans to machines directly, or exposing the algorithm to more data”, explained Chris Nicholson, CEO and co-founder of Deeplearning4j.org, the first open-source deep-learning framework, which helps to solve business problems through the application of deep learning.[6]

Even though capturing human-level learning is a long-term goal, BPL’s ability to learn from fewer examples could lead to the gradual execution of more complex tasks. The algorithm has potential applications in defense image recognition. Drone operators currently spend days analysing specific targets; the algorithm could reduce the strain on operators and perhaps improve the accuracy of targeted strikes. However, there is a massive gap between classifying letters and predicting human actions via drone imagery. The researchers themselves note that the algorithm “lacks explicit knowledge of parallel lines, symmetry, optional elements such as cross bars in ‘7’s, and connections between the ends of strokes and other strokes”.[1]

More immediate applications include speech recognition. While BPL may not initially be suitable for it, constructing programs for spoken words may be possible by composing sounds systematically to form syllables, just as with letters and strokes.

Each research paper is a step towards the goal of closing the gap between human and artificial intelligence, and this paper is no exception. The BPL model has achieved its goal of using a small number of examples to learn, which is promising. Yet it is far from capturing human cognitive abilities and teaching machines to learn like humans.


  1. B. M. Lake, R. Salakhutdinov, J. B. Tenenbaum, “Human-level concept learning through probabilistic program induction”, Science 350, pp. 1332–1338, 2015.
  2. J. S. Bruner, The Process of Education, 1960.
  3. A. Krizhevsky, I. Sutskever, G. E. Hinton, “ImageNet classification with deep convolutional neural networks”, Advances in Neural Information Processing Systems, 2012.
  4. B. M. Lake, R. Salakhutdinov, J. B. Tenenbaum, “Human-level concept learning through probabilistic program induction”, 2016.
  5. B. Eaves, “A silly comparison in ‘Bayesian program learning’”, 2015.
  6. C. Nicholson, A. Gibson, “Deeplearning4j: Open-source, distributed deep learning for the JVM”, 2016.