BLOG: Capsule networks help AI identify images more efficiently

How does a computer recognise a picture? Can a computer distinguish between kittens and puppies? And what is the technology behind image recognition?

Nowadays, convolutional neural networks (CNNs), which draw on computer science, neuroscience and mathematics, are the most advanced and most widely used method for classifying and identifying images. CNNs have also become one of the most influential innovations in computer vision and artificial intelligence (AI). Put simply, a CNN classifies a picture by first detecting low-level features (such as edges and curves) and then combining them into increasingly abstract concepts through a series of convolutional layers.[i]
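To make the idea of a low-level feature concrete, here is a minimal NumPy sketch of a single convolution step. It assumes a tiny greyscale image and a hand-picked Sobel-style edge kernel; in a real CNN the kernel weights are learned during training and many such filters are stacked in layers. The function name `conv2d_valid` is illustrative and not taken from any library.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `image` without padding (a 'valid' cross-correlation,
    which is what most deep-learning libraries call convolution)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            # Element-wise product of the kernel with the window, then sum.
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A 6x6 toy image: dark left half (0) and bright right half (1), i.e. one vertical edge.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked Sobel-style kernel that responds to dark-to-bright vertical edges.
# In a trained CNN these weights would be learned, not chosen by hand.
vertical_edge = np.array([[-1, 0, 1],
                          [-2, 0, 2],
                          [-1, 0, 1]], dtype=float)

feature_map = conv2d_valid(image, vertical_edge)
print(feature_map)
```

Each row of the 4x4 feature map is `[0, 4, 4, 0]`: the filter responds only where its window straddles the boundary between dark and bright pixels. Maps like this are what deeper layers combine into more abstract features.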

However, CNNs perform poorly when shown an object from an orientation they did not encounter during training. One remedy is to include many variations of the same image in the training data, but this is time-consuming. As Tom Simonite writes in WIRED, “To teach a computer to recognize a cat from many angles, for example, could require thousands of photos covering a variety of perspectives. Human children don’t need such explicit and extensive training to learn to recognize a household pet.”[ii] Large training datasets must therefore be supplied, and this demand for data already limits the usefulness of CNNs.
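The remedy of adding variations of each image can be sketched in a few lines of NumPy. This assumes square greyscale images and, as a simplification chosen purely for illustration, restricts the variations to 90-degree in-plane rotations; the helper names `rotated_variants` and `augment` are hypothetical.

```python
import numpy as np

def rotated_variants(image: np.ndarray) -> list:
    """Return the image rotated by 0, 90, 180 and 270 degrees (in-plane only)."""
    return [np.rot90(image, k) for k in range(4)]

def augment(dataset: list) -> list:
    """Turn every (image, label) pair into four pairs, reusing the label."""
    augmented = []
    for image, label in dataset:
        for variant in rotated_variants(image):
            augmented.append((variant, label))
    return augmented

# Usage: ten random 28x28 "photos", all labelled as cats (toy data).
rng = np.random.default_rng(0)
images = [rng.random((28, 28)) for _ in range(10)]
dataset = [(img, "cat") for img in images]

augmented = augment(dataset)
print(len(augmented))  # 40
```

Even this crude scheme quadruples the size of the dataset, and in-plane rotations cannot reproduce genuinely new viewpoints (such as a cat seen from above), which is why real training sets need photos taken from many angles.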

Geoffrey Hinton, a computer scientist noted for his work on artificial neural networks, recently co-authored two research papers that introduce a new approach known as CapsNet (capsule network). CapsNet is designed to make up for the weaknesses of CNNs by enabling a computer to make full use of the spatial relationships between features. In face recognition, for example, the relative positions of facial features can serve as inputs (e.g., two eyes side by side, the nose below the eyes, and the mouth below the nose).[iii]
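A toy illustration of why spatial relationships matter follows. It assumes made-up pixel coordinates (rows increase downwards) and hand-set offsets; the `OFFSET_TO_FACE_CENTRE` table and the function names are hypothetical. Each detected facial part "votes" for where the centre of the face should be. A check that only asks which parts are present cannot tell a face from a scrambled one, whereas the spread of the votes can.

```python
import numpy as np

# Hypothetical offsets (row, column) from each part to the face centre, in pixels.
# These are fixed by hand here purely for illustration.
OFFSET_TO_FACE_CENTRE = {
    "left_eye": np.array([10.0, 8.0]),
    "right_eye": np.array([10.0, -8.0]),
    "nose": np.array([0.0, 0.0]),
    "mouth": np.array([-10.0, 0.0]),
}

def face_centre_votes(parts: dict) -> np.ndarray:
    """Each detected part predicts where the face centre should be."""
    return np.array([pos + OFFSET_TO_FACE_CENTRE[name] for name, pos in parts.items()])

def vote_spread(parts: dict) -> float:
    """Mean distance of the votes from their average: 0 when all parts agree."""
    votes = face_centre_votes(parts)
    return float(np.mean(np.linalg.norm(votes - votes.mean(axis=0), axis=1)))

face = {"left_eye": np.array([40.0, 42.0]), "right_eye": np.array([40.0, 58.0]),
        "nose": np.array([50.0, 50.0]), "mouth": np.array([60.0, 50.0])}
# Same parts, but the left eye and mouth have swapped places.
scrambled = {"left_eye": np.array([60.0, 50.0]), "right_eye": np.array([40.0, 58.0]),
             "nose": np.array([50.0, 50.0]), "mouth": np.array([40.0, 42.0])}

print(set(face) == set(scrambled))  # True: a "which parts are present" check cannot tell them apart
print(vote_spread(face))            # 0.0: every part agrees on the face centre
print(vote_spread(scrambled) > 1)   # True: the votes disagree
```

In CapsNet itself the votes are not fixed offsets: each one comes from a learned transformation matrix applied to a part's pose vector, so that the votes can still agree when the whole face is rotated.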

Hinton’s aim is to narrow the gap between the best AI systems and human children by building more knowledge of the real world into computer-vision software. In the first paper, Hinton and his co-authors describe a capsule as “a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or object part.”[iv] A capsule is a nested set of neural layers,[v] and CapsNet is a network built from these capsules. The activities of the neurons in an active capsule represent various properties of a specific entity in the image (e.g., position, size and direction).[vi] Capsules take vectors as input and produce vectors as output. Unlike the single scalar output of a conventional neuron, a capsule’s output vector encodes two things:

  • Its length represents the probability that the entity (an object, a visual concept, or part of one) is present.
  • Its orientation (i.e., the length-independent part) represents graphical properties of the entity (e.g., position, colour, direction and shape).
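The capsule paper [iv] keeps each output length between 0 and 1 with a "squashing" non-linearity, v = (|s|² / (1 + |s|²)) · s / |s|, where s is the capsule's total input. Below is a minimal NumPy sketch of that formula; the small `eps` term is an assumption added here to avoid division by zero for an all-zero vector.

```python
import numpy as np

def squash(s: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Squashing non-linearity: v = (|s|^2 / (1 + |s|^2)) * s / |s|.
    Short vectors shrink towards length 0, long vectors approach length 1,
    and the direction of the vector is preserved."""
    norm_sq = np.sum(s ** 2, axis=-1, keepdims=True)
    scale = norm_sq / (1.0 + norm_sq)
    return scale * s / np.sqrt(norm_sq + eps)

long_vec = squash(np.array([3.0, 4.0]))   # |s| = 5
short_vec = squash(np.array([0.3, 0.4]))  # |s| = 0.5

print(np.linalg.norm(long_vec))   # about 0.96: entity very likely present
print(np.linalg.norm(short_vec))  # about 0.2: entity probably absent
```

Both outputs keep the direction [0.6, 0.8] of their inputs, so the length can act as a probability while the direction remains free to carry the entity's properties.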

Capsules can track the different parts of an object and their relative positions. In addition, a new algorithm for dynamic routing between capsules lets them communicate: each lower-level capsule sends its output mainly to the higher-level capsules whose outputs agree with its own prediction, so that the parts belonging to the same object are grouped together.[vii] As a result, CapsNet can recognise an object even when the image is rotated, tilted or viewed from a different angle, and in this respect it works better than a CNN. Another advantage is that CapsNet needs only a fraction of the data that CNNs require to achieve a state-of-the-art result, which brings it closer to the behaviour of the human brain. If verified on a large scale, CapsNet may be useful in domains such as healthcare, where there is little data with which to train AI systems.[viii]
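The routing step can be sketched in NumPy, loosely following the routing-by-agreement procedure in Sabour, Frosst and Hinton [iv]. This is a simplified sketch under stated assumptions: there is no batch dimension, and the predictions `u_hat` are supplied directly, whereas in the paper each prediction is produced by multiplying a lower capsule's output by a learned weight matrix. The toy data are invented for illustration.

```python
import numpy as np

def squash(s: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    norm_sq = np.sum(s ** 2, axis=-1, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_routing(u_hat: np.ndarray, num_iterations: int = 3):
    """u_hat[i, j] is lower capsule i's prediction for higher capsule j,
    shape (num_lower, num_higher, dim). Returns the higher-level outputs v
    (num_higher, dim) and the final coupling coefficients c (num_lower, num_higher)."""
    if num_iterations < 1:
        raise ValueError("num_iterations must be at least 1")
    num_lower, num_higher, _ = u_hat.shape
    b = np.zeros((num_lower, num_higher))          # routing logits: start undecided
    for _ in range(num_iterations):
        c = softmax(b, axis=1)                      # each lower capsule splits its output over higher capsules
        s = np.einsum("ij,ijd->jd", c, u_hat)       # weighted sum of predictions per higher capsule
        v = squash(s)                               # higher-capsule output vectors
        b = b + np.einsum("ijd,jd->ij", u_hat, v)   # strengthen routes whose predictions agree with v
    return v, c

# Toy predictions: all three lower capsules agree about higher capsule 0,
# but disagree about higher capsule 1.
u_hat = np.array([
    [[2.0, 0.0], [0.0, 2.0]],
    [[2.0, 0.0], [0.0, -2.0]],
    [[2.0, 0.0], [-2.0, 0.0]],
])
v, c = dynamic_routing(u_hat)
print(np.round(c, 2))              # coupling coefficients after routing
print(np.linalg.norm(v, axis=1))   # output lengths of the two higher capsules
```

Because the lower capsules agree about capsule 0, their coupling coefficients shift towards it over the iterations and capsule 0 ends up with a long output vector, while the inconsistent predictions for capsule 1 largely cancel out and leave it with a short one.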

So far, the evidence supports Hinton’s intuition. CapsNet has been shown to match traditional neural networks at recognising handwritten digits, and to cut the error rate for identifying toy cars and trucks by 50%.[ix] CapsNet is therefore full of promise. However, the current implementation leaves room for improvement, the approach has yet to be proven on a large collection of images, and it is slower than the image-recognition software currently in use.

“It’s too early to tell how far this particular architecture will go, but it’s great to see Hinton breaking out of the rut that the field has seemed fixated on,” said Gary Marcus, a professor of psychology at New York University (NYU). Hinton is optimistic about the future of CapsNet and will continue the research with his team. It is hoped that this network structure will mature over time and eventually contribute to the field of AI.


[i] Adit Deshpande. 20 July 2016. A Beginner’s Guide To Understanding Convolutional Neural Networks. [online]. [Accessed 01 December 2017].
[ii] Tom Simonite. 02 November 2017. Google’s AI Wizard Unveils a New Twist on Neural Networks. [online]. [Accessed 01 December 2017].
[iii] Nick Bourdakos. 10 November 2017. Capsule Networks Are Shaking up AI — Here’s How to Use Them. [online]. Available from: [Accessed 01 December 2017].
[iv] Sara Sabour, Nicholas Frosst and Geoffrey Hinton. 26 October 2017. Dynamic Routing Between Capsules. [online]. Available from: [Accessed 01 December 2017].
[v] Debarko De. 01 November 2017. What is a CapsNet or Capsule Network? [online]. Available from: [Accessed 01 December 2017].
[vi] Sara Sabour, Nicholas Frosst and Geoffrey Hinton. 26 October 2017. Dynamic Routing Between Capsules. [online]. Available from: [Accessed 01 December 2017].
[vii] Max Pechyonkin. 03 November 2017. Understanding Hinton’s Capsule Networks. Part I: Intuition. [online]. Available from: ³-theory-practice-business/understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b. [Accessed 01 December 2017].
[viii] Tom Simonite. 02 November 2017. Google’s AI Wizard Unveils a New Twist on Neural Networks. [online]. Available from: [Accessed 01 December 2017].
[ix] Robby Berman. 03 November 2017. Buh-Bye, ‘Traditional’ Neural Networks. Hello, Capsules. [online]. Available from: [Accessed 01 December 2017].

About the Author