## Routing By Agreement

This is the third article in the series on CapsNet, a new type of neural network based on capsules. I've already covered the intuition behind capsules, what a capsule is, and how it works. In this post, I'll talk about the new dynamic routing algorithm that is used to train capsule networks.

Consider how routing handles the case where the features are present but not in the right places, which is a known problem for CNNs. Suppose the four features of a 3 are randomly placed in an input, as on the left side. A convolutional network would classify this as a 3, since it contains the four components of a 3. In a capsule network, however, the "3" capsule would receive information from the hidden layer saying that the parts are not arranged the way the parts of a 3 should be. So when the first capsule decides where to send its output, it sees no agreement for either the 2 or the 3 and splits its output evenly between them. As a result, the capsule network does not predict a 2 or a 3 with confidence.

I hope this shows why routing must be dynamic: it depends on the poses of the features activated by the capsule network. Here is the trickiest part of dynamic routing: the routing coefficients are not learned parameters of the network! The following paragraphs explain why.

The weights learned in a capsule network are transformation matrices, analogous to the weights of an ordinary neural network. These transformation matrices capture how each feature relates to the whole. For example, they can learn that the first feature is the top of a 3 or the top of a 2, that the second is the right center of a 3, and so on.

In routing, we assume that each capsule active in layer (l) is active because it is part of some "whole" in the next layer. In this step, we assume there is a latent variable that explains which "whole" our information came from, and we try to infer the probability that each matrix output came from a given higher-level capsule in layer (l+1).

The second capsule layer, the digit layer, has one 16-dimensional capsule for each digit (0-9). Dynamic routing connects (only) the primary and digit layers. A [32x6x6] x 10 weight matrix controls the mapping between the layers.
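To make the split between learned and non-learned quantities concrete, here is a minimal NumPy sketch. It assumes the layer sizes from the original CapsNet setup (8-dimensional primary capsules, which this post does not state explicitly), and the names `W`, `u`, `votes`, and `b` are chosen purely for illustration. The values are random stand-ins, not trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: 32 * 6 * 6 = 1152 primary capsules (8-D, an assumption),
# and 10 digit capsules (16-D, as described above).
NUM_PRIMARY, DIM_PRIMARY = 32 * 6 * 6, 8
NUM_DIGITS, DIM_DIGIT = 10, 16

# Learned by backpropagation: one transformation matrix per
# (primary capsule, digit capsule) pair.
W = rng.normal(scale=0.01, size=(NUM_PRIMARY, NUM_DIGITS, DIM_DIGIT, DIM_PRIMARY))

# Primary capsule outputs for a single input (random stand-in values).
u = rng.normal(size=(NUM_PRIMARY, DIM_PRIMARY))

# votes[i, j] = W[i, j] @ u[i]: what primary capsule i predicts for digit capsule j.
votes = np.einsum("ijdk,ik->ijd", W, u)

# NOT learned: routing logits start at zero for every input and are
# recomputed from the agreement between votes during the forward pass.
b = np.zeros((NUM_PRIMARY, NUM_DIGITS))

print(votes.shape)  # (1152, 10, 16)
```

Only `W` would receive gradient updates in this sketch; `b` is scratch state that is thrown away after each input.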

[1] Because the capsules are independent, the probability of a correct detection is much higher when multiple capsules agree. A minimal cluster of two capsules that predict a 6-dimensional entity would agree within 10% by chance only about once in a million trials (roughly 0.1^6 = 10^-6). As the number of dimensions increases, the probability of a chance agreement across a larger cluster with higher dimensions decreases exponentially. [1]

We recognize both, and they look very much alike. If we repeat this with other sketches, we reach the same conclusion. Thus, the mouth capsule and the eye capsule can both be strongly linked to a parent capsule with a width of about 200 pixels. From our experience, a face is 2 times the width of a mouth and 3 times the width of an eye. The parent capsule we discovered is therefore a face capsule. Of course, we can make this more accurate by adding more properties, such as height or color.

In dynamic routing, we transform the vectors of an input capsule with a transformation matrix to form a vote, and we group capsules with similar votes. These votes eventually become the output vector of the parent capsule.
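The grouping of similar votes can be sketched as a short loop. This is a minimal NumPy illustration of routing by agreement as I understand it from the CapsNet paper, not a reference implementation; the function names `squash`, `softmax`, and `route`, the three iterations, and the toy votes are all assumptions made for the example.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Scale each vector to a length in [0, 1) while keeping its direction."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route(votes, num_iters=3):
    """votes has shape (num_lower, num_upper, dim); returns (outputs, coupling)."""
    num_lower, num_upper, _ = votes.shape
    b = np.zeros((num_lower, num_upper))           # routing logits, reset per input
    for _ in range(num_iters):
        c = softmax(b, axis=1)                     # how each lower capsule splits its output
        s = np.einsum("ij,ijd->jd", c, votes)      # weighted sum of votes per upper capsule
        v = squash(s)                              # output vectors of the upper capsules
        b = b + np.einsum("ijd,jd->ij", votes, v)  # raise logits where vote and output agree
    return v, c

# Toy input: three lower capsules, two upper capsules, 2-D votes.
# All three agree on upper capsule 0 and disagree on upper capsule 1.
toy_votes = np.array([
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
])

v, c = route(toy_votes)
_, c_first = route(toy_votes, num_iters=1)
```

In this toy case every lower capsule ends up sending most of its output to upper capsule 0, whose output vector is also the longer of the two. With a single iteration the coupling is still uniform, which matches the earlier point that a capsule splits its output evenly when it sees no agreement yet.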

So how do we learn the transformation matrices? We do it the deep learning way: backpropagation with a cost function.