Review: Dynamic Routing Between Capsules



Link to paper: https://arxiv.org/pdf/1710.09829.pdf


The paper introduces an implementation of capsule networks that uses an iterative routing-by-agreement mechanism: a lower-level capsule prefers to send its output to higher-level capsules whose activity vectors have a large scalar product with the prediction coming from the lower-level capsule.




The human visual system uses a sequence of fixation points to ensure that only a tiny fraction of the optic array is processed at the highest resolution. For a single fixation, a parse tree is carved out of a fixed network of small groups of neurons called “capsules”, and each node in the parse tree corresponds to an active capsule. Through an iterative process, each active capsule chooses a higher-level capsule to be its parent. This process solves the problem of assigning parts to wholes.


For the activity vector of each active capsule:


  • Its length represents the probability that the entity exists in the image.

  • Its orientation represents the entity’s instantiation parameters, such as pose (position, size, orientation), deformation, velocity, albedo, hue, and texture.




Since the output of a capsule is a vector, it is possible to use a powerful dynamic routing mechanism to ensure that the output is sent to an appropriate parent. For each possible parent, the capsule computes a “prediction vector” by multiplying its own output by a weight matrix. If this prediction vector has a large scalar product with the output of a possible parent, the coupling coefficient for that parent is increased and the coefficients for the other parents are decreased. This increases the contribution the capsule makes to that parent, which further increases the scalar product of the capsule’s prediction with the parent’s output. This is much more effective than max-pooling, which lets neurons in one layer care only about the most active feature detector in the previous layer. Also, unlike max-pooling, capsules don’t throw away information about the precise location of the entity or its pose.


Calculating vector inputs and outputs of a capsule:


Because the length of the activity vector represents the probability that an entity exists in the image, it has to lie between 0 and 1. A non-linear “squashing” function ensures that short vectors are shrunk to almost zero length and long vectors are shrunk to a length slightly below 1.
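The squashing step can be sketched in a few lines of plain Python. The formula used here, v = (|s|² / (1 + |s|²)) · (s / |s|), is the one given in the paper; the function names and the small `eps` guard against division by zero are assumptions made for this illustration.

```python
import math

def squash(s, eps=1e-9):
    """Rescale vector s so its length lies in [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in s)
    norm = math.sqrt(sq_norm)
    # |s|^2 / (1 + |s|^2) sets the new length; dividing by |s| normalizes the direction.
    scale = sq_norm / (1.0 + sq_norm) / (norm + eps)
    return [scale * x for x in s]

def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

short = squash([0.01, 0.0])    # a short input ends up with a length close to 0
long_ = squash([30.0, 40.0])   # a long input ends up with a length just below 1
```

The direction of the vector is unchanged, so the instantiation parameters it encodes are preserved while its length can be read as a probability.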


Except for the first layer of capsules, the total input to a capsule is a weighted sum over all prediction vectors from the capsules in the previous layer.


Total input to a capsule: $s_j = \sum_i c_{ij} \hat{u}_{j|i}$, where the c_ij are coupling coefficients.

These prediction vectors are produced by multiplying the output of a capsule in the layer below by a weight matrix.


Prediction vectors: $\hat{u}_{j|i} = W_{ij} u_i$.
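As a rough illustration of these two steps, the sketch below computes prediction vectors and the weighted sum for a toy layer with two lower-level capsules and two parents. The nested-list layout (`W[i][j]` as a matrix, `c[i][j]` as a scalar) and all numeric values are assumptions chosen only to keep the arithmetic easy to follow.

```python
def matvec(W, u):
    """Multiply matrix W (a list of rows) by vector u."""
    return [sum(w * x for w, x in zip(row, u)) for row in W]

def prediction_vectors(W, u):
    """u_hat[i][j] = W[i][j] applied to u[i]: what capsule i predicts for parent j."""
    return [[matvec(W[i][j], u[i]) for j in range(len(W[i]))]
            for i in range(len(u))]

def total_input(c, u_hat, j):
    """s_j = sum over i of c[i][j] * u_hat[i][j]."""
    dim = len(u_hat[0][j])
    return [sum(c[i][j] * u_hat[i][j][d] for i in range(len(u_hat)))
            for d in range(dim)]

# Toy layer: 2 lower-level capsules with 2-D outputs, 2 parent capsules.
u = [[1.0, 0.0], [0.0, 2.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
double = [[2.0, 0.0], [0.0, 2.0]]
W = [[identity, double], [identity, double]]   # hypothetical weight matrices
c = [[0.25, 0.75], [0.5, 0.5]]                 # each row sums to 1

u_hat = prediction_vectors(W, u)
s_0 = total_input(c, u_hat, 0)   # 0.25 * [1, 0] + 0.5 * [0, 2]
s_1 = total_input(c, u_hat, 1)   # 0.75 * [2, 0] + 0.5 * [0, 4]
```

In a trained network the weight matrices are learned by backpropagation, while the coupling coefficients come from the routing procedure rather than being fixed by hand as they are here.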

The coupling coefficients c_ij are determined by the iterative dynamic routing process. The coefficients between capsule i and all the capsules in the layer above sum to 1 and are computed by a softmax whose initial logits b_ij are the log prior probabilities that capsule i should be coupled to capsule j.


Coupling coefficients: $c_{ij} = \exp(b_{ij}) / \sum_k \exp(b_{ik})$.
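A minimal sketch of this softmax for a single lower-level capsule might look as follows. The function name is hypothetical, and subtracting the maximum logit is a standard numerical-stability trick rather than something the post describes.

```python
import math

def coupling_coefficients(b_i):
    """Softmax over the logits b_i[j] of one lower-level capsule i."""
    m = max(b_i)                               # subtract the max for numerical stability
    exps = [math.exp(b - m) for b in b_i]
    total = sum(exps)
    return [e / total for e in exps]

# With all logits equal, the capsule spreads its output evenly over 4 parents.
uniform = coupling_coefficients([0.0, 0.0, 0.0, 0.0])

# A larger logit for parent 0 shifts most of the coupling towards it.
skewed = coupling_coefficients([2.0, 0.0, 0.0, 0.0])
```

Because the softmax runs over the parents of one capsule, raising one coefficient necessarily lowers the others.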

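The initial logits b_ij are then iteratively refined by adding the scalar product (the agreement) between the prediction vector from capsule i and the current output of capsule j: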


$b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j$

[Figure: routing algorithm for CapsNet]
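Putting the pieces together, the routing loop can be sketched in dependency-free Python as below. This is an illustration under stated assumptions rather than the authors' code: the helper names, the nested-list layout `u_hat[i][j]`, the zero-initialized logits, and the default of 3 iterations are all choices made for this example.

```python
import math

def squash(s, eps=1e-9):
    sq = sum(x * x for x in s)
    scale = sq / (1.0 + sq) / (math.sqrt(sq) + eps)
    return [scale * x for x in s]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def route(u_hat, num_iterations=3):
    """Routing by agreement; u_hat[i][j] is capsule i's prediction for parent j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]      # routing logits, assumed to start at zero
    for _ in range(num_iterations):
        c = [softmax(b_i) for b_i in b]           # coupling coefficients, rows sum to 1
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))                 # output of parent capsule j
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += dot(u_hat[i][j], v[j]) # reward agreeing predictions
    return v, c

# Both lower capsules agree about parent 0 but contradict each other about parent 1.
u_hat = [[[2.0, 0.0], [1.0, 0.0]],
         [[2.0, 0.0], [-1.0, 0.0]]]
v, c = route(u_hat)
```

In this toy case the two capsules end up routing almost all of their output to parent 0, whose output vector becomes long, while the conflicting predictions for parent 1 cancel and leave it inactive.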

Margin loss for digit existence:


The top-level capsule for an object class should have a long instantiation vector if and only if that object is present in the image. To allow multiple classes to be present at once, the authors use a separate margin loss for each capsule:


Margin loss for each capsule k: $L_k = T_k \max(0, m^+ - \|v_k\|)^2 + \lambda (1 - T_k) \max(0, \|v_k\| - m^-)^2$, where T_k = 1 if an object of class k is present, m+ = 0.9, m- = 0.1, and λ = 0.5.

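This pushes the length of capsule k’s output vector to at least 0.9 when an object of class k is present and to at most 0.1 when it is absent; within those margins the capsule incurs no loss. The factor λ down-weights the loss for absent classes.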


The total loss is the sum of the losses of all object capsules.
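The margin loss is simple enough to check by hand. The sketch below takes the capsule lengths directly as inputs; the function name and the example lengths are made up for illustration, while the constants are those quoted above.

```python
def margin_loss(lengths, targets, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Sum over classes k of
    T_k * max(0, m+ - |v_k|)^2 + lam * (1 - T_k) * max(0, |v_k| - m-)^2."""
    total = 0.0
    for length, t in zip(lengths, targets):
        present = t * max(0.0, m_plus - length) ** 2
        absent = lam * (1 - t) * max(0.0, length - m_minus) ** 2
        total += present + absent
    return total

# Class 0 is the true class in both cases.
confident = margin_loss([0.95, 0.05, 0.02], [1, 0, 0])  # every length is inside its margin
wrong = margin_loss([0.2, 0.8, 0.05], [1, 0, 0])        # true class too short, class 1 too long
```

The first case gives zero loss because every length already satisfies its margin. The second gives 0.7² + 0.5 · 0.7², about 0.735: the true class falls 0.7 short of m+ and an absent class exceeds m- by 0.7.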


CapsNet architecture for MNIST


A simple CapsNet architecture consisting of two convolutional layers and one fully connected layer.
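To make the layer sizes concrete, the shape arithmetic can be worked through as below. The hyperparameters (9×9 kernels, strides 1 and 2, 32 channels of 8-D primary capsules, ten 16-D digit capsules) are taken from the paper rather than from this post, and the helper name is hypothetical.

```python
def conv_out(size, kernel, stride):
    """Output width of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1

# MNIST input is 28x28. Conv1: 9x9 kernels, stride 1.
conv1 = conv_out(28, kernel=9, stride=1)

# PrimaryCaps: 9x9 kernels, stride 2, 32 channels of 8-D capsules.
primary = conv_out(conv1, kernel=9, stride=2)
num_primary_caps = primary * primary * 32

# DigitCaps: one 8x16 weight matrix per (primary capsule, digit capsule) pair.
digitcaps_weights = num_primary_caps * 10 * 8 * 16
```

This gives a 20×20 feature map after Conv1, a 6×6 grid of primary capsules (1152 capsules in total), and 1,474,560 weights between the primary and digit capsule layers.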

CapsNet achieved state-of-the-art performance on MNIST after just a few training epochs. After training for about 6-7 epochs with this implementation, CapsNet reached about 99% accuracy on the test set. Further training brought only negligible improvement.


Regularization by reconstruction


Decoder structure for reconstructing a digit from the DigitCaps layer.

The authors used a reconstruction loss to encourage the digit capsules to encode the instantiation parameters of the input digit. The decoder learns to reconstruct the image by minimizing the squared difference between the reconstructed image and the input image. The total loss is the sum of the margin loss and the reconstruction loss. To keep the reconstruction loss from dominating the margin loss, it is scaled down by a factor of 0.0005.
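The way the two terms combine can be sketched as follows. Images are represented as flat lists of pixel intensities; the function names and the tiny 4-pixel example are assumptions for illustration only.

```python
def reconstruction_loss(reconstructed, original):
    """Sum of squared pixel differences between two flattened images."""
    return sum((r - o) ** 2 for r, o in zip(reconstructed, original))

def total_loss(margin, reconstructed, original, recon_weight=0.0005):
    """Margin loss plus the down-weighted reconstruction loss."""
    return margin + recon_weight * reconstruction_loss(reconstructed, original)

# A made-up 4-pixel "image" and an imperfect reconstruction of it.
original = [0.0, 1.0, 1.0, 0.0]
reconstructed = [0.1, 0.9, 0.8, 0.0]
recon = reconstruction_loss(reconstructed, original)   # 0.01 + 0.01 + 0.04
loss = total_loss(0.02, reconstructed, original)
```

Even with a reconstruction error of about 0.06, the weighted term adds only about 0.00003 to the margin loss of 0.02, which shows how strongly the 0.0005 factor keeps classification as the dominant objective.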


An example of image reconstruction.



On datasets whose backgrounds are much too varied (such as CIFAR-10), CapsNet performs poorly compared with other state-of-the-art architectures.


YouTube: https://www.youtube.com/watch?v=pPN8d0E3900


Translated from: https://medium.com/xulab/review-dynamic-routing-between-capsules-ea9c2fb64765






