【Keras】Understanding and Using TimeDistributed


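I am 快乐薯片, a blogger on 靠谱客. I came across this article on understanding and using Keras TimeDistributed during recent development work and found it quite good, so I'm sharing it here in the hope that it serves as a useful reference.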

Overview

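I had been studying one-stage object detection algorithms, mainly because they are fast. Today, with some free time, I looked through the Mask R-CNN source code to understand its main structure and training process. In the network diagram I noticed the TimeDistributed layer, which I had never used myself, so I read the official documentation and other people's blog posts, and found that some of those posts don't understand it quite correctly. Hence this brief introduction. It reflects only my personal understanding; corrections are welcome.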

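As the name suggests, TimeDistributed applies a layer along the time dimension: the wrapped layer is applied independently to every temporal slice of the input, where axis 1 (the axis right after the batch axis) is treated as time. Personally, I think the name should also contain the word "share", because every one of those applications uses the same weights. The examples below verify this: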
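The idea can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not Keras's implementation: the helpers `time_distributed` and `shared_dense` are hypothetical names, and a `tanh` dense layer stands in for whatever layer is wrapped. The key point is that one set of parameters (`W`, `b`) is reused for every slice along axis 1.

```python
import numpy as np

def time_distributed(layer_fn, x):
    """Apply the same layer_fn to every slice x[:, t] along axis 1 and restack.

    layer_fn closes over a single set of parameters, so every timestep reuses
    the same weights; nothing new is created per timestep.
    """
    return np.stack([layer_fn(x[:, t]) for t in range(x.shape[1])], axis=1)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 10))   # one shared weight matrix (hypothetical sizes)
b = rng.standard_normal(10)        # one shared bias vector

def shared_dense(v):
    # v: (batch, 8) -> (batch, 10), always with the same W and b
    return np.tanh(v @ W + b)

x = rng.standard_normal((4, 12, 8))   # (batch, timesteps, features)
y = time_distributed(shared_dense, x)
print(y.shape)   # (4, 12, 10)
```

However many timesteps `x` has, the only parameters involved are the 8×10 entries of `W` and the 10 entries of `b`.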

Applied to a Dense layer:

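#coding:utf-8
from keras.layers import Input, Dense, TimeDistributed
from keras.models import Model

input_ = Input(shape=(12, 8))                    # 12 timesteps, 8 features each (batch axis is implicit)
out = TimeDistributed(Dense(units=10))(input_)   # the same Dense(10) is applied to every timestep
#out = Dense(units=10)(input_)                   # alternative: same output shape and parameter count
model = Model(inputs=input_, outputs=out)
model.summary()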

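There are 90 parameters in total: 8×10 weights plus 10 biases, and the sequence length is 12. Judging from the parameter count, all 12 timesteps share these 90 trainable parameters, and the network's output shape is (None, 12, 10). Here, however, replacing out = TimeDistributed(Dense(units=10))(input_) with out = Dense(units=10)(input_) also works, with the same output shape and parameter count. The reason is that when Keras's Dense layer receives an input of rank greater than 2, it computes the dot product along the last axis of the input, so it already applies the same weights to each of the 12 timesteps. For Dense, wrapping it in TimeDistributed therefore changes nothing.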
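The Dense equivalence can be checked numerically with a NumPy sketch (random data and variable names are illustrative assumptions). Keras's Dense on a rank-3 input contracts the last axis of the input with axis 0 of its kernel, which is exactly what applying the same kernel and bias to each timestep in a loop produces.

```python
import numpy as np

rng = np.random.default_rng(1)
kernel = rng.standard_normal((8, 10))   # kernel of Dense(units=10) for 8 input features
bias = rng.standard_normal(10)
x = rng.standard_normal((4, 12, 8))     # (batch, timesteps=12, features=8)

# Dense on a rank-3 input: dot product along the last axis of x and axis 0 of the kernel
dense_out = np.tensordot(x, kernel, axes=([2], [0])) + bias      # (4, 12, 10)

# TimeDistributed(Dense): the same kernel and bias applied to each timestep separately
td_out = np.stack([x[:, t] @ kernel + bias for t in range(x.shape[1])], axis=1)

print(dense_out.shape)                  # (4, 12, 10)
print(np.allclose(dense_out, td_out))   # True
print(kernel.size + bias.size)          # 90
```

The equivalence is specific to layers such as Dense that act only on the last axis; a layer like Conv2D expects a fixed input rank, which is where TimeDistributed becomes necessary.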

Applied to a Conv2D layer:

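from keras.layers import Input, Conv2D, TimeDistributed
from keras.models import Model

input_ = Input(shape=(12, 32, 32, 3))   # 12 timesteps, each a 32x32 image with 3 channels (batch axis is implicit)
# the same 3x3 Conv2D with 32 filters is applied to every one of the 12 frames
out = TimeDistributed(Conv2D(filters=32, kernel_size=(3, 3), padding='same'))(input_)
model = Model(inputs=input_, outputs=out)
model.summary()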

The output is:

Using TensorFlow backend.
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 12, 32, 32, 3)     0
_________________________________________________________________
time_distributed_1 (TimeDist (None, 12, 32, 32, 32)    896
=================================================================
Total params: 896
Trainable params: 896
Non-trainable params: 0
_________________________________________________________________

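Here 12 is the time dimension. Note carefully that it is not the batch size: the model is built with shape rather than batch_shape, so the batch axis is implicit and appears as None. 32, 32, 3 are the height, width and number of channels. Wrapping the convolution in TimeDistributed means that all 12 timesteps share the parameters of a single convolutional layer, so the total parameter count stays the same no matter how many timesteps there are. There are 896 parameters here: 3×3×3×32 = 864 kernel weights plus 32 biases.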
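A NumPy sketch of the same computation, assuming a naive stride-1, 'same'-padded convolution (cross-correlation with a channels-last kernel of shape `(kh, kw, C_in, C_out)`, as Keras's Conv2D uses). The helpers `conv2d_same` and `time_distributed_conv` are hypothetical. The sketch folds the time axis into the batch axis, runs one shared convolution, and unfolds again; this is one way to realize the weight sharing and is not claimed to be Keras's exact code path.

```python
import numpy as np

def conv2d_same(x, kernel, bias):
    """Naive 'same'-padded, stride-1 2D cross-correlation.

    x: (batch, H, W, C_in); kernel: (kh, kw, C_in, C_out); bias: (C_out,).
    Assumes odd kernel sizes.
    """
    kh, kw = kernel.shape[:2]
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw), (0, 0)))   # zero padding
    h, w = x.shape[1:3]
    out = np.zeros(x.shape[:3] + (kernel.shape[3],))
    for i in range(kh):
        for j in range(kw):
            patch = xp[:, i:i + h, j:j + w, :]             # (batch, H, W, C_in)
            out += np.einsum("bhwc,cf->bhwf", patch, kernel[i, j])
    return out + bias

def time_distributed_conv(x, kernel, bias):
    """Apply the same conv to every timestep by folding time into the batch axis."""
    b, t = x.shape[:2]
    folded = x.reshape((b * t,) + x.shape[2:])             # (batch*T, H, W, C_in)
    y = conv2d_same(folded, kernel, bias)
    return y.reshape((b, t) + y.shape[1:])                 # (batch, T, H, W, C_out)

rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3, 3, 32))   # 3x3 kernel, 3 input channels, 32 filters
bias = rng.standard_normal(32)
x = rng.standard_normal((2, 12, 32, 32, 3))   # batch of 2, 12 timesteps

y = time_distributed_conv(x, kernel, bias)
print(y.shape)                  # (2, 12, 32, 32, 32)
print(kernel.size + bias.size)  # 896
```

Changing 12 to any other number of timesteps changes the output shape but not `kernel.size + bias.size`.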

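In Mask R-CNN, TimeDistributed is used to share parameters across the multi-level convolutional features produced by the FPN: in the detection and mask heads, the ROI features pooled from the different FPN levels are stacked along axis 1, and TimeDistributed applies the same layers to each of them. Therefore, in my view, the real significance of TimeDistributed is that it lets different feature maps share one set of weights.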
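A toy NumPy sketch of that pattern, under loudly stated assumptions: the sizes (7×7 pooling, 256 channels, 5 classes, four levels of 64/32/16/8) are hypothetical, `roi_pool` is a crude evenly spaced sampling standing in for ROIAlign (it pools a whole level rather than a real box), and this is not the actual Mask R-CNN code. It shows why pooling to a fixed size matters: only then can features from differently sized levels be stacked along axis 1 and passed through one shared head.

```python
import numpy as np

rng = np.random.default_rng(2)
POOL, CH, NUM_CLASSES = 7, 256, 5   # hypothetical sizes for the illustration

def roi_pool(feature_map, size=POOL):
    """Crude stand-in for ROIAlign: sample an evenly spaced size x size grid of cells."""
    h, w = feature_map.shape[:2]
    rows = np.linspace(0, h - 1, size).round().astype(int)
    cols = np.linspace(0, w - 1, size).round().astype(int)
    return feature_map[np.ix_(rows, cols)]                 # (size, size, CH)

# Feature maps from different (hypothetical) FPN levels have different spatial sizes...
levels = [rng.standard_normal((s, s, CH)) for s in (64, 32, 16, 8)]
# ...so each one is pooled to a fixed size before stacking along axis 1.
rois = np.stack([roi_pool(f) for f in levels])[None]     # (batch=1, num_rois=4, 7, 7, 256)

# One shared head (flatten + dense), i.e. the same weights applied to every ROI.
W = rng.standard_normal((POOL * POOL * CH, NUM_CLASSES)) * 0.01
b = np.zeros(NUM_CLASSES)
flat = rois.reshape(rois.shape[0], rois.shape[1], -1)    # (1, 4, 12544)
logits = flat @ W + b                                    # (1, 4, 5)

print(logits.shape)      # (1, 4, 5)
print(W.size + b.size)   # 62725, regardless of how many ROIs are stacked
```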