Understanding the Attention Mechanism, with a Simple Implementation (Keras Version)

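I'm 喜悦小熊猫, a blogger on 靠谱客. This article, which I collected during recent development work, introduces the attention mechanism and a simple implementation of it (Keras version). I found it quite good and am sharing it here as a reference.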

Overview

The essence of attention: it is simply a weighted sum.

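Problem: given $k$ feature vectors $h_i$ ($i = 1, 2, \dots, k$), each of dimension $d$, combine the information in these $k$ vectors into a single vector $h^*$ (still of dimension $d$).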

Solutions:

Take the plain average (mean pooling).

Take a weighted average, where the $\alpha_i$ are the weights: $h^* = \sum_{i=1}^{k} \alpha_i h_i$
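To make the difference concrete, here is a minimal sketch in plain Python (no Keras) that compares mean pooling with a weighted average. The vectors in `h` and the weights in `alpha` are made-up values for illustration, and the helper name `weighted_sum` is hypothetical.

```python
def weighted_sum(vectors, weights):
    """Return sum_i weights[i] * vectors[i] as a list of length d."""
    d = len(vectors[0])
    return [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(d)]

# k = 3 feature vectors, each of dimension d = 2 (illustrative values)
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Mean pooling is the special case where every weight equals 1/k.
mean_pooled = weighted_sum(h, [1.0 / len(h)] * len(h))

# A weighted average with non-uniform weights that sum to 1 (assumed values).
alpha = [0.5, 0.25, 0.25]
h_star = weighted_sum(h, alpha)
```

Both results are still vectors of dimension $d$; only the weights differ.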

What attention does is work out how to compute the $\alpha_i$ sensibly.

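step 1: Design a scoring function $f$ that computes a score $s_i$ for each $h_i$. The score $s_i$ reflects how relevant $h_i$ is to the object that attention focuses on (which is itself just a vector): the more relevant, the larger $s_i$.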

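step 2: Pass the $k$ scores $s_i$ ($i = 1, 2, \dots, k$) through a softmax function to obtain the final weights $\alpha_i$: $\alpha_i = \mathrm{softmax}(s_i) = \frac{\exp(s_i)}{\sum_{j=1}^{k} \exp(s_j)}$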
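The two steps can be sketched in plain Python as follows. This assumes a dot product as the scoring function $f$, which is only one possible choice (the text does not fix $f$). The vectors are made up, `query` stands for the vector that attention focuses on, and the names `score`, `softmax`, and `attend` are hypothetical.

```python
import math

def score(h_i, query):
    """Scoring function f: here a dot product (an assumed choice)."""
    return sum(a * b for a, b in zip(h_i, query))

def softmax(scores):
    """Turn scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(h, query):
    """Step 1: score each h_i; step 2: softmax; then take the weighted sum."""
    s = [score(h_i, query) for h_i in h]
    alpha = softmax(s)
    d = len(h[0])
    h_star = [sum(a * h_i[j] for a, h_i in zip(alpha, h)) for j in range(d)]
    return alpha, h_star

h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # illustrative feature vectors
query = [1.0, 0.0]                         # illustrative object of attention
alpha, h_star = attend(h, query)
```

In this toy input the first and third vectors score highest against `query`, so they receive larger weights than the second.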

A simple implementation in code:

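# Note: this listing targets an early Keras API; Merge and TimeDistributedMerge
# have since been removed from Keras.
from keras.layers.core import Dense, Activation, RepeatVector, Permute, Merge, TimeDistributedMerge
from keras.layers.recurrent import LSTM
from keras.models import Sequential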

input_dim = 32
hidden = 32
step = 10  # number of time steps; assumed value, since the original leaves `step` undefined

#The LSTM model - output_shape = (batch, step, hidden)
model1 = Sequential()
model1.add(LSTM(input_dim=input_dim, output_dim=hidden, input_length=step, return_sequences=True))

#The weight model - actual output shape = (batch, step)
# after reshape : output_shape = (batch, step, hidden)
model2 = Sequential()
model2.add(Dense(input_dim=input_dim, output_dim=step))
model2.add(Activation('softmax')) # Learn a probability distribution over each step.
#Reshape to match LSTM's output shape, so that we can do element-wise multiplication.
model2.add(RepeatVector(hidden))
model2.add(Permute((2, 1)))  # Permute takes the new axis order as a single tuple

#The final model which gives the weighted sum:
model = Sequential()
model.add(Merge([model1, model2], 'mul')) # Multiply each element with corresponding weight a[i][j][k] * b[i][j]
model.add(TimeDistributedMerge('sum')) # Sum the weighted elements.

model.compile(loss='mse', optimizer='sgd')
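
The reshape trick in the weight model can be hard to follow from the layer names alone. The sketch below mimics it for a single sample in plain Python. Here `repeat_vector` and `permute` are hypothetical stand-ins for what `RepeatVector(hidden)` and `Permute((2, 1))` do to the shapes, not the Keras layers themselves, and all numbers are made up.

```python
def repeat_vector(w, n):
    """(step,) -> (n, step): stack n copies of the weight vector."""
    return [list(w) for _ in range(n)]

def permute(m):
    """(n, step) -> (step, n): swap the two axes."""
    return [list(col) for col in zip(*m)]

step, hidden = 3, 2
lstm_out = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # shape (step, hidden)
weights = [0.5, 0.25, 0.25]                      # shape (step,), sums to 1

# After repeat + permute, every time step carries its weight in all hidden slots.
w_full = permute(repeat_vector(weights, hidden))  # shape (step, hidden)

# Element-wise multiply (the 'mul' merge), then sum over the time axis.
weighted = [[w_full[t][j] * lstm_out[t][j] for j in range(hidden)] for t in range(step)]
output = [sum(weighted[t][j] for t in range(step)) for j in range(hidden)]
```

The result has one value per hidden unit, which is the weighted sum $h^* = \sum_i \alpha_i h_i$ taken over the time steps.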

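Alternatively, the following short snippet can be used (I have not yet had time to reproduce it myself; the code is borrowed from someone else):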

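from keras.layers import Input, Dense, merge  # `merge` is the Keras 1.x function; Keras 2+ uses the Multiply layer instead

input_dims = 32  # assumed value, chosen to match output_shape below
inputs = Input(shape=(input_dims,))
attention_probs = Dense(input_dims, activation='softmax', name='attention_probs')(inputs)
attention_mul = merge([inputs, attention_probs], output_shape=32, name='attention_mul', mode='mul')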
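
Unlike the first listing, this snippet weights the individual input features rather than time steps. A rough plain-Python sketch of the computation for one sample is given below. The kernel `W` and bias `b` are made-up values standing in for trained Dense weights, and the function names are hypothetical.

```python
import math

def softmax(z):
    """Turn logits into positive weights that sum to 1."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def feature_attention(x, W, b):
    """Mimic a softmax Dense layer followed by an element-wise multiply with x."""
    d = len(x)
    # Dense layer: logits[j] = sum_i x[i] * W[i][j] + b[j]
    logits = [sum(x[i] * W[i][j] for i in range(d)) + b[j] for j in range(d)]
    probs = softmax(logits)
    return probs, [xi * pi for xi, pi in zip(x, probs)]

x = [1.0, 2.0, 3.0]                                      # one input sample
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # identity kernel (illustrative)
b = [0.0, 0.0, 0.0]
probs, out = feature_attention(x, W, b)
```

With the identity kernel the logits equal the inputs, so larger features receive larger attention probabilities before being rescaled.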