with tf.GradientTape() as tape: GradientTape, TensorFlow's automatic differentiation API (definition, scope, the watch method, and use in network training)


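I'm 闪闪板栗, a blogger on 靠谱客. I came across this article during recent development work. It introduces tf.GradientTape, TensorFlow's automatic differentiation API: its definition, its scope, the watch method, and its use in network training. I found it quite good and am sharing it here in the hope that it serves as a useful reference.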

Overview

Definition of GradientTape

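TensorFlow provides the tf.GradientTape API for automatic differentiation, that is, computing the gradient of a computation with respect to its inputs. TensorFlow records every operation executed inside a tf.GradientTape context onto a "tape". It then uses that tape, together with the derivative of each recorded operation, to compute the gradients of the recorded computation by reverse-mode differentiation.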
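To make the record-then-replay idea concrete, here is a toy reverse-mode tape in plain Python. It is only an illustration under simplifying assumptions (scalar values, only `+` and `*`) and is not how TensorFlow actually implements tf.GradientTape; the `Node` and `Tape` classes are hypothetical names introduced for this sketch.

```python
class Node:
    """A scalar value produced while the toy tape is recording (hypothetical class)."""
    def __init__(self, value):
        self.value = value


class Tape:
    """Toy reverse-mode tape: records each op as (output, [(input, local_derivative), ...])."""
    def __init__(self):
        self.ops = []

    def mul(self, a, b):
        out = Node(a.value * b.value)
        # d(a*b)/da = b and d(a*b)/db = a
        self.ops.append((out, [(a, b.value), (b, a.value)]))
        return out

    def add(self, a, b):
        out = Node(a.value + b.value)
        # d(a+b)/da = d(a+b)/db = 1
        self.ops.append((out, [(a, 1.0), (b, 1.0)]))
        return out

    def gradient(self, target, source):
        # Seed d(target)/d(target) = 1, then walk the tape backwards (reverse mode),
        # adding upstream_grad * local_derivative into each input (chain rule).
        grads = {target: 1.0}
        for out, inputs in reversed(self.ops):
            upstream = grads.get(out, 0.0)
            for inp, local in inputs:
                grads[inp] = grads.get(inp, 0.0) + upstream * local
        return grads.get(source, 0.0)


x = Node(3.0)
tape = Tape()
y = tape.add(tape.mul(x, x), x)  # y = x*x + x, recorded as two operations
dy_dx = tape.gradient(y, x)      # 2*x + 1 = 7.0 at x = 3
```

Because `x` feeds both the multiplication and the addition, its gradient is accumulated from every recorded use, which is exactly what reverse-mode differentiation requires.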

Scope

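Trainable variables created with tf.Variable or tf.compat.v1.get_variable (as opposed to tf.constant) are watched automatically, so gradients with respect to them can be computed directly.
Any other tensor, such as a tf.constant, must be watched manually with the tape's watch method before gradients with respect to it can be computed.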
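A minimal sketch of the rule above, assuming TensorFlow 2.x in eager mode: the trainable variable gets a gradient, while a non-trainable variable and an unwatched constant both yield `None`. The variable names are chosen only for this illustration.

```python
import tensorflow as tf

w = tf.Variable(3.0)                        # trainable tf.Variable: watched automatically
frozen = tf.Variable(3.0, trainable=False)  # non-trainable variable: not watched
c = tf.constant(3.0)                        # plain tensor: not watched unless tape.watch(c) is called

with tf.GradientTape(persistent=True) as tape:
    y_w = w * w
    y_f = frozen * frozen
    y_c = c * c

grad_w = tape.gradient(y_w, w)       # 6.0, since d(w*w)/dw = 2*w
grad_f = tape.gradient(y_f, frozen)  # None: the tape never tracked this variable
grad_c = tape.gradient(y_c, c)       # None: constants must be watched explicitly
del tape  # release the persistent tape's resources
```

Adding `tape.watch(c)` as the first line inside the `with` block would make `grad_c` equal 6.0 as well.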

The watch method

The simplest example: the derivative of y = x * x

import tensorflow as tf

x = tf.constant(3.0)  # a constant, so it must be watched explicitly
with tf.GradientTape() as g:
  g.watch(x)
  y = x * x
dy_dx = g.gradient(y, x) # Will compute to 6.0

Second derivative of y = x * x (nested tapes)

x = tf.constant(3.0)
with tf.GradientTape() as g:
  g.watch(x)
  with tf.GradientTape() as gg:
    gg.watch(x)
    y = x * x
  dy_dx = gg.gradient(y, x)     # Will compute to 6.0
d2y_dx2 = g.gradient(dy_dx, x)  # Will compute to 2.0

Differentiating a composite function z = y**2 = (x*x)**2 with a persistent tape

x = tf.constant(3.0)
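with tf.GradientTape(persistent=True) as g:  # persistent=True allows gradient() to be called more than once
  g.watch(x)
  y = x * x
  z = y * y
dz_dx = g.gradient(z, x)  # 108.0 (4*x^3 at x = 3)
dy_dx = g.gradient(y, x)  # 6.0
del g  # drop the reference to the persistent tape so its resources can be freed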
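Two points are worth spelling out here. By the chain rule, dz/dx = dz/dy · dy/dx = 2y · 2x = 4x³, which is 108 at x = 3. And `persistent=True` is needed because a default tape can be asked for gradients only once. The sketch below, assuming TensorFlow 2.x, shows that a second `gradient()` call on a non-persistent tape raises `RuntimeError`, while a persistent tape returns both results.

```python
import tensorflow as tf

x = tf.constant(3.0)

# Non-persistent (default) tape: its resources are released after the first gradient() call.
with tf.GradientTape() as tape:
    tape.watch(x)
    y = x * x
    z = y * y
first = tape.gradient(z, x)       # 108.0
try:
    tape.gradient(y, x)           # second call on the same tape
    second_call_failed = False
except RuntimeError:
    second_call_failed = True     # TensorFlow refuses to reuse a non-persistent tape

# Persistent tape: both gradients can be read from a single recording.
with tf.GradientTape(persistent=True) as ptape:
    ptape.watch(x)
    y = x * x
    z = y * y
dz_dx = ptape.gradient(z, x)      # 2*y * 2*x = 4*x**3 = 108.0
dy_dx = ptape.gradient(y, x)      # 2*x = 6.0
del ptape                         # drop the reference so the tape can be freed
```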

Application in network training

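Error backpropagation (Error Back Propagation Training), the most widely used algorithm for training deep neural networks, is the key to updating network weights. We use it as the example here.
First, build the network and the optimizer. (In TensorFlow 2.x, Keras models execute eagerly by default, so no static graph has to be built first.)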

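import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, optimizers

model = keras.Sequential([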
    layers.Dense(256, activation='relu'),
    layers.Dense(10)])
optimizer = optimizers.SGD(learning_rate=0.001)

Then compute the gradients of the loss, backpropagate the error, and update the network weights:

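def train_epoch(epoch):
    # train_dataset is assumed to yield (x, y) batches: 28x28 images and one-hot labels of shape (batch, 10)
    for step, (x, y) in enumerate(train_dataset):
        with tf.GradientTape() as tape:  # trainable_variables are watched automatically, no watch() needed
            x = tf.reshape(x, (-1, 28*28))  # flatten [b, 28, 28] -> [b, 784]
            out = model(x)                  # forward pass: [b, 784] -> [b, 10]
            loss = tf.reduce_sum(tf.square(out - y)) / x.shape[0]  # squared error averaged over the batch
        grads = tape.gradient(loss, model.trainable_variables)     # backpropagation
        optimizer.apply_gradients(zip(grads, model.trainable_variables))  # SGD step: w <- w - lr * grad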
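The loop above relies on a `train_dataset` that the article never defines. The self-contained sketch below fills that gap with a hypothetical stand-in: 64 random 28x28 "images" with random one-hot labels instead of real MNIST data. It keeps the article's model, loss, and optimizer, and checks that a few epochs lower the training loss. The helper `mean_squared_loss` is also hypothetical, added only to measure the loss.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, optimizers

tf.random.set_seed(0)

# Hypothetical stand-in for train_dataset: random images with one-hot labels, batches of 32.
images = tf.random.normal((64, 28, 28))
labels = tf.one_hot(tf.random.uniform((64,), maxval=10, dtype=tf.int32), depth=10)
train_dataset = tf.data.Dataset.from_tensor_slices((images, labels)).batch(32)

model = keras.Sequential([
    layers.Dense(256, activation='relu'),
    layers.Dense(10)])
optimizer = optimizers.SGD(learning_rate=0.001)


def mean_squared_loss(dataset):
    """Per-sample squared error averaged over the whole dataset (no weight updates)."""
    total, count = 0.0, 0
    for x, y in dataset:
        out = model(tf.reshape(x, (-1, 28 * 28)))
        total += float(tf.reduce_sum(tf.square(out - y)))
        count += x.shape[0]
    return total / count


def train_epoch(epoch):
    for step, (x, y) in enumerate(train_dataset):
        x = tf.reshape(x, (-1, 28 * 28))
        with tf.GradientTape() as tape:  # records the forward pass
            out = model(x)
            loss = tf.reduce_sum(tf.square(out - y)) / x.shape[0]
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))


loss_before = mean_squared_loss(train_dataset)
for epoch in range(5):
    train_epoch(epoch)
loss_after = mean_squared_loss(train_dataset)
```

Since the labels are random, the falling loss only shows that the tape-driven update steps work; it says nothing about generalization.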