TensorFlow 2 Debugging Methods

This article introduces debugging techniques for TensorFlow 2: inspecting tensor values, device placement, graph structure, step-by-step debugging, the high-level tf.keras API, numerical issues (NaN / Infinity), and the TensorFlow Debugger (tfdbg). I hope it serves as a useful reference.

Contents

  • 1. Debugging Tensor Values
  • 2. Debugging Device Placement
  • 3. Debugging Graph Structure
    • a. tf.function graphs
    • b. Runtime graphs
  • 4. Step-by-Step Debugging
  • 5. Debugging High-Level APIs (tf.keras)
  • 6. Numerical Issues (NaN / Infinity)
  • 7. TensorFlow Debugger (tfdbg)

1. Debugging Tensor Values

Printing the value of a Tensor

import tensorflow as tf
import numpy as np

def log1p(x):
    y = 1.0 * x
    print(y)
    return tf.math.log(y)

y = log1p(tf.constant([1., 2., 3.]))
y = log1p(tf.constant([2., 3., 4.]) * np.pi)

Output

tf.Tensor([1. 2. 3.], shape=(3,), dtype=float32)
tf.Tensor([ 6.2831855  9.424778  12.566371 ], shape=(3,), dtype=float32)

Explanation

  • The log1p function is not decorated with @tf.function, so it executes eagerly.
  • print() can output a tensor's values
    • It behaves like printing a numpy.ndarray
    • It may trigger a device-to-host copy

Printing aggregate values of a Tensor

def log1p(x):
    y = 1.0 * x
    print(tf.reduce_mean(y), tf.reduce_max(y), tf.reduce_min(y))
    return tf.math.log(y)

y = log1p(tf.constant([1., 2., 3.]))
y = log1p(tf.constant([2., 3., 4.]) * np.pi)

Output

tf.Tensor(2.0, shape=(), dtype=float32) tf.Tensor(3.0, shape=(), dtype=float32) tf.Tensor(1.0, shape=(), dtype=float32)
tf.Tensor(9.424778, shape=(), dtype=float32) tf.Tensor(12.566371, shape=(), dtype=float32) tf.Tensor(6.2831855, shape=(), dtype=float32)
  • Built-in TF functions can be used to print transformed (aggregated) tensor values

Changing the print format

np.set_printoptions(precision=3)

def log1p(x):
    y = 1.0 * x
    print(y)
    return tf.math.log(y)

y = log1p(tf.constant([1., 2., 3.]))
y = log1p(tf.constant([2., 3., 4.]) * np.pi)

Output

tf.Tensor([1. 2. 3.], shape=(3,), dtype=float32)
tf.Tensor([ 6.283  9.425 12.566], shape=(3,), dtype=float32)
  • EagerTensor.__str__() and __repr__() hook into NumPy's string formatting
  • Therefore numpy.set_printoptions() can be used to control the print format

Printing tensors inside a graph

@tf.function
def collatz(n):
    counter = tf.constant(0)
    while n > 1:
        print(n)
        if n % 2 == 0:
            n //= 2
        else:
            n = n * 3 + 1
        counter += 1
    return counter

print(collatz(tf.constant(42)))

Output

Tensor("placeholder:0", shape=(), dtype=int32)
tf.Tensor(8, shape=(), dtype=int32)
  • The Placeholder is part of the graphlet that implements the TF while loop

Replacing print(n) with tf.print(n), the result becomes

42
21
64
32
16
8
4
2
tf.Tensor(8, shape=(), dtype=int32)
  • tf.print() prints the actual runtime values of the tensor n

Ragged tensors (RaggedTensor)

ragged = tf.RaggedTensor.from_row_splits(
    values=[3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
    row_splits=[0, 4, 4, 7, 8, 8]
)

@tf.function
def ragged_times_length_plus_one(x):
    row_lengths = tf.reduce_sum(x.row_lengths())
    y = x * tf.cast(row_lengths, tf.float32)
    tf.print(y)
    return y + 1.0

ragged_times_length_plus_one(ragged)

Output

tf.RaggedTensor(values=Tensor("Mul_1:0", shape=(8,), dtype=float32), row_splits=Tensor("x_1:0", shape=(6,), dtype=int64))
  • Ragged tensors do not print properly inside a graph (only their symbolic components are shown); see the sketch below for a workaround
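As a workaround, here is a minimal sketch (my own addition, not from the original material) that prints the dense component tensors of the RaggedTensor instead, reusing the ragged value defined above; plain dense tensors print fine under tf.print() inside a graph:

@tf.function
def ragged_debug(x):
    y = x * tf.cast(tf.reduce_sum(x.row_lengths()), tf.float32)
    # The RaggedTensor itself only shows its symbolic components inside a graph,
    # but its underlying dense tensors print normally.
    tf.print("values:", y.values, "row_splits:", y.row_splits)
    return y + 1.0

ragged_debug(ragged)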

Sparse tensors

sparse = tf.sparse.SparseTensor(
    indices=[[0, 0], [1, 2]],
    values=[1.1, 2.2],
    dense_shape=[3, 4]
)

@tf.function
def sparse_times_non_zero_count(x):
    count = tf.cast(tf.math.count_nonzero(x.values), tf.float32)
    y = x * count
    tf.print(y)
    return y

sparse_times_non_zero_count(sparse)

Output

'SparseTensor(indices=[[0 0] [1 2]], values=[2.2 4.4], shape=[3 4])'
  • Sparse tensors can be printed

Accessing tensor values inside a graph programmatically

random_normal = tf.random_normal_initializer()
w = tf.Variable(random_normal([2, 3]))
b = tf.Variable(random_normal([3]))

@tf.function
def my_dense_layer(x):
    y = tf.matmul(x, w)
    y_with_bias = y + b
    return tf.nn.relu(y_with_bias), y, y_with_bias

x = random_normal([4, 2])
print(my_dense_layer(x))

Output

(<tf.Tensor: id=460, shape=(4, 3), dtype=float32, numpy=
array([[0.   , 0.026, 0.   ],
       [0.   , 0.024, 0.   ],
       [0.   , 0.029, 0.   ],
       [0.   , 0.022, 0.   ]], dtype=float32)>,
 <tf.Tensor: id=461, shape=(4, 3), dtype=float32, numpy=
array([[-0.   ,  0.001,  0.001],
       [ 0.003, -0.001, -0.006],
       [-0.001,  0.003,  0.006],
       [ 0.002, -0.004, -0.008]], dtype=float32)>,
 <tf.Tensor: id=462, shape=(4, 3), dtype=float32, numpy=
array([[-0.092,  0.026, -0.011],
       [-0.088,  0.024, -0.019],
       [-0.093,  0.029, -0.007],
       [-0.09 ,  0.022, -0.021]], dtype=float32)>)
  • For intermediate tensor values outside control flow, you can add them to the function's return values to obtain their runtime values

Accessing tensor values inside a graph programmatically: while loops

@tf.function
def collatz(n):
    counter = tf.constant(0)
    n_history = tf.TensorArray(n.dtype, size=0, dynamic_size=True)
    while n > 1:
        if n % 2 == 0:
            n //= 2
        else:
            n = n * 3 + 1
        n_history = n_history.write(counter, n)
        counter += 1
    return counter, n_history.stack()

print(collatz(tf.constant(42)))

Output

(<tf.Tensor: id=556, shape=(), dtype=int32, numpy=8>,
 <tf.Tensor: id=557, shape=(8,), dtype=int32, numpy=array([21, 64, 32, 16,  8,  4,  2,  1])>)
  • This can be implemented with tf.TensorArray

2. Debugging Device Placement

Device placement of ops

import tensorflow as tf
import numpy as np

# Must be called at the very start of the program
tf.debugging.set_log_device_placement(True)

def log1p(x):
    y = 1.0 + x
    tf.print(y)
    return tf.math.log(y)

log1p(tf.constant([1.0, 2.0, 3.0]) * np.pi)

Output

Executing op Mul in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op AddV2 in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op StringFormat in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op PrintV2 in device /job:localhost/replica:0/task:0/device:CPU:0
[4.14159298 7.28318548 10.424778]
Executing op Log in device /job:localhost/replica:0/task:0/device:CPU:0
  • The placement of each op is logged whenever the op is placed on a device
  • Repeated eager executions of the same op on the same device are not logged again

Device placement of a tf.function

import tensorflow as tf
import numpy as np

# Must be called at the very start of the program
tf.debugging.set_log_device_placement(True)

@tf.function
def log1p(x):
    y = 1.0 + x
    tf.print(y)
    return tf.math.log(y)

log1p(tf.constant([1.0, 2.0, 3.0]) * np.pi)

Output in a Jupyter notebook

Executing op __inference_log1p_19 in device /job:localhost/replica:0/task:0/device:CPU:0
[4.14159298 7.28318548 10.424778]

Output when run from the command line

x: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
add: (AddV2): /job:localhost/replica:0/task:0/device:CPU:0
StringFormat: (StringFormat): /job:localhost/replica:0/task:0/device:CPU:0
PrintV2: (PrintV2): /job:localhost/replica:0/task:0/device:CPU:0
Log: (Log): /job:localhost/replica:0/task:0/device:CPU:0
Identity: (Identity): /job:localhost/replica:0/task:0/device:CPU:0
identity_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:CPU:0
add/x: (Const): /job:localhost/replica:0/task:0/device:CPU:0
[4.14159298 7.28318548 10.424778]
  • set_log_device_placement() does not show the placement of ops inside the graph in Jupyter
  • This is because Jupyter only displays stdout, while that output goes to the info log
  • set_log_device_placement() only logs device placement for:
    • Eager op execution
    • Graph construction
  • For the latter, there is no guarantee that every op will actually run: Grappler optimizations may prune ops before execution
  • set_log_device_placement() does not work well on TPUs

3. Debugging Graph Structure

a. tf.function graphs

Getting the graph of a tf.function

random_normal = tf.random_normal_initializer()
w = tf.Variable(random_normal([2, 3]))
b = tf.Variable(random_normal([3]))

@tf.function
def my_dense_layer(x):
    y = tf.matmul(x, w)
    y_with_bias = y + b
    return tf.nn.relu(y_with_bias), y, y_with_bias

x = random_normal([4, 2])
print(my_dense_layer(x))

graph = my_dense_layer.get_concrete_function(x).graph
graph.as_graph_def()

Output

node {
  name: "x"
  op: "Placeholder"
  attr {
    key: "_user_specified_name"
    value {
      s: "x"
    }
  }
  attr {
    key: "dtype"
    value {
      type: DT_FLOAT
    }
  }
  attr {
    key: "shape"
    value {
      shape {
        dim {
          size: 4
        }
        dim {
          size: 2
        }
      }
    }
  }
}
node {
  name: "MatMul/ReadVariableOp/resource"
  op: "Placeholder"
  device: "/job:localhost/replica:0/task:0/device:CPU:0"
  attr {
    key: "dtype"
    value {
      type: DT_RESOURCE
    }
  }
...
  • Call get_concrete_function() on the tf.function at or after its first call (trace)
  • A concrete function is the result of compiling the Python function into a graph for a specific set of input arguments

The TensorBoard graph visualizer

  • Vertical direction of data flow: bottom-up
  • Grouping by name scope: yes
  • Can handle a FunctionDefLibrary in the GraphDef (e.g., V2 control flow): yes (using break-out boxes)

One way to get a tf.function graph into TensorBoard is sketched below.
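Outside Google-internal tooling, a common way to feed a tf.function graph to the TensorBoard visualizer is the tf.summary tracing API. A minimal sketch (the logdir and trace name are my own choices), reusing my_dense_layer and x from the earlier example:

writer = tf.summary.create_file_writer("tb_graph_logdir")

# Record graph information while the tf.function executes ...
tf.summary.trace_on(graph=True, profiler=False)
my_dense_layer(x)
# ... then export it so the Graphs tab in TensorBoard can display it.
with writer.as_default():
    tf.summary.trace_export(name="my_dense_layer", step=0)

Then run tensorboard --logdir tb_graph_logdir and open the Graphs tab.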

Getting and plotting the function graph: Colab (Google3 only)

$ blaze run -c opt --config=python3 --config=cuda learning/brain/python/client/colab:colab_notebook_with_tfgraph_py3
random_normal = tf.random_normal_initializer()
w = tf.Variable(random_normal([2, 3]))
b = tf.Variable(random_normal([3]))

@tf.function
def my_dense_layer(x):
    y = tf.matmul(x, w)
    y_with_bias = y + b
    return tf.nn.relu(y_with_bias), y, y_with_bias

x = random_normal([4, 2])
print(my_dense_layer(x))

from google3.learning.brain.python.client import colab
graph = my_dense_layer.get_concrete_function(x).graph
colab.tfgraph.display(graph)

Getting and plotting the function graph: control flow in TF2

@tf.function
def collatz(n):
    counter = tf.constant(0)
    while n > 1:
        if n % 2 == 0:
            n //= 2
        else:
            n = n * 3 + 1
        counter += 1
    return counter

print(collatz(tf.constant(42)))

collatz_graph = collatz.get_concrete_function(tf.constant(42)).graph
colab.tfgraph.display(collatz_graph)
  • Control flow V2 is converted into graphlets
  • The TensorBoard graph visualizer shows graphlets using break-out boxes
  • Netron cannot handle such nested graph structures either

Distribution strategies

gpus = tf.config.list_physical_devices("GPU")
if len(gpus) == 1:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],  # Which physical device to use
        [tf.config.LogicalDeviceConfiguration(512) for _ in range(4)]  # Resultant logical devices
    )
tf.config.list_logical_devices()

dist_strat = tf.distribute.MirroredStrategy()
with dist_strat.scope():
    w = tf.Variable(tf.ones([4, 10]))

def f():
    with tf.GradientTape() as tape:
        loss = tf.math.square(w)
    grads = tape.gradient(loss, w)
    return grads

dist_f = lambda: dist_strat.experimental_run_v2(f)
dist_f = tf.function(dist_f, autograph=True)
g = dist_f.get_concrete_function().graph
g.as_graph_def()

Output

...
node {
  name: "Square"
  op: "Square"
  input: "Square/ReadVariableOp"
  device: "/job:localhost/replica:0/task:0/device:GPU:0"
  attr {
    key: "T"
    value {
      type: DT_FLOAT
    }
  }
}
...
  • MirroredStrategy and some other strategies perform in-graph replication
  • This replication affects the concrete function's graph

How tf.print() works

Question: the result of the tf.print() op is never consumed, so how does it get executed?

Answer: the op is added as a control dependency of the returned results

Does tf.print() still work in a function that has no return value?

v1 = tf.Variable(40.0)

@tf.function
def increment_variable():
    tf.print(v1)
    tf.compat.v1.assign_add(v1, 1.0)

increment_variable()

Output

40

b. Runtime graphs

tf.print() may affect the optimization of the runtime graph

@tf.function
def harmonic_mean(x):
    x_reciprocals = tf.math.reciprocal(x)
    reciprocal_sum = tf.math.reduce_sum(x_reciprocals)
    tf.math.reduce_min(x_reciprocals)  # ==> change to: tf.print(tf.math.reduce_min(x_reciprocals))
    n = tf.cast(tf.size(x), tf.float32)
    return n / reciprocal_sum

harmonic_mean(tf.constant([10.0, 20.0, 30.0]))
  • Adding tf.print() forces the Min op, which would otherwise be pruned and never executed, to run

Dumping the Grappler output: the graph that actually runs

$ TF_DUMP_GRAPH_PREFIX="/tmp/tf_graph_dump" bazel run my/build/target -- --vmodule=meta_optimizer=4
  • Grappler is TF's built-in default graph optimizer
  • The file of interest is usually the last one: Grappler's final output
  • One goal of tfdbg2 is to make this workflow simpler (for both function graphs and Grappler-output graphs)

4. Step-by-Step Debugging

tf.config.experimental_run_functions_eagerly()

  • Overrides graph compilation and runs all ops eagerly, including backprop
  • You can then set breakpoints and step through in your IDE, as in the sketch below
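A minimal sketch of the typical usage (the toy function is my own illustration):

# Make every tf.function run its ops eagerly so Python breakpoints and
# pdb work inside the function body.
# (In newer TF versions this is tf.config.run_functions_eagerly.)
tf.config.experimental_run_functions_eagerly(True)

@tf.function
def scaled_log(x):
    y = 2.0 * x  # a breakpoint set on this line is now actually hit
    return tf.math.log(y)

print(scaled_log(tf.constant([1.0, 2.0, 3.0])))

# Restore normal graph compilation once you are done debugging.
tf.config.experimental_run_functions_eagerly(False)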

This API does not work inside tf.data.Dataset.map()

  • Because Dataset.map() always compiles a graph before execution
  • Regardless of whether @tf.function is used
  • This means:
    • Single-stepping inside the map function is not possible
    • You must use tf.print() instead of print() to output tensor values (see the sketch below)
    • Workaround: use tfdbg2
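A sketch of the tf.print() workaround inside Dataset.map() (the dataset and the map function are my own example):

ds = tf.data.Dataset.from_tensor_slices([1.0, 4.0, 9.0])

def debug_map_fn(x):
    # print(x) would only show the symbolic Tensor once, at trace time;
    # tf.print(x) emits the runtime value for every element.
    tf.print("map input:", x)
    return tf.sqrt(x)

for y in ds.map(debug_map_fn):
    print(y.numpy())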

5. Debugging High-Level APIs (tf.keras)

Accessing tf.keras internals

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(5, input_shape=[4], activation='relu'))
model.add(tf.keras.layers.Dropout(rate=0.5))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

debug_model = tf.keras.Model(
    inputs=model.inputs,
    outputs=[model.layers[0].output, model.layers[1].output] + model.outputs)

xs = tf.random_normal_initializer()([8, 4])
print(debug_model(xs, training=True))

Output

[<tf.Tensor: id=103, shape=(8, 5), dtype=float32, numpy=
array([[0.03208053, 0.        , 0.        , 0.09101269, 0.0405516 ],
       [0.06668283, 0.        , 0.05414589, 0.        , 0.06441024],
       [0.        , 0.02470349, 0.0345275 , 0.        , 0.        ],
       [0.02822505, 0.        , 0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        , 0.03051471, 0.        ],
       [0.01117405, 0.        , 0.0744615 , 0.07232606, 0.09003952],
       [0.        , 0.03395397, 0.04608804, 0.        , 0.        ],
       [0.        , 0.02972447, 0.00674627, 0.        , 0.        ]],
      dtype=float32)>,
 <tf.Tensor: id=116, shape=(8, 5), dtype=float32, numpy=
array([[0.06416105, 0.        , 0.        , 0.        , 0.08110321],
       [0.13336566, 0.        , 0.        , 0.        , 0.12882048],
       [0.        , 0.04940698, 0.069055  , 0.        , 0.        ],
       [0.0564501 , 0.        , 0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        , 0.06102942, 0.        ],
       [0.0223481 , 0.        , 0.14892301, 0.14465213, 0.18007904],
       [0.        , 0.        , 0.09217609, 0.        , 0.        ],
       [0.        , 0.05944894, 0.        , 0.        , 0.        ]],
      dtype=float32)>,
 <tf.Tensor: id=121, shape=(8, 1), dtype=float32, numpy=
array([[0.51327056],
       [0.52288353],
       [0.49928164],
       [0.5032595 ],
       [0.5143335 ],
       [0.54077065],
       [0.49030966],
       [0.50787127]], dtype=float32)>]
  • To access a model's internal layers, build a new model whose outputs include those layers
  • What if you want to see the gradients inside the layers?
    • tfdbg can help (a GradientTape-based sketch also follows below)
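Short of tfdbg, one way to look at gradients flowing through those internal layers is tf.GradientTape; this is my own sketch (with a toy loss), reusing debug_model and xs from above:

with tf.GradientTape() as tape:
    dense_out, dropout_out, prediction = debug_model(xs, training=True)
    loss = tf.reduce_mean(tf.square(prediction))  # toy loss, just to have a target

# Gradients of the loss with respect to the intermediate layer outputs.
d_dense, d_dropout = tape.gradient(loss, [dense_out, dropout_out])
print(d_dense, d_dropout)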

Debugging a Keras model with the TensorBoard callback

from tensorflow.keras import backend as K

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(5, input_shape=[4], activation='relu'))
model.add(tf.keras.layers.Dropout(rate=0.5))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')

xs = tf.random_normal_initializer()([8, 4])
ys = tf.zeros([8])
model.fit(xs, ys, epochs=2,
          callbacks=[tf.keras.callbacks.TensorBoard("tb_logdir")])
  • The tf.keras.callbacks.TensorBoard callback writes training logs to the logdir, including the loss, weights, and other information
  • Edges in the graph are labeled with tensor shapes, but only with the shapes known at model-construction time

6. Numerical Issues (NaN / Infinity)

Common causes of numerical issues

  • Missing value clipping (see the sketch after this list)
    • Division by zero, log of zero
  • Problems in specific ops
  • Exploding gradients
  • Bad training examples
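For the clipping case, a small sketch (my own illustration) of guarding a log and a division:

vals = tf.constant([0.0, 1e-9, 2.0])
denom = tf.constant([0.0, 3.0, 4.0])

# Clip before the log so that log(0) can never produce -inf.
safe_log = tf.math.log(tf.clip_by_value(vals, 1e-6, tf.float32.max))

# divide_no_nan yields 0 where the denominator is 0, instead of inf/NaN.
safe_div = tf.math.divide_no_nan(vals, denom)

print(safe_log, safe_div)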

Debugging numerical issues with tfdbg2

tf.debugging.enable_check_numerics()

@tf.function
def bad_func(n):
    total = tf.constant(0.0)
    x = tf.constant(10.0)
    i = tf.constant(0, dtype=tf.int32)
    while tf.math.less(i, n):
        total += tf.math.log(x)
        x -= 1.0
        i += 1
    return total

# With n <= 10 there is no error; n = 12 drives x down to 0 and below,
# so the loop takes the log of a non-positive value.
n = tf.constant(12, dtype=tf.int32)
print(bad_func(n))

Output

InvalidArgumentError: !!! Detected Infinity or NaN in output 0 of graph op "Log" (# of outputs: 1) !!!
  dtype: <dtype: 'float32'>
  shape: ()

  Input tensor: Tensor("Placeholder:0", shape=(), dtype=float32)
  Graph name: "while_body_13"

  Stack trace of op's creation ("->": inferred user code):
    + ... (Omitted 21 frames)
    + ...3.6/site-packages/IPython/core/interactiveshell.py (L2848) run_cell
    -> |   raw_cell, store_history, silent, shell_futures)
    + ...3.6/site-packages/IPython/core/interactiveshell.py (L2874) _run_cell
    -> |   return runner(coro)
    + ...hon3.6/site-packages/IPython/core/async_helpers.py (L68) _pseudo_sync_runner
    -> |   coro.send(None)
    + ...3.6/site-packages/IPython/core/interactiveshell.py (L3051) run_cell_async
    -> |   interactivity=interactivity, compiler=compiler, result=result)
    + ...3.6/site-packages/IPython/core/interactiveshell.py (L3242) run_ast_nodes
    -> |   if (await self.run_code(code, result, async_=asy)):
    + ...3.6/site-packages/IPython/core/interactiveshell.py (L3319) run_code
    -> |   exec(code_obj, self.user_global_ns, self.user_ns)
    + <ipython-input-3-acc5c4cbe210> (L16) <module>
    -> |   print(bad_func(n))
    + ...kages/tensorflow_core/python/eager/def_function.py (L568) __call__
    |   result = self._call(*args, **kwds)
    + ...kages/tensorflow_core/python/eager/def_function.py (L615) _call
    |   self._initialize(args, kwds, add_initializers_to=initializers)
    + ...kages/tensorflow_core/python/eager/def_function.py (L497) _initialize
    |   *args, **kwds))
    + ...-packages/tensorflow_core/python/eager/function.py (L2389) _get_concrete_function_internal_garbage_collected
    |   graph_function, _, _ = self._maybe_define_function(args, kwargs)
    + ...-packages/tensorflow_core/python/eager/function.py (L2703) _maybe_define_function
    |   graph_function = self._create_graph_function(args, kwargs)
    + ...-packages/tensorflow_core/python/eager/function.py (L2593) _create_graph_function
    |   capture_by_value=self._capture_by_value),
    + ...ges/tensorflow_core/python/framework/func_graph.py (L978) func_graph_from_py_func
    |   func_outputs = python_func(*func_args, **func_kwargs)
    + ...kages/tensorflow_core/python/eager/def_function.py (L439) wrapped_fn
    |   return weak_wrapped_fn().__wrapped__(*args, **kwds)
    + ...ges/tensorflow_core/python/framework/func_graph.py (L964) wrapper
    |   user_requested=True,
    + <ipython-input-3-acc5c4cbe210> (L8) bad_func
    -> |   while tf.math.less(i, n):
    + ...ow_core/python/autograph/operators/control_flow.py (L746) while_stmt
    |   basic_symbol_names, composite_symbol_names, opts)
    + ...ow_core/python/autograph/operators/control_flow.py (L794) _tf_while_stmt
    |   aug_init_vars, **opts)
    + ...ges/tensorflow_core/python/ops/control_flow_ops.py (L2675) while_loop
    |   back_prop=back_prop)
    + ...te-packages/tensorflow_core/python/ops/while_v2.py (L194) while_loop
    |   add_control_dependencies=add_control_dependencies)
    + ...ges/tensorflow_core/python/framework/func_graph.py (L978) func_graph_from_py_func
    |   func_outputs = python_func(*func_args, **func_kwargs)
    + ...te-packages/tensorflow_core/python/ops/while_v2.py (L172) wrapped_body
    |   outputs = body(*_pack_sequence_as(orig_loop_vars, args))
    + ...ow_core/python/autograph/operators/control_flow.py (L781) aug_body
    |   loop_vars = body(*aug_loop_vars[loop_vars_slice])
    + <ipython-input-3-acc5c4cbe210> (L9) bad_func
    -> |   total += tf.math.log(x)
    + ...ackages/tensorflow_core/python/ops/gen_math_ops.py (L5248) log
    |   "Log", x=x, name=name)
    + ...tensorflow_core/python/framework/op_def_library.py (L742) _apply_op_helper
    |   attrs=attr_protos, op_def=op_def)
    + ...ges/tensorflow_core/python/framework/func_graph.py (L595) _create_op_internal
    |   compute_device)
    + ...e-packages/tensorflow_core/python/framework/ops.py (L3322) _create_op_internal
    |   op_def=op_def)
    + ...e-packages/tensorflow_core/python/framework/ops.py (L1756) __init__
    |   self._traceback = tf_stack.extract_stack()
 : Tensor had Inf values
	 [[{{node while/body/_1/Log/CheckNumerics}}]] [Op:__inference_bad_func_58]

Function call stack:
bad_func
  • enable_check_numerics() is the successor to add_check_numerics_ops() from TF1
  • Checks both eagerly executed ops and ops inside graphs
    • Works for forward and backward passes
    • Works across API layers
    • Also works in TF1
    • Works on CPU, GPU, and TPU
  • Relative overhead
    • 1.29x wall time on CPU and 1.76x on GPU, so the overhead is not high
    • Note: 1.0x means no overhead
    • Benchmark model: tensorflow_models.official.transformers.v2, task type = training, batch size = 64
    • TPU benchmarks will be added later, in collaboration with TensorTracer

7. TensorFlow Debugger (tfdbg)

TensorFlow Debugger (tfdbg) V1

  • The predecessor of tfdbg v2, launched in early 2017
  • Provides visibility into the tf.Session() runtime
    • Hooks in by wrapping tf.Session()
    • Convenient APIs are also available for Keras, Estimator, and slim
  • Supports distributed training
  • User interface: an interactive, clickable CLI
    • Intermediate tensor values and their summary statistics
      • Conditional breakpoints, e.g., has_inf_or_nan
    • Runtime graph structure (after Grappler and partitioning)
    • Op attributes, including the originating stack trace
    • Source-code viewing

Why is tfdbg v2 needed?

  • TF's new execution paradigm
    • No tf.Session()
    • Eager execution + tf.function
  • Can print() and tf.print() cover debuggability?
    • Helpful in some cases, but not the complete answer
    • Generality matters: many kinds of hardware
    • Low performance overhead matters
      • Tiered invasiveness of debugging
    • Frontend UX matters

Workflow: the user's TF program → tf.debugging.experimental.enable_dump_debug_info(logdir) → the Debugger V2 dashboard (under construction); a minimal usage sketch follows.
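A minimal sketch of the dumping step (the logdir path and tensor_debug_mode choice are my own; see the documentation linked below for the options available in your TF version):

# Dump debug information (op executions, tensor health, originating stack
# traces) to a directory that the Debugger V2 dashboard in TensorBoard reads.
tf.debugging.experimental.enable_dump_debug_info(
    "/tmp/tfdbg2_logdir",
    tensor_debug_mode="FULL_HEALTH",
    circular_buffer_size=-1)

# Run the TF program as usual, then inspect with:
#   tensorboard --logdir /tmp/tfdbg2_logdir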

Key documentation:

tf.debugging.experimental.enable_dump_debug_info

TensorFlow Debugger (TFDBG)

Debugger Dashboard usage guide
