Running TensorFlow (GPU, ROCm) on an AMD Vega 56 under Ubuntu: setup notes and benchmark runs (by 寂寞戒指).

My machine is fairly old: i5-4570, HDD, 16 GB RAM.

Ubuntu 18.04.2, kernel 5.0.0-31-generic.

How do you install it?

First, it has to be Linux.

Then follow https://rocm.github.io/ROCmInstall.html#ubuntu-support---installing-from-a-debian-repository

```shell
sudo apt update
sudo apt dist-upgrade
sudo apt install libnuma-dev
sudo reboot
wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt update
sudo apt install rocm-dkms
sudo usermod -a -G video $LOGNAME
```

That completes the base installation. Then:

```shell
sudo apt install rocm-libs miopen-hip cxlactivitylogger
sudo apt-get update && sudo apt-get install -y --allow-unauthenticated \
    rocm-dkms rocm-dev rocm-libs rccl rocm-device-libs \
    hsa-ext-rocr-dev hsakmt-roct-dev hsa-rocr-dev \
    rocm-opencl rocm-opencl-dev rocm-utils rocm-profiler \
    cxlactivitylogger miopen-hip miopengemm
```

Then:

```shell
pip3 install tensorflow-rocm -i https://pypi.tuna.tsinghua.edu.cn/simple
```

This installs 1.14 by default. Trying to install TF 1.13 instead (tensorflow-rocm==1.13.1) throws an error, and I haven't found a way around it.
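To confirm which version pip actually landed without importing the (large) tensorflow package itself, a quick sketch using `pkg_resources` (shipped with setuptools):

```python
# Look up the installed tensorflow-rocm version from package metadata.
# Prints a placeholder message if the package is not installed.
try:
    import pkg_resources
    version = pkg_resources.get_distribution("tensorflow-rocm").version
except Exception:
    version = None

print(version or "tensorflow-rocm is not installed")
```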

 

Below are the runs with TF 1.14 and models r1.5.

From /home/zc/models-r1.5/official/mnist/:

INFO:tensorflow:Done calling model_fn. I1005 14:23:32.420413 140280443631424 estimator.py:1147] Done calling model_fn. INFO:tensorflow:Create CheckpointSaverHook. I1005 14:23:32.421550 140280443631424 basic_session_run_hooks.py:541] Create CheckpointSaverHook. INFO:tensorflow:Graph was finalized. I1005 14:23:32.561293 140280443631424 monitored_session.py:240] Graph was finalized. 2019-10-05 14:23:32.561613: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA 2019-10-05 14:23:32.588956: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3199815000 Hz 2019-10-05 14:23:32.589530: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55cf4c094920 executing computations on platform Host. Devices: 2019-10-05 14:23:32.589565: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined> 2019-10-05 14:23:32.589793: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libhip_hcc.so 2019-10-05 14:23:32.626274: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1651] Found device 0 with properties: name: Vega 10 XT [Radeon RX Vega 64] AMDGPU ISA: gfx900 memoryClockRate (GHz) 1.59 pciBusID 0000:03:00.0 2019-10-05 14:23:32.663438: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library librocblas.so 2019-10-05 14:23:32.664629: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libMIOpen.so 2019-10-05 14:23:32.666046: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library librocfft.so 2019-10-05 14:23:32.666334: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library librocrand.so 2019-10-05 14:23:32.666455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding 
visible gpu devices: 0 2019-10-05 14:23:32.666551: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-10-05 14:23:32.666571: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0 2019-10-05 14:23:32.666579: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N 2019-10-05 14:23:32.666744: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7524 MB memory) -> physical GPU (device: 0, name: Vega 10 XT [Radeon RX Vega 64], pci bus id: 0000:03:00.0) 2019-10-05 14:23:32.668748: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55cf486ba940 executing computations on platform ROCM. Devices: 2019-10-05 14:23:32.668783: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Vega 10 XT [Radeon RX Vega 64], AMDGPU ISA version: gfx900 WARNING:tensorflow:From /home/zc/miniconda3/envs/rocmtf/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file APIs to check for files with this prefix. W1005 14:23:32.670347 140280443631424 deprecation.py:323] From /home/zc/miniconda3/envs/rocmtf/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file APIs to check for files with this prefix. 
INFO:tensorflow:Restoring parameters from /tmp/mnist_model/model.ckpt-24000 I1005 14:23:32.678996 140280443631424 saver.py:1280] Restoring parameters from /tmp/mnist_model/model.ckpt-24000 WARNING:tensorflow:From /home/zc/miniconda3/envs/rocmtf/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1066: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file utilities to get mtimes. W1005 14:23:40.458618 140280443631424 deprecation.py:323] From /home/zc/miniconda3/envs/rocmtf/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1066: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file utilities to get mtimes. 2019-10-05 14:23:40.496950: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile. INFO:tensorflow:Running local_init_op. I1005 14:23:40.501688 140280443631424 session_manager.py:500] Running local_init_op. INFO:tensorflow:Done running local_init_op. I1005 14:23:40.514181 140280443631424 session_manager.py:502] Done running local_init_op. INFO:tensorflow:Saving checkpoints for 24000 into /tmp/mnist_model/model.ckpt. I1005 14:23:40.762727 140280443631424 basic_session_run_hooks.py:606] Saving checkpoints for 24000 into /tmp/mnist_model/model.ckpt. 
2019-10-05 14:23:40.947216: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library librocblas.so 2019-10-05 14:23:48.989702: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libMIOpen.so 2019-10-05 14:23:55.898632: I tensorflow/core/kernels/conv_grad_input_ops.cc:981] running auto-tune for Backward-Data 2019-10-05 14:23:55.948926: I tensorflow/core/kernels/conv_grad_filter_ops.cc:875] running auto-tune for Backward-Filter 2019-10-05 14:23:56.168817: I tensorflow/core/kernels/conv_grad_filter_ops.cc:875] running auto-tune for Backward-Filter INFO:tensorflow:train_accuracy = 1.0 I1005 14:23:56.406840 140280443631424 basic_session_run_hooks.py:262] train_accuracy = 1.0 INFO:tensorflow:loss = 0.0004839369, step = 24000 I1005 14:23:56.407395 140280443631424 basic_session_run_hooks.py:262] loss = 0.0004839369, step = 24000 INFO:tensorflow:global_step/sec: 73.6445 I1005 14:23:57.764089 140280443631424 basic_session_run_hooks.py:692] global_step/sec: 73.6445 INFO:tensorflow:train_accuracy = 1.0 (1.358 sec) I1005 14:23:57.764740 140280443631424 basic_session_run_hooks.py:260] train_accuracy = 1.0 (1.358 sec) INFO:tensorflow:loss = 0.0016695356, step = 24100 (1.358 sec) I1005 14:23:57.764930 140280443631424 basic_session_run_hooks.py:260] loss = 0.0016695356, step = 24100 (1.358 sec) INFO:tensorflow:global_step/sec: 161.989 I1005 14:23:58.381400 140280443631424 basic_session_run_hooks.py:692] global_step/sec: 161.989 INFO:tensorflow:train_accuracy = 1.0 (0.617 sec) I1005 14:23:58.382167 140280443631424 basic_session_run_hooks.py:260] train_accuracy = 1.0 (0.617 sec) INFO:tensorflow:loss = 0.0026494868, step = 24200 (0.617 sec) I1005 14:23:58.382354 140280443631424 basic_session_run_hooks.py:260] loss = 0.0026494868, step = 24200 (0.617 sec) INFO:tensorflow:global_step/sec: 166.851 I1005 14:23:58.980767 140280443631424 basic_session_run_hooks.py:692] global_step/sec: 166.851 
INFO:tensorflow:train_accuracy = 1.0 (0.599 sec) I1005 14:23:58.981556 140280443631424 basic_session_run_hooks.py:260] train_accuracy = 1.0 (0.599 sec) INFO:tensorflow:loss = 0.00032299932, step = 24300 (0.599 sec) I1005 14:23:58.981797 140280443631424 basic_session_run_hooks.py:260] loss = 0.00032299932, step = 24300 (0.599 sec) INFO:tensorflow:global_step/sec: 168.752 I1005 14:23:59.573339 140280443631424 basic_session_run_hooks.py:692] global_step/sec: 168.752 INFO:tensorflow:train_accuracy = 1.0 (0.592 sec) I1005 14:23:59.574037 140280443631424 basic_session_run_hooks.py:260] train_accuracy = 1.0 (0.592 sec) INFO:tensorflow:loss = 0.0003701407, step = 24400 (0.592 sec) I1005 14:23:59.574180 140280443631424 basic_session_run_hooks.py:260] loss = 0.0003701407, step = 24400 (0.592 sec) INFO:tensorflow:global_step/sec: 167.43 I1005 14:24:00.170599 140280443631424 basic_session_run_hooks.py:692] global_step/sec: 167.43 INFO:tensorflow:train_accuracy = 1.0 (0.597 sec) I1005 14:24:00.171491 140280443631424 basic_session_run_hooks.py:260] train_accuracy = 1.0 (0.597 sec) INFO:tensorflow:loss = 0.0006009388, step = 24500 (0.597 sec) I1005 14:24:00.171680 140280443631424 basic_session_run_hooks.py:260] loss = 0.0006009388, step = 24500 (0.597 sec) INFO:tensorflow:global_step/sec: 161.194 I1005 14:24:00.790968 140280443631424 basic_session_run_hooks.py:692] global_step/sec: 161.194 INFO:tensorflow:train_accuracy = 0.99857146 (0.620 sec) I1005 14:24:00.791700 140280443631424 basic_session_run_hooks.py:260] train_accuracy = 0.99857146 (0.620 sec) INFO:tensorflow:loss = 0.010238873, step = 24600 (0.620 sec) I1005 14:24:00.792010 140280443631424 basic_session_run_hooks.py:260] loss = 0.010238873, step = 24600 (0.620 sec) INFO:tensorflow:global_step/sec: 167.456 I1005 14:24:01.388126 140280443631424 basic_session_run_hooks.py:692] global_step/sec: 167.456 INFO:tensorflow:train_accuracy = 0.99875 (0.597 sec) I1005 14:24:01.388878 140280443631424 basic_session_run_hooks.py:260] 
train_accuracy = 0.99875 (0.597 sec) INFO:tensorflow:loss = 1.1944725e-05, step = 24700 (0.597 sec) I1005 14:24:01.389158 140280443631424 basic_session_run_hooks.py:260] loss = 1.1944725e-05, step = 24700 (0.597 sec) INFO:tensorflow:global_step/sec: 167.955 I1005 14:24:01.983534 140280443631424 basic_session_run_hooks.py:692] global_step/sec: 167.955 INFO:tensorflow:train_accuracy = 0.9988889 (0.596 sec) I1005 14:24:01.984512 140280443631424 basic_session_run_hooks.py:260] train_accuracy = 0.9988889 (0.596 sec) INFO:tensorflow:loss = 3.371142e-05, step = 24800 (0.596 sec) I1005 14:24:01.984758 140280443631424 basic_session_run_hooks.py:260] loss = 3.371142e-05, step = 24800 (0.596 sec) INFO:tensorflow:global_step/sec: 169.656 I1005 14:24:02.572987 140280443631424 basic_session_run_hooks.py:692] global_step/sec: 169.656 INFO:tensorflow:train_accuracy = 0.999 (0.589 sec) I1005 14:24:02.573793 140280443631424 basic_session_run_hooks.py:260] train_accuracy = 0.999 (0.589 sec) INFO:tensorflow:loss = 6.105689e-05, step = 24900 (0.589 sec) I1005 14:24:02.573983 140280443631424 basic_session_run_hooks.py:260] loss = 6.105689e-05, step = 24900 (0.589 sec) INFO:tensorflow:global_step/sec: 167.571 I1005 14:24:03.169733 140280443631424 basic_session_run_hooks.py:692] global_step/sec: 167.571 INFO:tensorflow:train_accuracy = 0.9990909 (0.597 sec) I1005 14:24:03.170553 140280443631424 basic_session_run_hooks.py:260] train_accuracy = 0.9990909 (0.597 sec) INFO:tensorflow:loss = 0.0004118733, step = 25000 (0.597 sec) I1005 14:24:03.170741 140280443631424 basic_session_run_hooks.py:260] loss = 0.0004118733, step = 25000 (0.597 sec) INFO:tensorflow:global_step/sec: 169.609 I1005 14:24:03.759313 140280443631424 basic_session_run_hooks.py:692] global_step/sec: 169.609 INFO:tensorflow:train_accuracy = 0.99916667 (0.590 sec) I1005 14:24:03.760116 140280443631424 basic_session_run_hooks.py:260] train_accuracy = 0.99916667 (0.590 sec) INFO:tensorflow:loss = 0.00070394913, step = 25100 
(0.590 sec) I1005 14:24:03.760275 140280443631424 basic_session_run_hooks.py:260] loss = 0.00070394913, step = 25100 (0.590 sec) INFO:tensorflow:global_step/sec: 159.183 I1005 14:24:04.387524 140280443631424 basic_session_run_hooks.py:692] global_step/sec: 159.183 INFO:tensorflow:train_accuracy = 0.99923074 (0.628 sec) I1005 14:24:04.388160 140280443631424 basic_session_run_hooks.py:260] train_accuracy = 0.99923074 (0.628 sec) INFO:tensorflow:loss = 9.452502e-05, step = 25200 (0.628 sec)

From /home/zc/models-r1.5/tutorials/image/cifar10/:

2019-10-05 14:51:03.367600: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1651] Found device 0 with properties: name: Vega 10 XT [Radeon RX Vega 64] AMDGPU ISA: gfx900 memoryClockRate (GHz) 1.59 pciBusID 0000:03:00.0 2019-10-05 14:51:03.400899: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library librocblas.so 2019-10-05 14:51:03.401687: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libMIOpen.so 2019-10-05 14:51:03.402580: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library librocfft.so 2019-10-05 14:51:03.402764: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library librocrand.so 2019-10-05 14:51:03.402860: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0 2019-10-05 14:51:03.402959: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-10-05 14:51:03.402970: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0 2019-10-05 14:51:03.402985: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N 2019-10-05 14:51:03.403105: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7524 MB memory) -> physical GPU (device: 0, name: Vega 10 XT [Radeon RX Vega 64], pci bus id: 0000:03:00.0) 2019-10-05 14:51:03.405734: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55749a8daf80 executing computations on platform ROCM. 
Devices: 2019-10-05 14:51:03.405767: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Vega 10 XT [Radeon RX Vega 64], AMDGPU ISA version: gfx900 2019-10-05 14:51:03.498389: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile. INFO:tensorflow:Running local_init_op. I1005 14:51:11.534648 139922356148032 session_manager.py:500] Running local_init_op. INFO:tensorflow:Done running local_init_op. I1005 14:51:11.544094 139922356148032 session_manager.py:502] Done running local_init_op. WARNING:tensorflow:From /home/zc/miniconda3/envs/rocmtf/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py:875: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version. Instructions for updating: To construct input pipelines, use the `tf.data` module. W1005 14:51:11.583780 139922356148032 deprecation.py:323] From /home/zc/miniconda3/envs/rocmtf/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py:875: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version. Instructions for updating: To construct input pipelines, use the `tf.data` module. INFO:tensorflow:Saving checkpoints for 0 into /tmp/cifar10_train/model.ckpt. I1005 14:51:12.117074 139922356148032 basic_session_run_hooks.py:606] Saving checkpoints for 0 into /tmp/cifar10_train/model.ckpt. 
2019-10-05 14:51:12.419670: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library librocblas.so 2019-10-05 14:51:19.892185: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libMIOpen.so 2019-10-05 14:51:26.778551: I tensorflow/core/kernels/conv_grad_input_ops.cc:981] running auto-tune for Backward-Data 2019-10-05 14:51:26.828987: I tensorflow/core/kernels/conv_grad_filter_ops.cc:875] running auto-tune for Backward-Filter 2019-10-05 14:51:27.074274: I tensorflow/core/kernels/conv_grad_filter_ops.cc:875] running auto-tune for Backward-Filter 2019-10-05 14:51:27.292801: step 0, loss = 4.68 (53.1 examples/sec; 2.409 sec/batch) 2019-10-05 14:51:27.568947: step 10, loss = 4.65 (4635.0 examples/sec; 0.028 sec/batch) 2019-10-05 14:51:27.732518: step 20, loss = 4.48 (7825.4 examples/sec; 0.016 sec/batch) 2019-10-05 14:51:27.893633: step 30, loss = 4.32 (7944.4 examples/sec; 0.016 sec/batch) 2019-10-05 14:51:28.052451: step 40, loss = 4.34 (8059.8 examples/sec; 0.016 sec/batch) 2019-10-05 14:51:28.205325: step 50, loss = 4.28 (8372.7 examples/sec; 0.015 sec/batch) 2019-10-05 14:51:28.356338: step 60, loss = 4.29 (8476.2 examples/sec; 0.015 sec/batch) 2019-10-05 14:51:28.505872: step 70, loss = 4.22 (8559.9 examples/sec; 0.015 sec/batch) 2019-10-05 14:51:28.664742: step 80, loss = 4.31 (8056.9 examples/sec; 0.016 sec/batch) 2019-10-05 14:51:28.829310: step 90, loss = 4.25 (7778.1 examples/sec; 0.016 sec/batch) INFO:tensorflow:global_step/sec: 54.4963 I1005 14:51:29.126652 139922356148032 basic_session_run_hooks.py:692] global_step/sec: 54.4963 2019-10-05 14:51:29.127789: step 100, loss = 4.17 (4288.3 examples/sec; 0.030 sec/batch) 2019-10-05 14:51:29.299471: step 110, loss = 4.11 (7455.8 examples/sec; 0.017 sec/batch) 2019-10-05 14:51:29.455985: step 120, loss = 3.94 (8178.1 examples/sec; 0.016 sec/batch) 2019-10-05 14:51:29.612898: step 130, loss = 4.08 (8157.4 examples/sec; 
0.016 sec/batch) 2019-10-05 14:51:29.772487: step 140, loss = 3.97 (8020.5 examples/sec; 0.016 sec/batch) 2019-10-05 14:51:29.950281: step 150, loss = 4.06 (7199.4 examples/sec; 0.018 sec/batch) 2019-10-05 14:51:30.111554: step 160, loss = 4.20 (7936.9 examples/sec; 0.016 sec/batch) 2019-10-05 14:51:30.265593: step 170, loss = 3.87 (8309.5 examples/sec; 0.015 sec/batch) 2019-10-05 14:51:30.416730: step 180, loss = 3.89 (8469.1 examples/sec; 0.015 sec/batch) 2019-10-05 14:51:30.573256: step 190, loss = 3.91 (8177.6 examples/sec; 0.016 sec/batch) INFO:tensorflow:global_step/sec: 57.5419 I1005 14:51:30.864464 139922356148032 basic_session_run_hooks.py:692] global_step/sec: 57.5419 2019-10-05 14:51:30.865772: step 200, loss = 3.65 (4375.8 examples/sec; 0.029 sec/batch) 2019-10-05 14:51:31.046528: step 210, loss = 3.82 (7081.5 examples/sec; 0.018 sec/batch) 2019-10-05 14:51:31.203528: step 220, loss = 3.78 (8152.8 examples/sec; 0.016 sec/batch) 2019-10-05 14:51:31.354960: step 230, loss = 3.69 (8452.6 examples/sec; 0.015 sec/batch) 2019-10-05 14:51:31.514334: step 240, loss = 3.77 (8031.4 examples/sec; 0.016 sec/batch) 2019-10-05 14:51:31.685647: step 250, loss = 3.71 (7471.7 examples/sec; 0.017 sec/batch) 2019-10-05 14:51:31.846451: step 260, loss = 3.74 (7960.0 examples/sec; 0.016 sec/batch) 2019-10-05 14:51:32.009800: step 270, loss = 3.59 (7836.0 examples/sec; 0.016 sec/batch) 2019-10-05 14:51:32.165830: step 280, loss = 3.70 (8203.6 examples/sec; 0.016 sec/batch) 2019-10-05 14:51:32.392678: step 290, loss = 3.42 (5643.1 examples/sec; 0.023 sec/batch) INFO:tensorflow:global_step/sec: 53.4814 I1005 14:51:32.734279 139922356148032 basic_session_run_hooks.py:692] global_step/sec: 53.4814 2019-10-05 14:51:32.735830: step 300, loss = 3.59 (3729.9 examples/sec; 0.034 sec/batch) 2019-10-05 14:51:32.917374: step 310, loss = 3.49 (7050.6 examples/sec; 0.018 sec/batch) 2019-10-05 14:51:33.090716: step 320, loss = 3.49 (7384.3 examples/sec; 0.017 sec/batch) 2019-10-05 
14:51:33.273003: step 330, loss = 3.58 (7022.1 examples/sec; 0.018 sec/batch) 2019-10-05 14:51:33.433509: step 340, loss = 3.35 (7974.5 examples/sec; 0.016 sec/batch) 2019-10-05 14:51:33.594271: step 350, loss = 3.33 (7962.1 examples/sec; 0.016 sec/batch) 2019-10-05 14:51:33.759240: step 360, loss = 3.37 (7759.2 examples/sec; 0.016 sec/batch) 2019-10-05 14:51:33.909936: step 370, loss = 3.39 (8493.7 examples/sec; 0.015 sec/batch) 2019-10-05 14:51:34.075028: step 380, loss = 3.55 (7753.4 examples/sec; 0.017 sec/batch) 2019-10-05 14:51:34.237601: step 390, loss = 3.50 (7873.3 examples/sec; 0.016 sec/batch) INFO:tensorflow:global_step/sec: 55.9487 I1005 14:51:34.521626 139922356148032 basic_session_run_hooks.py:692] global_step/sec: 55.9487 2019-10-05 14:51:34.522810: step 400, loss = 3.32 (4487.9 examples/sec; 0.029 sec/batch) 2019-10-05 14:51:34.686445: step 410, loss = 3.35 (7822.4 examples/sec; 0.016 sec/batch) 2019-10-05 14:51:34.855793: step 420, loss = 3.59 (7558.3 examples/sec; 0.017 sec/batch) 2019-10-05 14:51:35.015741: step 430, loss = 3.30 (8002.6 examples/sec; 0.016 sec/batch) 2019-10-05 14:51:35.180842: step 440, loss = 3.09 (7762.0 examples/sec; 0.016 sec/batch) 2019-10-05 14:51:35.344060: step 450, loss = 3.19 (7832.9 examples/sec; 0.016 sec/batch) 2019-10-05 14:51:35.490957: step 460, loss = 3.17 (8714.1 examples/sec; 0.015 sec/batch) 2019-10-05 14:51:35.637316: step 470, loss = 3.26 (8745.2 examples/sec; 0.015 sec/batch) 2019-10-05 14:51:35.831752: step 480, loss = 3.35 (6583.3 examples/sec; 0.019 sec/batch) 2019-10-05 14:51:35.995347: step 490, loss = 3.11 (7824.0 examples/sec; 0.016 sec/batch) INFO:tensorflow:global_step/sec: 56.7619
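The examples/sec and sec/batch columns in this log are two views of the same number: examples/sec is just the batch size divided by sec/batch, and the cifar10 tutorial trains with a default batch size of 128. A quick sanity check against the figures above:

```python
# cifar10_train.py uses batch_size = 128 by default, so the logged
# examples/sec should equal batch_size / sec_per_batch.
BATCH_SIZE = 128

def examples_per_sec(sec_per_batch):
    return BATCH_SIZE / sec_per_batch

# The "0.016 sec/batch" steps above report ~8000 examples/sec,
# and the "0.015 sec/batch" steps ~8500.
print(round(examples_per_sec(0.016)))  # ~8000
print(round(examples_per_sec(0.015)))  # ~8533
```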

Results from watch -n 1 /opt/rocm/bin/rocm-smi:

cifar10 can't saturate the GPU: it draws only about 80 W of the TDP, while mnist draws 120-150 W (the ASRock reference Vega 56 caps at 150 W on the non-OC BIOS and 165 W on the OC one).

 

Switching to the simple Docker route, this time with TF 1.13, speed improves noticeably:

```shell
# /dev/kfd and /dev/dri expose the ROCm compute and render devices to the container
docker run --rm -it -v $HOME:/data --privileged \
    --device=/dev/kfd --device=/dev/dri --group-add video \
    rocm/tensorflow:rocm2.6-tf1.13-python3
```

From /home/zc/models-r1.5/official/mnist/:

2019-10-05 07:08:32.274088: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA 2019-10-05 07:08:32.369834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1531] Found device 0 with properties: name: Device 687f AMDGPU ISA: gfx900 memoryClockRate (GHz) 1.59 pciBusID 0000:03:00.0 Total memory: 7.98GiB Free memory: 7.73GiB 2019-10-05 07:08:32.369874: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1642] Adding visible gpu devices: 0 2019-10-05 07:08:32.369900: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-10-05 07:08:32.369917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1059] 0 2019-10-05 07:08:32.369923: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1072] 0: N 2019-10-05 07:08:32.369980: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7524 MB memory) -> physical GPU (device: 0, name: Device 687f, pci bus id: 0000:03:00.0) INFO:tensorflow:Running local_init_op. INFO:tensorflow:Done running local_init_op. INFO:tensorflow:Saving checkpoints for 0 into /tmp/mnist_model/model.ckpt. 
2019-10-05 07:09:19.753740: I tensorflow/core/kernels/conv_grad_input_ops.cc:1027] running auto-tune for Backward-Data 2019-10-05 07:09:22.482913: I tensorflow/core/kernels/conv_grad_filter_ops.cc:979] running auto-tune for Backward-Filter 2019-10-05 07:09:27.833345: I tensorflow/core/kernels/conv_grad_filter_ops.cc:979] running auto-tune for Backward-Filter INFO:tensorflow:train_accuracy = 0.17 INFO:tensorflow:loss = 2.2934916, step = 0 INFO:tensorflow:global_step/sec: 63.3507 INFO:tensorflow:train_accuracy = 0.53 (1.579 sec) INFO:tensorflow:loss = 0.41081315, step = 100 (1.578 sec) INFO:tensorflow:global_step/sec: 172.868 INFO:tensorflow:train_accuracy = 0.6566667 (0.578 sec) INFO:tensorflow:loss = 0.2912456, step = 200 (0.579 sec) INFO:tensorflow:global_step/sec: 178.331 INFO:tensorflow:train_accuracy = 0.7225 (0.561 sec) INFO:tensorflow:loss = 0.24256901, step = 300 (0.561 sec) INFO:tensorflow:global_step/sec: 176.009 INFO:tensorflow:train_accuracy = 0.762 (0.568 sec) INFO:tensorflow:loss = 0.20687613, step = 400 (0.568 sec) INFO:tensorflow:global_step/sec: 176.837 INFO:tensorflow:train_accuracy = 0.79833335 (0.566 sec) INFO:tensorflow:loss = 0.087043725, step = 500 (0.566 sec) INFO:tensorflow:global_step/sec: 171.856 INFO:tensorflow:train_accuracy = 0.82285714 (0.582 sec) INFO:tensorflow:loss = 0.1064676, step = 600 (0.582 sec) INFO:tensorflow:global_step/sec: 172.889 INFO:tensorflow:train_accuracy = 0.8375 (0.579 sec) INFO:tensorflow:loss = 0.21137495, step = 700 (0.579 sec) INFO:tensorflow:global_step/sec: 178.836 INFO:tensorflow:train_accuracy = 0.8522222 (0.559 sec) INFO:tensorflow:loss = 0.18393756, step = 800 (0.559 sec) INFO:tensorflow:global_step/sec: 179.369 INFO:tensorflow:train_accuracy = 0.866 (0.558 sec) INFO:tensorflow:loss = 0.04732112, step = 900 (0.558 sec) INFO:tensorflow:global_step/sec: 176.643 INFO:tensorflow:train_accuracy = 0.87454545 (0.566 sec) INFO:tensorflow:loss = 0.07572386, step = 1000 (0.566 sec) INFO:tensorflow:global_step/sec: 
178.853 INFO:tensorflow:train_accuracy = 0.87916666 (0.559 sec) INFO:tensorflow:loss = 0.14445473, step = 1100 (0.559 sec) INFO:tensorflow:global_step/sec: 170.553 INFO:tensorflow:train_accuracy = 0.88769233 (0.586 sec) INFO:tensorflow:loss = 0.046243306, step = 1200 (0.586 sec) INFO:tensorflow:global_step/sec: 173.607 INFO:tensorflow:train_accuracy = 0.89285713 (0.576 sec) INFO:tensorflow:loss = 0.077932455, step = 1300 (0.576 sec) INFO:tensorflow:global_step/sec: 180.427 INFO:tensorflow:train_accuracy = 0.8986667 (0.554 sec) INFO:tensorflow:loss = 0.04583888, step = 1400 (0.554 sec) INFO:tensorflow:global_step/sec: 180.881 INFO:tensorflow:train_accuracy = 0.903125 (0.553 sec) INFO:tensorflow:loss = 0.08700336, step = 1500 (0.553 sec) INFO:tensorflow:global_step/sec: 173.771

From /home/zc/models-r1.5/tutorials/image/cifar10/:

Queue-based input pipelines have been replaced by `tf.data`. Use `tf.data.Dataset.shuffle(min_after_dequeue).batch(batch_size)`. 2019-10-05 07:11:31.581090: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA 2019-10-05 07:11:31.626012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1531] Found device 0 with properties: name: Device 687f AMDGPU ISA: gfx900 memoryClockRate (GHz) 1.59 pciBusID 0000:03:00.0 Total memory: 7.98GiB Free memory: 7.73GiB 2019-10-05 07:11:31.626041: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1642] Adding visible gpu devices: 0 2019-10-05 07:11:31.626064: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-10-05 07:11:31.626081: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1059] 0 2019-10-05 07:11:31.626087: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1072] 0: N 2019-10-05 07:11:31.626166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7524 MB memory) -> physical GPU (device: 0, name: Device 687f, pci bus id: 0000:03:00.0) WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/monitored_session.py:809: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version. Instructions for updating: To construct input pipelines, use the `tf.data` module. 
2019-10-05 07:12:19.668639: I tensorflow/core/kernels/conv_grad_input_ops.cc:1027] running auto-tune for Backward-Data 2019-10-05 07:12:21.352489: I tensorflow/core/kernels/conv_grad_filter_ops.cc:979] running auto-tune for Backward-Filter 2019-10-05 07:12:26.600089: I tensorflow/core/kernels/conv_grad_filter_ops.cc:979] running auto-tune for Backward-Filter 2019-10-05 07:12:31.251288: step 0, loss = 4.67 (21.4 examples/sec; 5.977 sec/batch) 2019-10-05 07:12:31.494354: step 10, loss = 4.59 (5265.9 examples/sec; 0.024 sec/batch) 2019-10-05 07:12:31.629644: step 20, loss = 4.76 (9461.2 examples/sec; 0.014 sec/batch) 2019-10-05 07:12:31.759203: step 30, loss = 4.42 (9879.6 examples/sec; 0.013 sec/batch) 2019-10-05 07:12:31.879325: step 40, loss = 4.32 (10655.8 examples/sec; 0.012 sec/batch) 2019-10-05 07:12:32.001538: step 50, loss = 4.40 (10473.7 examples/sec; 0.012 sec/batch) 2019-10-05 07:12:32.122003: step 60, loss = 4.31 (10625.4 examples/sec; 0.012 sec/batch) 2019-10-05 07:12:32.242860: step 70, loss = 4.44 (10590.9 examples/sec; 0.012 sec/batch) 2019-10-05 07:12:32.365681: step 80, loss = 4.20 (10421.7 examples/sec; 0.012 sec/batch) 2019-10-05 07:12:32.487062: step 90, loss = 4.08 (10545.3 examples/sec; 0.012 sec/batch) 2019-10-05 07:12:32.740512: step 100, loss = 4.17 (5050.3 examples/sec; 0.025 sec/batch) 2019-10-05 07:12:32.870252: step 110, loss = 4.01 (9866.1 examples/sec; 0.013 sec/batch) 2019-10-05 07:12:32.993540: step 120, loss = 3.98 (10382.4 examples/sec; 0.012 sec/batch) 2019-10-05 07:12:33.117593: step 130, loss = 3.87 (10317.8 examples/sec; 0.012 sec/batch) 2019-10-05 07:12:33.239487: step 140, loss = 3.97 (10500.8 examples/sec; 0.012 sec/batch) 2019-10-05 07:12:33.361031: step 150, loss = 4.08 (10531.4 examples/sec; 0.012 sec/batch) 2019-10-05 07:12:33.485318: step 160, loss = 3.97 (10298.6 examples/sec; 0.012 sec/batch) 2019-10-05 07:12:33.605257: step 170, loss = 3.99 (10672.5 examples/sec; 0.012 sec/batch) 2019-10-05 07:12:33.733531: step 180, 
loss = 4.16 (9978.5 examples/sec; 0.013 sec/batch) 2019-10-05 07:12:33.860067: step 190, loss = 4.00 (10115.7 examples/sec; 0.013 sec/batch) 2019-10-05 07:12:34.112532: step 200, loss = 3.74 (5070.0 examples/sec; 0.025 sec/batch) 2019-10-05 07:12:34.243842: step 210, loss = 3.65 (9748.1 examples/sec; 0.013 sec/batch) 2019-10-05 07:12:34.365270: step 220, loss = 3.72 (10541.3 examples/sec; 0.012 sec/batch) 2019-10-05 07:12:34.485716: step 230, loss = 3.79 (10627.1 examples/sec; 0.012 sec/batch) 2019-10-05 07:12:34.608241: step 240, loss = 3.71 (10446.7 examples/sec; 0.012 sec/batch) 2019-10-05 07:12:34.731073: step 250, loss = 3.81 (10420.7 examples/sec; 0.012 sec/batch) 2019-10-05 07:12:34.871777: step 260, loss = 3.67 (9097.2 examples/sec; 0.014 sec/batch) 2019-10-05 07:12:34.993355: step 270, loss = 3.54 (10528.2 examples/sec; 0.012 sec/batch) 2019-10-05 07:12:35.113886: step 280, loss = 3.56 (10619.6 examples/sec; 0.012 sec/batch) 2019-10-05 07:12:35.237534: step 290, loss = 3.57 (10352.2 examples/sec; 0.012 sec/batch)
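Taking roughly representative steady-state numbers from the logs (natively ~167 global_step/sec for mnist and ~8000 examples/sec for cifar10, versus ~176 and ~10500 in the container; these are eyeballed midpoints, not precise averages), the relative speedup works out to:

```python
def speedup_pct(before, after):
    """Percentage improvement of `after` over `before`."""
    return (after / before - 1.0) * 100.0

# Eyeballed steady-state figures from the native and Docker logs above.
print(f"mnist:   {speedup_pct(167, 176):.1f}%")    # ~5.4%
print(f"cifar10: {speedup_pct(8000, 10500):.1f}%") # ~31%
```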

 

With the OC BIOS, mnist gets down to about 0.55-0.54 s per 100 steps; cifar10 doesn't saturate the GPU anyway, so it barely changes.
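For reference, those per-100-step times from the log convert to global_step/sec as 100 / t, so the OC BIOS numbers correspond to a modest bump over stock:

```python
def steps_per_sec(sec_per_100_steps):
    # The mnist log prints elapsed time per 100 steps; invert it.
    return 100.0 / sec_per_100_steps

print(round(steps_per_sec(0.597), 1))  # stock BIOS, ~167.5 steps/sec
print(round(steps_per_sec(0.55), 1))   # OC BIOS, ~181.8 steps/sec
```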

 

These are out-of-the-box results: no Vega 64 BIOS flash and no HBCC tweaking (some mining forum posts claim that can nearly double the throughput...).

 

This is actually about the same score as my GTX 1660 Ti, so still roughly GTX 1070 class? Right in line with its original design target?

Performance looks middling, but considering the price (around ¥1800) and the 8 GB of VRAM (HBCC can reportedly expose up to 16 GB), it's a reasonable budget option.

One last thing: the blower fan is seriously loud.

 

 
