I've recently been learning to train models with TensorFlow 2.0, and I ran into the following error:
ResourceExhaustedError Traceback (most recent call last)
<ipython-input-15-0d9dc5695c3a> in <module>
----> 1 history=model.fit(train_x,train_y,epochs=5,batch_size=64,validation_data=(test_x,test_y))
D:\Anaconda\lib\site-packages\tensorflow\python\keras\engine\training.py in _method_wrapper(self, *args, **kwargs)
64 def _method_wrapper(self, *args, **kwargs):
65 if not self._in_multi_worker_mode(): # pylint: disable=protected-access
---> 66 return method(self, *args, **kwargs)
67
68 # Running inside `run_distribute_coordinator` already.
D:\Anaconda\lib\site-packages\tensorflow\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
846 batch_size=batch_size):
847 callbacks.on_train_batch_begin(step)
--> 848 tmp_logs = train_function(iterator)
849 # Catch OutOfRangeError for Datasets of unknown size.
850 # This blocks until the batch has finished executing.
D:\Anaconda\lib\site-packages\tensorflow\python\eager\def_function.py in __call__(self, *args, **kwds)
578 xla_context.Exit()
579 else:
--> 580 result = self._call(*args, **kwds)
581
582 if tracing_count == self._get_tracing_count():
D:\Anaconda\lib\site-packages\tensorflow\python\eager\def_function.py in _call(self, *args, **kwds)
609 # In this case we have created variables on the first call, so we run the
610 # defunned version which is guaranteed to never create variables.
--> 611 return self._stateless_fn(*args, **kwds) # pylint: disable=not-callable
612 elif self._stateful_fn is not None:
613 # Release the lock early so that multiple threads can perform the call
D:\Anaconda\lib\site-packages\tensorflow\python\eager\function.py in __call__(self, *args, **kwargs)
2418 with self._lock:
2419 graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
-> 2420 return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
2421
2422 @property
D:\Anaconda\lib\site-packages\tensorflow\python\eager\function.py in _filtered_call(self, args, kwargs)
1663 if isinstance(t, (ops.Tensor,
1664 resource_variable_ops.BaseResourceVariable))),
-> 1665 self.captured_inputs)
1666
1667 def _call_flat(self, args, captured_inputs, cancellation_manager=None):
D:\Anaconda\lib\site-packages\tensorflow\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
1744 # No tape is watching; skip to running the function.
1745 return self._build_call_outputs(self._inference_function.call(
-> 1746 ctx, args, cancellation_manager=cancellation_manager))
1747 forward_backward = self._select_forward_and_backward_functions(
1748 args,
D:\Anaconda\lib\site-packages\tensorflow\python\eager\function.py in call(self, ctx, args, cancellation_manager)
596 inputs=args,
597 attrs=attrs,
--> 598 ctx=ctx)
599 else:
600 outputs = execute.execute_with_cancellation(
D:\Anaconda\lib\site-packages\tensorflow\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
58 ctx.ensure_initialized()
59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60 inputs, attrs, num_outputs)
61 except core._NotOkStatusException as e:
62 if name is not None:
ResourceExhaustedError: OOM when allocating tensor with shape[64,64,252,252] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node sequential/conv2d_1/Conv2D (defined at <ipython-input-15-0d9dc5695c3a>:1) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[Op:__inference_train_function_3331]
Function call stack:
train_function
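For reference, a quick back-of-the-envelope calculation (my own addition, assuming "type float" means 4-byte float32) shows why the single tensor named in the error is already huge:
# Rough size of the tensor from the error: shape [64, 64, 252, 252], float32 (4 bytes per element)
elements = 64 * 64 * 252 * 252        # 260,112,384 elements
print(elements * 4 / 1024**3)         # ≈ 0.97 GiB for this one activation alone
And that is only one layer's forward activation for a batch of 64; the other layers' activations and the gradients need memory on top of it, so the GPU's memory runs out.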
The error says the machine's resources have been exhausted, so the first step is to take a model that can train normally and probe what our machine's upper limit actually is. For example:
history=model.fit(train_x,train_y,epochs=5,batch_size=64,validation_data=(test_x,test_y))
With batch_size=64 here, the machine throws the "ResourceExhaustedError"; with batch_size=16 it trains the model normally. By repeatedly adjusting this number, I found that the largest batch size my computer can accept is batch_size=21; anything above that and it errors out. This tuning process also made it clear that the resource exhaustion is caused by insufficient GPU memory. You can use the same approach, keeping an eye on your resource monitor, to find out what exactly is running out on your machine.
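To automate that trial and error, here is a minimal sketch (my own, not from the original post) that assumes the same model, train_x, train_y, test_x, test_y as above and simply steps down through candidate batch sizes until model.fit no longer raises tf.errors.ResourceExhaustedError:
import tensorflow as tf
# Probe candidate batch sizes from large to small; keep the first one that fits.
# Note: after a GPU OOM the process's memory state can be messy, so restarting the
# kernel between attempts is the safer (if slower) way to do the same search by hand.
for batch_size in (64, 32, 21, 16):
    try:
        history = model.fit(train_x, train_y,
                            epochs=1,              # one epoch is enough to probe memory use
                            batch_size=batch_size,
                            validation_data=(test_x, test_y))
        print("batch_size =", batch_size, "fits in memory")
        break
    except tf.errors.ResourceExhaustedError:
        print("batch_size =", batch_size, "is too large, trying a smaller one")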
I'm writing this down here as a record and to share it!