Engineering Mask_RCNN for CPU-server deployment, diary (2): Tensor Debugger (tfdbg)

https://www.tensorflow.org/guide/debugger?hl=zh-cn [1] This is TensorFlow's official guide to its debugger.

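I am using Keras, so I followed the guide and added the setup code to the Python test script that loads the model. The training stage had no problems, so the training script is not the right place to debug...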

import keras.backend as K
import tensorflow as tf
from tensorflow.python import debug as tf_debug

# Wrap the session Keras uses so that Session.run() calls open the tfdbg CLI.
K.set_session(tf_debug.LocalCLIDebugWrapperSession(tf.Session()))

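Running it failed with an error saying that no terminal could be found.

After some googling, the fix is:

In PyCharm's Run Configuration, tick "Emulate terminal in output console".

Run it again and you land in the tfdbg command window.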
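A minimal sketch of a guard for this situation, assuming the error comes from standard output not being attached to a terminal (which is my reading of the message, not something the guide states). The helper name stdout_is_terminal is hypothetical; the idea is to wrap the session with the CLI debugger only when the check passes.

```python
import io
import sys


def stdout_is_terminal(stream=None):
    """Return True if the stream reports that it is attached to a terminal."""
    stream = sys.stdout if stream is None else stream
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())


# An in-memory buffer is never a terminal, so this prints False.
print(stdout_is_terminal(io.StringIO()))
```

In the test script, the K.set_session(...) call could then sit behind "if stdout_is_terminal():" so that the script still runs when no terminal is emulated.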

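Following the documentation [1], these are the commands I found most useful in practice:

lt    lists the dumped tensors; run it to get your bearings again when the console is not behaving normally

lt -n ROI/top*    important: the name always has the form "name of the layer the tensor belongs to" + "/" + "tensor name"

pt tensorname    prints the tensor's value

eval "np.max(`tensor name`)"    wrap the tensor name in backticks ` ` to reference it inside a Python expression, so you can debug with code. Incredibly handy.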
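The naming rule used with lt -n can be sketched in plain Python. This assumes that lt -n treats its argument as a regular expression, as the guide [1] describes; the helper tfdbg_name_pattern and the sample tensor names are hypothetical and exist only for illustration.

```python
import re


def tfdbg_name_pattern(layer_name, tensor_prefix):
    """Build a regex for names of the form '<layer name>/<tensor name>'."""
    return "^" + re.escape(layer_name) + "/" + re.escape(tensor_prefix) + ".*"


# Hypothetical dumped tensor names, not taken from a real run.
names = ["ROI/top_anchors:1", "ROI/strided_slice:0", "rpn_class/concat:0"]

pattern = tfdbg_name_pattern("ROI", "top")
matches = [name for name in names if re.match(pattern, name)]
print(pattern)
print(matches)
```

Only names that start with the layer name, then a slash, then the given prefix survive the filter.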

Debugging confirmed that the culprit is this documented behaviour of tf.gather:

Note that on CPU, if an out of bound index is found, an error is returned. On GPU, if an out of bound index is found, a 0 is stored in the corresponding output value.
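To make the difference concrete, here is a pure-Python imitation of the two behaviours described in the note above. It is only a sketch on plain lists, not TensorFlow code, and the function names gather_cpu_like and gather_gpu_like are made up for this illustration.

```python
def gather_cpu_like(params, indices):
    """Imitate the CPU behaviour: an out-of-bound index is an error."""
    for i in indices:
        if not 0 <= i < len(params):
            raise IndexError("index %d is not in [0, %d)" % (i, len(params)))
    return [params[i] for i in indices]


def gather_gpu_like(params, indices):
    """Imitate the GPU behaviour: an out-of-bound index yields 0."""
    return [params[i] if 0 <= i < len(params) else 0 for i in indices]


params = [10, 20, 30]
indices = [0, 5, 2]  # 5 is out of bounds

print(gather_gpu_like(params, indices))
try:
    gather_cpu_like(params, indices)
except IndexError as err:
    print("CPU-like gather failed:", err)
```

This is why the same model can appear to work on GPU, where the bad index is silently replaced, and fail on CPU.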

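The cause: in inference mode, when the ROI layer generates and evaluates the anchor scores, it produces more scores than the number of anchors passed in as inputs, so some of the selected indices point past the end of the anchors. Finding out why the code does this would require tracing where scores and the ROI layer's inputs each come from. For the purpose of running on CPU, though, the following change to the model code is enough. I located the bad values with the tensor debugger and fixed the problem by writing a new tensor that post-processes the old one.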

pre_nms_limit = tf.minimum(6000, tf.shape(anchors)[1])
ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True,
                 name="top_anchors").indices  # while debugging, the largest index reached 65471

if self.mode != "training":
    # Added to make the model work on CPU: keep only indices below the anchor count.
    mask = tf.greater(tf.shape(anchors)[1], ix, name="bool_mask")
    ix = tf.stack([tf.boolean_mask(ix, mask)])
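What the added mask does can be imitated on plain Python lists. This is a sketch of the idea rather than the TensorFlow code itself: the function name drop_out_of_range and the sample values are hypothetical, and like the tf.stack([...]) call above it returns a single row, which assumes a batch of one.

```python
def drop_out_of_range(ix_batch, num_anchors):
    """Keep only indices below num_anchors, returned as a batch of one row."""
    kept = [i for row in ix_batch for i in row if num_anchors > i]
    return [kept]


# Hypothetical indices; 65471 is the out-of-range value seen while debugging.
ix = [[3, 65471, 0, 7]]
print(drop_out_of_range(ix, num_anchors=6))
```

The filtered result may contain fewer indices than pre_nms_limit, which is the price of discarding the invalid ones instead of fixing their source.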
