Code Walkthrough: Monocular Depth Estimation with Deep Learning (2)

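I'm 开朗睫毛, a blogger on 靠谱客. This post continues the code walkthrough of deep-learning-based monocular depth estimation (part 2); I'm sharing it here in the hope that it is a useful reference.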

Let's continue with depth.py.

First, the _image_montage() function:

def _image_montage(imgs, min, max):
    imgs = imgutil.bxyc_from_bcxy(imgs)
    return imgutil.montage(
        imgutil.scale_values(imgs, min=min, max=max),
        border=1)

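Judging by the helper names, this function converts the batch from (batch, channel, x, y) layout to (batch, x, y, channel), rescales the pixel values using the given min and max (presumably into [0, 1]), and tiles the images into a single montage with a 1-pixel border. In other words, it prepares a batch of RGB images for display.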
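As a rough illustration of what these helpers probably do, here is a minimal NumPy sketch. All three functions are hypothetical stand-ins written for this post; the real imgutil module is not shown here and may differ in details such as clipping or how the grid is laid out.

```python
import numpy as np

def bxyc_from_bcxy(imgs):
    # Assumed behavior: move the channel axis last, (b, c, x, y) -> (b, x, y, c).
    return imgs.transpose((0, 2, 3, 1))

def scale_values(x, min, max):
    # Assumed behavior: map [min, max] linearly onto [0, 1] and clip the rest.
    # The parameter names mirror the call in _image_montage().
    x = (x.astype(np.float32) - min) / float(max - min)
    return np.clip(x, 0.0, 1.0)

def montage_row(imgs, border=1):
    # Simplified stand-in for imgutil.montage: a single row of images
    # separated (and surrounded) by `border` pixels of zeros.
    b, h, w, c = imgs.shape
    out = np.zeros((h + 2 * border, b * w + (b + 1) * border, c), dtype=imgs.dtype)
    for k in range(b):
        left = border + k * (w + border)
        out[border:border + h, left:left + w] = imgs[k]
    return out

# Four made-up 6 x 8 RGB images in (b, c, x, y) layout.
rgb = (np.arange(4 * 3 * 6 * 8) % 256).astype(np.uint8).reshape(4, 3, 6, 8)
result = montage_row(scale_values(bxyc_from_bcxy(rgb), min=0, max=255), border=1)
print(result.shape)
```

The point is only the order of operations: reorder the axes, rescale, then tile.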

Next, the _depth_montage() function:

def _depth_montage(depths):
    if depths.ndim == 4:
        assert depths.shape[1] == 1
        depths = depths[:,0,:,:]
    #depths = imgutil.scale_values(depths, min=-2.5, max=2.5)
    #depths = map(imgutil.scale_values, depths)
    masks = []
    for i in xrange(len(depths)):
        x = depths[i]
        mask = x != x.min() 
        masks.append(mask)
        x = x[mask]
        if len(x) == 0:
            d = np.zeros_like(depths[i])
        else:
            d = imgutil.scale_values(depths[i], min=x.min(), max=x.max())
        depths[i] = d
    depths = plt.cm.jet(depths)[...,:3]
    for i in xrange(len(depths)):
        for c in xrange(3):
            depths[i, :, :, c][masks[i] == 0] = 0.2
    return imgutil.montage(depths, border=1)

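From this code we can see:

1. The first for loop normalizes each depth map separately. Pixels equal to the map's minimum value are treated as invalid and recorded in a mask; the remaining pixels supply the min and max passed to imgutil.scale_values(). If no valid pixels remain, the map is replaced with zeros.

2. The second for loop runs after plt.cm.jet() has turned the normalized depths into RGB colors. It sets all three channels of the masked-out (invalid) pixels to 0.2, so they show up as dark gray in the montage.

3. depths may be 4-dimensional, with shape (n, 1, h, w); in that case the singleton channel axis is dropped first.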
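The masking and per-image normalization can be sketched for a single depth map as follows. This is a simplified sketch, not the original code: the linear rescaling is an assumed stand-in for imgutil.scale_values(), and a plain grayscale image replaces the plt.cm.jet() colormap so the example needs only NumPy.

```python
import numpy as np

def normalize_depth(depth, invalid_gray=0.2):
    # Pixels equal to the minimum are treated as "no measurement".
    mask = depth != depth.min()
    valid = depth[mask]
    if valid.size == 0:
        scaled = np.zeros_like(depth, dtype=np.float32)
    else:
        lo, hi = valid.min(), valid.max()
        # Assumed stand-in for imgutil.scale_values: linear map to [0, 1].
        scaled = np.clip((depth - lo) / max(hi - lo, 1e-8), 0.0, 1.0)
    # Grayscale stand-in for plt.cm.jet: repeat the value in 3 channels.
    rgb = np.repeat(scaled[..., None], 3, axis=-1).astype(np.float32)
    # Paint invalid pixels a uniform gray, as the second loop does.
    rgb[~mask] = invalid_gray
    return rgb, mask

# Made-up depth map; the zeros play the role of missing measurements.
depth = np.array([[0.0, 0.0, 2.0],
                  [4.0, 3.0, 0.0]], dtype=np.float32)
rgb, mask = normalize_depth(depth)
print(mask)
print(rgb[..., 0])
```

Using the minimum as the "invalid" marker means the smallest measured depth is also painted gray when no pixel is actually missing.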

Next, the simple _zero_pad_batch() function:

def _zero_pad_batch(batch, bsize):
    assert len(batch) <= bsize
    if len(batch) == bsize:
        return batch
    n = batch.shape[0]
    shp = batch.shape[1:]
    return np.concatenate((batch, np.zeros((bsize - n,) + shp,
                                           dtype=batch.dtype)))

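This one is straightforward: when the batch holds fewer than bsize items, it appends all-zero entries along the first axis until the batch has exactly bsize items.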

With that warm-up done, let's move on to the more important code: the methods of the machine class.

class machine(Machine):
    def __init__(self, conf):
        Machine.__init__(self, conf)

The first method of the class is infer_depth():

    def infer_depth(self, images):
        '''
        Infers depth maps for a list of 320x240 images.
        images is a nimgs x 240 x 320 x 3 numpy uint8 array.
        returns depths (nimgs x 55 x 74) corresponding to the center box
        in the original rgb image.
        '''
        images = images.transpose((0,3,1,2))
        (nimgs, nc, nh, nw) = images.shape
        assert (nc, nh, nw) == (3, 240, 320)  # each input image must be 3 x 240 x 320 (channels, height, width)

        (input_h, input_w) = self.input_size  # spatial size of the network input
        (output_h, output_w) = self.output_size  # spatial size of the network output (depth map)

        bsize = self.bsize
        b = 0

        # pred_depth is the symbolic (Theano tensor) output of the network
        v = self.vars
        pred_depth = self.inverse_depth_transform(self.fine.pred_mean)
        infer_f = theano.function([v.images], pred_depth)

        depths = np.zeros((nimgs, output_h, output_w), dtype=np.float32)

        # center crop box: rows i0..i1 (top, bottom) and columns j0..j1 (left, right)
        dh = nh - input_h
        dw = nw - input_w
        (i0, i1) = (dh/2, nh - dh/2)
        (j0, j1) = (dw/2, nw - dw/2)

        # infer depth for images in batches
        b = 0
        while b < nimgs:
            batch = images[b:b+bsize]
            n = len(batch)
            if n < bsize:
                batch = _zero_pad_batch(batch, bsize)

            # crop to network input size
            batch = batch[:, :, i0:i1, j0:j1]

            # infer depth with nnet
            depths[b:b+n] = infer_f(batch)[:n]
            
            b += n

        return depths
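The center-crop arithmetic in infer_depth() can be checked in isolation. The sketch below assumes a network input size of 228 x 304, which is not stated in the code shown here (the real value comes from self.input_size). It also uses // so the division stays an integer division under Python 3; the original relies on Python 2 semantics for /.

```python
def center_crop_bounds(nh, nw, input_h, input_w):
    # Same arithmetic as infer_depth(), with // instead of /.
    dh = nh - input_h
    dw = nw - input_w
    i0, i1 = dh // 2, nh - dh // 2   # top and bottom rows of the crop
    j0, j1 = dw // 2, nw - dw // 2   # left and right columns of the crop
    return (i0, i1), (j0, j1)

# 228 x 304 is an assumed input size, used only for illustration.
rows, cols = center_crop_bounds(240, 320, 228, 304)
print(rows, cols)  # (6, 234) (8, 312)
```

Note that the crop has exactly the requested size only when the size differences dh and dw are even; with an odd difference it comes out one pixel larger.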

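From infer_depth() we can see:

1. The purpose of infer_depth() is to predict a depth map for each input image.

2. It reads the network's input and output sizes from self.input_size and self.output_size, and crops the center of each 240 x 320 image down to the input size.

3. The while loop is easy to follow: it processes the images one batch at a time, zero-padding the last batch if it is short, and calls infer_f() to estimate depth. infer_f is defined just before the loop:

v = self.vars

pred_depth = self.inverse_depth_transform(self.fine.pred_mean)

infer_f = theano.function([v.images], pred_depth)

So infer_f is a compiled callable: theano.function() compiles the symbolic graph that leads from the input v.images to pred_depth into an ordinary Python function that takes a NumPy batch and returns the predicted depths.

4. inverse_depth_transform() is analyzed below.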
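The batching pattern of the while loop can be sketched without Theano. In the sketch below, fake_infer is a hypothetical stand-in for the compiled function: it insists on a fixed batch size of 4 and returns each image's mean as a 1 x 1 "depth map". The names infer_in_batches and fake_infer, and all sizes, are assumptions made for illustration.

```python
import numpy as np

def infer_in_batches(images, bsize, infer_f, out_shape):
    # infer_f is assumed to accept exactly `bsize` items per call,
    # like a network compiled for a fixed batch size.
    nimgs = len(images)
    outputs = np.zeros((nimgs,) + out_shape, dtype=np.float32)
    b = 0
    while b < nimgs:
        batch = images[b:b + bsize]
        n = len(batch)
        if n < bsize:
            # Zero-pad the final, short batch up to the fixed size.
            pad = np.zeros((bsize - n,) + batch.shape[1:], dtype=batch.dtype)
            batch = np.concatenate((batch, pad))
        outputs[b:b + n] = infer_f(batch)[:n]  # drop results for padded rows
        b += n
    return outputs

def fake_infer(batch):
    # Hypothetical stand-in for the compiled Theano function.
    assert len(batch) == 4
    return batch.reshape(len(batch), -1).mean(axis=1).reshape(-1, 1, 1)

# Ten made-up "images"; image k is filled with the value k.
values = np.arange(10, dtype=np.float32).reshape(10, 1, 1, 1)
images = values * np.ones((1, 3, 2, 2), dtype=np.float32)
depths = infer_in_batches(images, bsize=4, infer_f=fake_infer, out_shape=(1, 1))
print(depths.ravel())
```

Padding and then slicing with [:n] lets a function with a fixed batch size handle an image count that is not a multiple of that size.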

OK, now for inverse_depth_transform():

    def inverse_depth_transform(self, logdepths):
        # map network output log depths back to depth
        # output bias is init'd with the mean, and output is logdepth / stdev
        return T.exp(logdepths * self.meta.logdepths_std)

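From this code we can infer:

1. The network predicts log depth divided by the standard deviation of the log depths (according to the comment, the mean is absorbed into the output bias).

2. Multiplying by self.meta.logdepths_std and then taking the exponential undoes both steps and recovers the depth.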
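A small NumPy round trip shows the idea. The forward mapping depth_transform() is an assumption inferred from the comment in the code (log depth divided by the standard deviation), and the value of logdepths_std is made up; the real one is stored in self.meta.

```python
import numpy as np

logdepths_std = 0.5  # made-up value; the real one is stored in self.meta

def depth_transform(depths):
    # Assumed forward mapping, inferred from the comment: log depth / stdev.
    return np.log(depths) / logdepths_std

def inverse_depth_transform(net_out):
    # NumPy version of T.exp(logdepths * self.meta.logdepths_std).
    return np.exp(net_out * logdepths_std)

depths = np.array([0.5, 1.0, 2.0, 10.0])
recovered = inverse_depth_transform(depth_transform(depths))
print(np.allclose(recovered, depths))  # True
```

Because the exponential is always positive, the recovered depths are guaranteed to be positive whatever the network outputs.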

The remaining functions will be covered next time!