Paper link

Authors: Iro Laina, Christian Rupprecht, Vasileios Belagiannis, Federico Tombari, Nassir Navab

I. Abstract

This paper addresses the problem of estimating the depth map of a scene given a single RGB image. We propose a fully convolutional architecture, encompassing residual learning, to model the ambiguous mapping between monocular images and depth maps. In order to improve the output resolution, we present a novel way to efficiently learn feature map up-sampling within the network. For optimization, **we introduce the reverse Huber loss that is particularly suited for the task at hand and driven by the value distributions commonly present in depth maps.** Our model is composed of a single architecture that is trained end-to-end and does not rely on post-processing techniques, such as CRFs or other additional refinement steps. As a result, it runs in real-time on images or videos. In the evaluation, we show that the proposed model contains fewer parameters and requires fewer training data than the current state of the art, while outperforming all approaches on depth estimation. Code and models are publicly available.

II. Introduction

  1. Uses of depth information:

    Moreover, the availability of reasonably accurate depth information is well-known to improve many computer vision tasks with respect to the RGB-only counterpart, for example in reconstruction, recognition, semantic segmentation or human pose estimation.

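III. Related Work

  1. Liu et al. combine semantic segmentation with depth estimation, using the predicted labels as additional constraints to facilitate the optimization.

  2. Konrad et al. compute a median over the retrieved depth maps and then smooth the result with cross-bilateral filtering.

  3. Liu et al. formulate the optimization problem as a conditional random field (CRF) with continuous and discrete variable potentials.

  4. Eigen et al. were the first to use CNNs to regress a dense depth map from a single image, using a two-scale architecture.

  5. Roy and Todorovic propose combining CNNs with regression forests, using a very shallow architecture at each tree node and thereby limiting the need for large amounts of data.

  6. Liu et al. propose learning the unary and pairwise potentials during CNN training in the form of a CRF loss.

  7. Li et al. and Wang et al. use hierarchical CRFs to refine their CNN predictions from the superpixel level down to the pixel level.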

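IV. Method

CNN architecture

  1. AlexNet has a receptive field of 151×151 pixels and VGG-16 of 276×276 pixels, while ResNet-50 reaches 483×483.
  2. Thanks to its residual structure, ResNet mitigates the degradation and vanishing-gradient problems of very deep networks.
  3. After the last pooling layer is removed, keeping a fully connected layer would produce too many parameters, so the authors propose an up-sampling block with fewer parameters.
  4. The block starts with an unpooling layer that increases the spatial resolution of the feature map, followed by a 5×5 convolution with ReLU activation. Because unpooling places each input value in a 2×2 cell and leaves the other three entries at zero, most of the multiplications in the following convolution involve zeros. The authors therefore split the 5×5 kernel into four smaller kernels (3×3, 3×2, 2×3 and 2×2), apply each one separately, and interleave the results, which improves efficiency: [image: image-20210927103513425]
  5. The block structure is shown in the figure: [image: image-20210927103318008]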
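
The kernel split described in item 4 can be checked numerically. The following is a minimal single-channel NumPy sketch, not the authors' implementation: the function names (`correlate_valid`, `unpool_then_conv`, `fast_up_conv`) are hypothetical, bias, ReLU and multiple channels are left out, and it assumes unpooling writes each value into the top-left entry of its 2×2 cell and that the 5×5 convolution uses zero padding of 2.

```python
import numpy as np

def correlate_valid(x, k):
    """'Valid' 2-D cross-correlation, the operation CNN conv layers compute."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for a in range(kh):
        for b in range(kw):
            out += k[a, b] * x[a:a + oh, b:b + ow]
    return out

def unpool_then_conv(x, k):
    """Reference path: 2x unpooling, then a zero-padded 5x5 convolution."""
    h, w = x.shape
    u = np.zeros((2 * h, 2 * w))
    u[::2, ::2] = x  # assumed unpooling: value goes to the top-left of each 2x2 cell
    return correlate_valid(np.pad(u, 2, mode="constant"), k)

def fast_up_conv(x, k):
    """Same output from four small kernels applied to the low-resolution input."""
    h, w = x.shape
    out = np.zeros((2 * h, 2 * w))
    for r in (0, 1):      # parity of the output row
        for s in (0, 1):  # parity of the output column
            # Only kernel taps with matching parity ever meet a non-zero entry:
            # the slices are 3x3, 3x2, 2x3 and 2x2.
            sub = k[r::2, s::2]
            # Even parity needs one zero row/column before and after,
            # odd parity only after.
            padded = np.pad(x, ((1 - r, 1), (1 - s, 1)), mode="constant")
            out[r::2, s::2] = correlate_valid(padded, sub)  # interleave
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6))   # toy low-resolution feature map
k = rng.standard_normal((5, 5))   # toy 5x5 kernel
print(np.allclose(unpool_then_conv(x, k), fast_up_conv(x, k)))
```

In this sketch the reference path spends 25 multiplications on each of the 4·H·W output pixels, while the split version spends 9 + 6 + 6 + 4 = 25 multiplications per input pixel, so the zero-valued products are skipped entirely.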

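Loss function

A standard loss function for regression problems is the $L_2$ loss. The authors instead use the BerHu (reverse Huber) loss, which equals the $L_1$ loss $\lvert x\rvert$ when $x\in[-c,c]$ and a scaled $L_2$ loss, $\frac{x^2+c^2}{2c}$, outside this range, so the two branches meet continuously at $\lvert x\rvert=c$. The threshold is recomputed at every gradient descent step as $c=\frac{1}{5}\max_i(\lvert\tilde{y}_i-y_i\rvert)$, where $i$ indexes the pixels of the images in the current batch.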
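
As an illustration of the BerHu loss, here is a minimal NumPy sketch. It is not the authors' code: the function name `berhu_loss` is hypothetical, averaging over pixels is an assumption (the notes only give the per-pixel definition), and the early return for a perfect prediction is added here to avoid dividing by $c=0$.

```python
import numpy as np

def berhu_loss(pred, target):
    """Reverse Huber (BerHu) loss, averaged over all elements (assumed reduction)."""
    diff = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    c = 0.2 * diff.max()          # c = 1/5 of the largest absolute residual in the batch
    if c == 0.0:                  # perfect prediction: every residual is zero
        return 0.0
    l1_branch = diff                              # used where |x| <= c
    l2_branch = (diff ** 2 + c ** 2) / (2.0 * c)  # used where |x| >  c
    return float(np.where(diff <= c, l1_branch, l2_branch).mean())

# Toy residuals 1, 0.5, 5 and 0 give c = 1; only the residual of 5 falls in the L2 branch.
pred = np.array([0.0, 0.0, 0.0, 0.0])
target = np.array([1.0, 0.5, 5.0, 0.0])
print(berhu_loss(pred, target))
```

Because $c$ depends on the current residuals, the switch point between the two branches moves as training progresses, which is the dynamic behaviour the notes refer to.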

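V. Experimental Results

NYU Depth Dataset (indoor scenes)

  1. Comparison of different CNN variants with the architecture proposed by the authors: [image: image-20210927110803970]
  2. Comparison with the state of the art: [image: image-20210927110921273]
  3. Predictions of this method compared with those of other methods: [image: image-20210927111039651]

Make3D Dataset (outdoor scenes)

  1. Comparison with prior work:
    [image: image-20210927131015806]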

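VI. Summary

  1. The paper uses a single-scale CNN architecture based on residual learning and removes the fully connected layers to reduce the number of parameters.
  2. It proposes a more efficient up-convolution layer that skips most of the wasted multiplications with zeros that follow unpooling.
  3. It uses the BerHu loss, whose threshold $c$ is adjusted dynamically at each training step.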