        We introduce a learning-based depth map fusion framework that generates an improved set of depth and confidence maps from the output of Multi-view Stereo (MVS) networks. This is accomplished by integrating volumetric visibility constraints, which encode long-range surface relationships across different views, into an end-to-end trainable architecture. We also introduce a depth search window estimation sub-network, trained jointly with the larger fusion sub-network, to reduce the depth hypothesis search space along each ray. Our method learns to model depth consensus and violations of visibility constraints directly from the data, effectively removing the need to fine-tune fusion parameters. Extensive experiments on MVS datasets show substantial improvements in the accuracy of the fused output depth and confidence maps.
 
        
@inproceedings{burgdorfer_2023_vfuse,
  title         = {{V-FUSE: Volumetric Depth Map Fusion with Long-Range Constraints}},
  author        = {Burgdorfer, Nathaniel and Mordohai, Philippos},
  booktitle     = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year          = {2023},
}