Depth estimation on OASIS images. We trained the MiDaS DPT-Hybrid method (Ranftl et al. 2021), but only on a starter dataset generated by our pipeline (Omnidata).
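For a rough sense of this setup, the sketch below loads the DPT-Hybrid architecture from the MiDaS torch.hub entry point and fine-tunes it on rendered depth. This is a minimal illustration, not our exact training code: the dataset stub and the plain L1 loss are placeholders (MiDaS itself uses scale- and shift-invariant losses).

```python
# Minimal fine-tuning sketch. Assumptions: the intel-isl/MiDaS torch.hub
# entry point, and a placeholder dataset standing in for (rgb, depth) pairs
# rendered by the annotator.
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

class DepthPairs(Dataset):
    """Placeholder for a loader of (rgb, depth) pairs from the starter dataset."""
    def __len__(self):
        return 16
    def __getitem__(self, i):
        return torch.rand(3, 384, 384), torch.rand(384, 384)

device = "cuda" if torch.cuda.is_available() else "cpu"

# DPT-Hybrid architecture from the MiDaS repo (Ranftl et al. 2021)
model = torch.hub.load("intel-isl/MiDaS", "DPT_Hybrid").to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
loader = DataLoader(DepthPairs(), batch_size=4, shuffle=True)

model.train()
for rgb, depth in loader:
    rgb, depth = rgb.to(device), depth.to(device)
    pred = model(rgb)              # (B, H, W) inverse-depth prediction
    loss = F.l1_loss(pred, depth)  # placeholder loss for illustration only
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```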
Surface normals extracted from depth predictions. The high-resolution meshes in the starter dataset also seem to produce networks that make more precise shape predictions, as shown by the surface normal vectors extracted from the predictions in the bottom row.
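For reference, one simple way to recover such normal vectors from a depth map is finite differences on the depth values. This is a sketch of that standard approximation (it ignores camera intrinsics) and is not necessarily the exact procedure used for the figure.

```python
# Sketch: approximate surface normals from a depth map via finite differences.
import numpy as np

def normals_from_depth(depth: np.ndarray) -> np.ndarray:
    """depth: (H, W) array -> (H, W, 3) unit normals."""
    dz_dy, dz_dx = np.gradient(depth)  # per-pixel depth gradients
    normals = np.dstack([-dz_dx, -dz_dy, np.ones_like(depth)])
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    # Mapping the result from [-1, 1] to [0, 1] gives the usual RGB visualization.
    return normals
```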
Surface normal prediction on OASIS images. Neither model saw OASIS images during training.
Surface normal prediction on OASIS images. The Omnidata-trained model outperformed the baseline model trained on OASIS data itself.
Dataset field-of-view influences image content, e.g. FoV is correlated with object-level focus.
Cross-task comparisons are complicated by confounding factors. Comparing two pretrained models' utility for transfer learning is difficult when the two models were trained on disjoint datasets with different parameters: domains, numbers of images, sensor types, resolutions, etc.
13 of 21 mid-level cues from the Annotator. Each label/cue is produced for each RGB view/point combination, and each point is guaranteed to have at least k views.
Annotator: inputs and outputs.
The annotator generates images and videos of aligned mid-level cues, given an untextured mesh, a texture or aligned RGB images, and an optional pre-generated camera pose file. A 3D point cloud can be used as well: simply mesh the point cloud using a standard mesher such as COLMAP (result shown above).
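For example, the point-cloud route can look like the following sketch, which assumes a local COLMAP installation and a fused point cloud (with normals); the file paths are placeholders.

```python
# Sketch: turn a fused point cloud into a mesh with COLMAP's Poisson mesher.
# The resulting mesh (plus the aligned RGB images) can then be fed to the annotator.
import subprocess

subprocess.run([
    "colmap", "poisson_mesher",
    "--input_path", "dense/fused.ply",          # placeholder: fused point cloud
    "--output_path", "dense/meshed-poisson.ply" # placeholder: output mesh
], check=True)
```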
Static views of camera/point combinations. Multi-view constraints guarantee at least k views of each point.
Videos of interpolated trajectories. The annotator can also generate videos by interpolating between cameras.
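A minimal sketch of how such in-between cameras can be generated: linear interpolation of camera positions and spherical linear interpolation (slerp) of orientations. The key-frame poses below are made up for illustration.

```python
# Sketch: interpolate a camera trajectory between two key-frame poses.
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

key_times = [0.0, 1.0]
key_pos = np.array([[0.0, 0.0, 0.0],
                    [1.0, 0.5, 2.0]])                 # illustrative camera centers
key_rot = Rotation.from_euler("xyz", [[0, 0, 0],
                                      [0, 45, 0]], degrees=True)

t = np.linspace(0.0, 1.0, num=30)                     # 30 in-between frames
positions = (1 - t)[:, None] * key_pos[0] + t[:, None] * key_pos[1]
rotations = Slerp(key_times, key_rot)(t)              # interpolated orientations

# Each (rotation, position) pair defines a camera; rendering all mid-level cues
# from these cameras yields a video with aligned labels in every frame.
```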
All mid-level cues are available for each frame. The following figure shows a few of these cues on a building from the Replica dataset.
5 of 21 outputs (video sampling).
How does the pipeline do this? It creates cameras, points, and views in 4 stages (below). For more information, check out the paper or annotator repo.
The annotator GitHub repository contains examples, documentation, a Dockerized runnable container, and the raw code.
The omnitools CLI contains parallelized scripts to download and manipulate some or all of the 14-million-image starter dataset. These scripts can also be reused to manipulate data generated by the annotator.
The tooling repo contains many of the tools that we found useful during the project: PyTorch dataloaders for annotator-produced data, data transformations, training pipelines, and our reimplementation of MiDaS.
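As an illustration of how annotator-produced data can be consumed, here is a generic paired-cue PyTorch dataset. This is a sketch, not the repo's actual dataloader API, and the rgb/ and normal/ directory layout with matching filenames is an assumption.

```python
# Generic sketch: load (rgb, surface-normal) image pairs from a hypothetical
# directory layout with matching filenames under rgb/ and normal/.
from pathlib import Path
from PIL import Image
import torchvision.transforms.functional as TF
from torch.utils.data import Dataset

class RGBNormalDataset(Dataset):
    def __init__(self, root):
        self.rgb_paths = sorted(Path(root, "rgb").glob("*.png"))

    def __len__(self):
        return len(self.rgb_paths)

    def __getitem__(self, idx):
        rgb_path = self.rgb_paths[idx]
        normal_path = rgb_path.parent.parent / "normal" / rgb_path.name
        rgb = TF.to_tensor(Image.open(rgb_path).convert("RGB"))
        normal = TF.to_tensor(Image.open(normal_path).convert("RGB"))
        return rgb, normal
```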