Cozmo Depth Map

monocular absolute depth estimation using robotics x deep learning

[Codebase] [Presentation Slides]

Partner: Akshath Burra

Screenshot of our overall method. The robot uses markers to predict absolute depth for up to 3 cubes. We then combine these absolute depth predictions with a relative monocular depth prediction from MiDaS to predict absolute depth at every pixel.

In my senior year I took Cognitive Robotics, a course in which you program Cozmo, a robot with a camera sensor. Our goal was to estimate the absolute depth of every pixel seen by the camera.

Implementation

Thanks to CMU, Cozmo also had access to an ~8 GB GPU. For our final project, my partner Akshath and I decided to use MiDaS, a monocular depth model, to predict depth for every frame that Cozmo sees.

Since MiDaS only produces relative depth, its depth map is not grounded in real-world depth values. However, when Cozmo sees a light cube, a special object tagged with an ArUco marker, he knows how far away that cube is. Using the light cubes as a sparse absolute depth signal, we compute an optimal scaling factor that multiplies the relative MiDaS depth map to produce an accurate absolute depth map, which can then be queried at any pixel. Feel free to look at the slides linked above for a full explanation and a proof of optimality for our scaling factor!
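As a rough sketch of the scaling step: if we assume "optimal" means minimizing the squared error between the scaled relative depths and the known cube distances, the scale has a simple closed form. The function name and example values below are mine, not the project's, and MiDaS preprocessing is omitted.

```python
import numpy as np

def optimal_scale(relative, absolute):
    """Least-squares scale s minimizing sum_i (s * r_i - d_i)^2,
    where r_i is the MiDaS relative depth at a cube pixel and
    d_i is the known absolute distance to that cube.
    Setting the derivative to zero gives s = sum(r_i d_i) / sum(r_i^2)."""
    r = np.asarray(relative, dtype=float)
    d = np.asarray(absolute, dtype=float)
    return float(np.dot(r, d) / np.dot(r, r))

# Hypothetical example: relative depths at up to 3 cube pixels,
# and the absolute cube distances Cozmo reads from the markers (mm).
r = [0.8, 0.5, 0.2]
d = [400.0, 250.0, 100.0]
s = optimal_scale(r, d)  # -> 500.0 here, since d = 500 * r exactly

# The full absolute depth map would then be s * midas_relative_depth_map.
```

With only one visible cube this degenerates to a single ratio d_1 / r_1; with two or three cubes the least-squares fit averages out per-cube noise.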

Demo