To demonstrate the capabilities of the technique, NVIDIA’s research and creative teams began by collecting around 100 images each of five musical instruments from different angles. From those 2D images, 3D MoMa reconstructed a 3D model of each instrument as a triangle mesh, the form typically used by game engines, 3D modelers, and film renderers.
According to David Luebke, vice president of graphics research at NVIDIA, inverse rendering, a technique to reconstruct a series of still photos into a 3D model of an object or scene, has long been a holy grail unifying computer vision and computer graphics.
The generated 3D objects would traditionally be created through complex photogrammetry techniques that require significant time and manual effort, NVIDIA said. Recent work in neural radiance fields can rapidly generate a 3D representation of an object or scene, though not in a triangle mesh format.
The objects generated through 3D MoMa are directly compatible with the 3D graphics engines and modeling tools already used by creators, so they can be placed into animated scenes and manipulated to change their texture, lighting, and scale.
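For readers unfamiliar with the format, a triangle mesh is essentially a list of 3D vertex positions plus a list of triangles that index into it. The sketch below is a minimal illustration of that data structure, not 3D MoMa's own code: the `TriangleMesh` class, its `scaled` and `to_obj` methods, and the tetrahedron data are hypothetical names and values chosen for this example. It shows why such a mesh is easy to resize and to hand to other tools, here by writing the widely supported Wavefront OBJ text layout (`v` lines for vertices, `f` lines for 1-based face indices).

```python
from dataclasses import dataclass


@dataclass
class TriangleMesh:
    """Hypothetical minimal mesh: vertex positions plus index triples."""
    vertices: list[tuple[float, float, float]]
    faces: list[tuple[int, int, int]]  # 0-based indices into `vertices`

    def scaled(self, factor: float) -> "TriangleMesh":
        # Uniform scaling only moves vertices; the connectivity is unchanged.
        new_vertices = [(x * factor, y * factor, z * factor)
                        for x, y, z in self.vertices]
        return TriangleMesh(new_vertices, list(self.faces))

    def to_obj(self) -> str:
        # Wavefront OBJ: one "v x y z" line per vertex, then one
        # "f i j k" line per triangle, with 1-based vertex indices.
        lines = [f"v {x} {y} {z}" for x, y, z in self.vertices]
        lines += [f"f {a + 1} {b + 1} {c + 1}" for a, b, c in self.faces]
        return "\n".join(lines) + "\n"


# Illustrative data: a tetrahedron with four vertices and four triangles.
tetra = TriangleMesh(
    vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
              (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
    faces=[(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)],
)
doubled = tetra.scaled(2.0)
```

Calling `doubled.to_obj()` yields text that could be saved to a `.obj` file and opened in a typical modeling tool. Texture and lighting changes would involve material data that this sketch deliberately leaves out.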
A paper describing the research (https://nvlabs.github.io/nvdiffrec/assets/paper.pdf) was presented at the Conference on Computer Vision and Pattern Recognition.