3D videos and images are usually stored in stereoscopic format. For each frame, the format includes two projections of the same scene, one of which is exposed to the viewer's left eye and the other to the viewer's right eye, thus giving the viewer the experience of seeing the scene in three dimensions.

There are two approaches to making 3D movies: shooting natively in 3D or converting to 3D after shooting in 2D. Shooting in 3D requires costly special-purpose stereo camera rigs. Aside from equipment costs, there are cinemagraphic issues that may preclude the use of stereo camera rigs. For example, some inexpensive optical special effects, such as forced perspective, are not compatible with multi-view capturing devices. 2D-to-3D conversion offers an alternative to filming in 3D. Professional conversion processes typically rely on "depth artists" who manually create a depth map for each frame. Standard Depth Image-Based Rendering (DIBR) algorithms can then be used to combine the original frame with the depth map in order to arrive at a stereo image pair. However, this process is still expensive as it requires intensive human effort.

Each year about 20 new 3D movies are produced. High production cost is the main hurdle in the way of scaling up the 3D movie industry. Automated 2D-to-3D conversion would eliminate this obstacle.

In this paper, we propose a fully automated, data-driven approach to the problem of 2D-to-3D video conversion. Solving this problem entails reasoning about depth from a single image and synthesizing a novel view for the other eye. Inferring depth (or disparity) from a single image, however, is a highly under-constrained problem. In addition to depth ambiguities, some pixels in the novel view correspond to geometry that is not visible in the available view, which causes missing data that must be hallucinated with an in-painting algorithm.

In spite of these difficulties, our intuition is that, given the vast number of stereo-frame pairs that exist in already-produced 3D movies, it should be possible to train a machine learning model to predict the novel view from the given view. To that end, we design a deep neural network that takes as input the left eye's view, internally estimates a soft (probabilistic) disparity map, and then renders a novel image for the right eye. We train our model end-to-end on ground-truth stereo-frame pairs with the objective of directly predicting one view from the other. The internal disparity-like map produced by the network is computed only in service of creating a good right-eye view and is not intended to be an accurate map of depth or disparity. We show that this approach is easier to train than the alternative of using a stereo algorithm to derive a disparity map, training the model to predict disparity explicitly, and then using the predicted disparity to render the new image. Our model also performs in-painting implicitly, without the need for post-processing.

Evaluating the quality of the 3D scene generated from the left view is non-trivial. For quantitative evaluations, we use a dataset of 3D movies and report pixel-wise metrics comparing the reconstructed right view and the ground-truth right view. We compare our method with the ground truth and with baselines that use state-of-the-art single-view depth estimation techniques. We also conduct human subject experiments to show the effectiveness of our solution. Our quantitative and qualitative analyses demonstrate the benefits of our solution.

Most existing automatic 2D-to-3D conversion pipelines can be roughly divided into two stages.
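To make the DIBR step mentioned above concrete, here is a minimal sketch of the idea, not any production DIBR algorithm: given a frame and a per-pixel disparity map (derived from the artist's depth map), each left-view pixel is forward-warped horizontally by its disparity to form the right view, and unfilled target pixels become holes that a subsequent in-painting pass must fill. The function name and NaN hole convention are illustrative assumptions.

```python
import numpy as np

def dibr_right_view(left, disparity):
    """Naive DIBR sketch: forward-warp each grayscale left-view pixel
    horizontally by its (positive) disparity to synthesize the right
    view. Target pixels that no source pixel maps to are left as NaN
    holes for a later in-painting step. A real implementation would
    also composite in depth order so nearer pixels win occlusions."""
    h, w = left.shape
    right = np.full_like(left, np.nan, dtype=float)
    for y in range(h):
        for x in range(w):
            xr = x - int(round(disparity[y, x]))  # right-view column
            if 0 <= xr < w:
                right[y, xr] = left[y, x]
    return right
```

Note how holes arise exactly as the text describes: pixels visible to the right eye but occluded in the left view receive no source data.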
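The soft (probabilistic) disparity idea can be sketched as follows. This is an illustrative NumPy rendition of the general technique, not the paper's actual network layer: for each pixel, the network emits a score for every candidate disparity, and the right view is the probability-weighted sum of horizontally shifted copies of the left view. Because the output is a smooth function of the scores, the renderer is differentiable and the whole model can be trained end-to-end on stereo pairs. The `(H, W, D)` score layout and function name are assumptions.

```python
import numpy as np

def soft_render_right(left, disp_logits):
    """Differentiable view synthesis from a soft disparity map.

    left:        (H, W) grayscale left view.
    disp_logits: (H, W, D) unnormalised per-pixel scores over the
                 candidate disparities {0, 1, ..., D-1}.
    Returns the right view as a probability-weighted blend of
    shifted copies of the left view."""
    h, w, d = disp_logits.shape
    # Numerically stable softmax over the disparity axis.
    e = np.exp(disp_logits - disp_logits.max(axis=2, keepdims=True))
    probs = e / e.sum(axis=2, keepdims=True)
    right = np.zeros_like(left, dtype=float)
    for k in range(d):
        # Copy of the left view shifted k pixels left (edge-padded),
        # so shifted[:, x] == left[:, x + k] inside the image.
        shifted = np.pad(left.astype(float), ((0, 0), (0, k)),
                         mode="edge")[:, k:]
        right += probs[:, :, k] * shifted
    return right
```

When the distribution at a pixel is sharply peaked this reduces to an ordinary disparity shift; when it is spread out, the blend of shifted copies is what lets the model fill occluded regions implicitly, in line with the in-painting behaviour described above.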