Neural Magic Eye

Learning to See and Understand the Scene Behind an Autostereogram


Zhengxia Zou (1);  Tianyang Shi (2);  Yi Yuan (2);  Zhenwei Shi (3)

(1) University of Michigan, Ann Arbor;  (2) NetEase Fuxi AI Lab;  (3) Beihang University

  [Preprint]       [Code]       [Colab]



An autostereogram, a.k.a. a magic eye image, is a single-image stereogram that creates the visual illusion of a 3D scene from a 2D texture. This paper studies an interesting question: can a deep CNN be trained to recover the depth behind an autostereogram and understand its content? The key to the autostereogram magic lies in stereopsis - to solve such a problem, a model has to learn to discover and estimate disparity from quasi-periodic textures. We show that deep CNNs embedded with disparity convolution, a novel convolutional layer proposed in this paper that simulates stereopsis and encodes disparity, can solve the problem nicely after being sufficiently trained on a large 3D object dataset in a self-supervised fashion. We refer to our method as "NeuralMagicEye". Experiments show that our method can accurately recover the depth behind autostereograms with rich detail and gradient smoothness. Experiments also reveal completely different working mechanisms of autostereogram perception in neural networks and human eyes. We hope this research can help people with visual impairments and those who have trouble viewing autostereograms.
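To give an intuition for what a disparity-style operation does, the sketch below compares an image with horizontally shifted copies of itself, producing one response channel per candidate shift; the response peaks where the texture repeats with that period. This is only an illustrative stand-in - the paper's disparity convolution is a learned layer, not this fixed operation.

```python
import numpy as np

def disparity_features(x, max_disparity=8):
    """Illustrative disparity-style operation (NOT the paper's learned
    disparity convolution): compare an image with horizontally shifted
    copies of itself, one response channel per candidate shift d.
    Responses are near zero where the texture repeats with period d."""
    h, w = x.shape
    out = np.zeros((max_disparity, h, w), dtype=np.float32)
    for d in range(1, max_disparity + 1):
        shifted = np.zeros_like(x)
        shifted[:, d:] = x[:, :-d]        # shift right by d pixels
        out[d - 1] = -np.abs(x - shifted)  # high (near 0) where texture matches
    return out
```

On a texture with horizontal period 4, the channel for a shift of 4 gives the strongest response, which is exactly the disparity cue a stereopsis-like model needs to pick up.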




In the following, we show some online autostereograms and their decoding results produced by our method. The autostereograms were generated by different authors with different graphics engines. For more information about how to correctly view an autostereogram, please check out the instructions on this Wikipedia page.
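For readers unfamiliar with how such images are produced, here is a minimal random-dot autostereogram generator following the textbook algorithm (this is a sketch, not the generation pipeline used for our dataset): each pixel copies a pixel roughly one pattern-width to its left, with the copy distance shortened in proportion to the depth at that pixel.

```python
import numpy as np

def make_autostereogram(depth, pattern_width=32, amplitude=8, seed=0):
    """Minimal random-dot autostereogram (textbook algorithm; parameter
    names and values here are illustrative). `depth` is in [0, 1];
    larger depth shortens the repetition period, encoding disparity."""
    rng = np.random.default_rng(seed)
    h, w = depth.shape
    img = rng.random((h, w))
    for y in range(h):
        for x in range(pattern_width, w):
            shift = int(depth[y, x] * amplitude)
            # copy from (pattern_width - shift) pixels to the left
            img[y, x] = img[y, x - pattern_width + shift]
    return img
```

With a flat (all-zero) depth map the output is exactly periodic with period `pattern_width`; depth variations perturb that period, and it is precisely these perturbations that a decoder must detect.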



When a series of autostereograms is shown frame by frame, in the same way moving pictures are shown, the human brain perceives an animated 3D scene behind the autostereograms. Our method can also be applied to decoding such animated inputs. The following shows some of our results.

 





One potential application of our method is digital image watermarking. In the following example, we investigate whether our model can recover depth from a carrier image in which an autostereogram is embedded as a hidden watermark. We first generate a set of autostereograms based on random characters and QR codes, where the pixel values of the character images and QR codes serve as the depth values of the autostereograms. We then train our decoder on a set of superimposed images, each generated as a linear combination of a background carrier and one of these autostereograms. Scan the recovered QR code in the following image with your smartphone and check out what secret is hidden inside.
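The superposition step described above can be sketched as a simple alpha blend (the mixing weight below is an assumed value for illustration; the paper's exact ratio may differ):

```python
import numpy as np

def embed_watermark(carrier, stereogram, alpha=0.7):
    """Linear combination of a background carrier and an autostereogram,
    as described in the text. `alpha` is an assumed mixing weight: the
    larger it is, the fainter (and harder to decode) the watermark."""
    return alpha * carrier + (1.0 - alpha) * stereogram
```

The decoder is then trained on such blended images, so at test time it can recover the hidden depth (e.g. a QR code) even though the stereogram texture is heavily attenuated by the carrier.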



Our method can also be applied to autostereogram retrieval. To complete such a task, the model has to learn to understand the semantics behind the autostereograms. We therefore build an autostereogram recognition network by replacing the upsampling convolution layers in our decoding network with a fully connected layer that predicts the class probability of the input. Given a query image, we iterate through all the autostereograms in the database (our testing set) and select the top-k matches based on the feature distance between the query and the candidate images. In the following, we show a query image and its top-k retrieval results. Although the matching autostereograms have very different texture appearances, they share the same semantics in their depth.
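The ranking step can be sketched as follows, assuming each image has already been encoded into a feature vector by the recognition network (the cosine distance below is an assumed choice of feature distance; function and variable names are illustrative):

```python
import numpy as np

def topk_retrieval(query_feat, db_feats, k=5):
    """Rank database items by cosine distance to the query feature and
    return the indices of the k nearest. Features are assumed to come
    from the recognition network's penultimate layer (illustrative)."""
    q = query_feat / np.linalg.norm(query_feat)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    dists = 1.0 - db @ q          # cosine distance to each database item
    return np.argsort(dists)[:k]  # indices of the k closest items
```

Because the features encode the depth semantics rather than the surface texture, two stereograms of the same scene end up close in this space even when their textures look nothing alike.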



Finally, we investigated an interesting question: given a depth image, what should an "optimal autostereogram" look like in the eyes of a decoding network? Studying this question may help us understand the working mechanism of neural networks for autostereogram perception. To generate the "optimal autostereogram", we run gradient descent on the input of our decoding network and minimize the difference between its output and the reference depth image. In the following example, (a) shows the "optimal autostereogram" generated for one of our decoding networks with a UNet structure. We name it a "neural autostereogram". In (b)-(c), we show the reference depth and the decoding output.

An interesting observation from this experiment is that although the decoding output of the network is already very similar to the target depth image, human eyes still cannot perceive the depth hidden in this neural autostereogram. Also, there are no clear periodic patterns in the image, which is very different from autostereograms generated by graphics engines. More surprisingly, when we feed this neural autostereogram to other decoding networks with very different architectures, these networks can miraculously perceive the depth correctly. To confirm that this is not accidental, we also tried different image initializations and smoothness constraints but still observed similar behavior. In the following image, (d)-(f) show the decoding results on this image produced by other decoding networks. This experiment suggests that neural networks and the human eye may use completely different mechanisms for stereogram perception. The mechanism and properties behind neural autostereograms are still open questions and need further study.
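The input-optimization procedure itself is straightforward: freeze the decoder, treat the input image as the optimization variable, and descend on the reconstruction loss. The toy sketch below uses a linear "decoder" with a hand-written gradient so it runs without a deep learning framework; in the actual experiment the decoder is a trained UNet and the gradient comes from autodiff.

```python
import numpy as np

def optimize_input(grad_fn, target, x0, lr=0.1, steps=500):
    """Gradient descent on the *input* of a fixed decoder to match a
    target depth. `grad_fn` stands in for backprop through a trained
    network; everything here is a toy illustration of the procedure."""
    x = x0.copy()
    for _ in range(steps):
        x -= lr * grad_fn(x, target)
    return x

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 16)) / 4.0          # toy linear "decoder"
decode = lambda x: W @ x
grad_fn = lambda x, t: 2.0 * W.T @ (W @ x - t)  # gradient of ||Wx - t||^2 w.r.t. x
target = rng.standard_normal(4)                 # toy "reference depth"
x_opt = optimize_input(grad_fn, target, np.zeros(16))
```

Note that the optimized input is only constrained to *decode* correctly; nothing forces it to contain the periodic structure human stereopsis relies on, which is consistent with the observation that human eyes cannot read neural autostereograms.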



@misc{zou2020neuralmagiceye,
      title={NeuralMagicEye: Learning to See and Understand the Scene Behind an Autostereogram},
      author={Zhengxia Zou and Tianyang Shi and Yi Yuan and Zhenwei Shi},
      year={2020},
      eprint={2012.15692},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}