Comparison of human and artificial image recognition: some considerations – Nonteek – News Block

When it comes to artificial intelligence, the debate can heat up rather quickly, usually between a faction crouching in a position of self-defense, arguing that machines will not reach human capabilities any time soon, and a faction arguing instead that the age of AI is almost here, if it has not arrived already.

This post is not meant to be an introduction to the above arguments (I could write a more detailed post later), but to set out some considerations on how misleading a crude comparison between the results of the two can be if the full context is not taken into account.

Deep Neural Networks (DNNs) are today considered state of the art in many areas of artificial intelligence, especially computer vision, so we might as well take them as an important benchmark for this debate. So how do they relate to human vision? Are they on a par with our own capabilities? It turns out that the answer is not exactly simple.

An interesting paper by Christian Szegedy and colleagues(1) showed that DNNs have counterintuitive properties: they seem to be very good at generalization, even better than humans, yet they can be easily fooled by adversarial examples. The authors hypothesized that one possible explanation is that the set of adversarial examples has extremely low probability, so it is rarely or never observed in a test set, and yet it is dense (much like the rational numbers), so such examples can be found near virtually every test case.

Adversarial examples of MNIST digit images. The odd columns are the original images, while the even columns are the same images slightly distorted by a suitably chosen function. The distorted images are very easy for humans to recognize but are never recognized by the neural network (0% accuracy).
In this image, the even columns are the images processed with random distortion. Interestingly, the network's recognition accuracy here was around 51%, even though the images are nearly unrecognizable to humans.
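To get a feel for why an aimed distortion is so different from a random one of the same size, here is a minimal sketch on a toy linear classifier. It is not the procedure used in the cited paper: the weights, the "image", and all function names are made up for illustration, and the perturbation is a simple sign-of-the-gradient step with pixel clipping omitted.

```python
import random

def score(w, b, x):
    """Linear score w.x + b: positive means class 1, otherwise class 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    return 1 if score(w, b, x) > 0 else 0

def minimal_eps(w, b, x, margin=1.01):
    """Smallest per-pixel shift (times a safety margin) that can flip the label."""
    return margin * abs(score(w, b, x)) / sum(abs(wi) for wi in w)

def adversarial(w, b, x, eps):
    """Shift every pixel by eps in the direction that works against the current label."""
    toward = -1.0 if predict(w, b, x) == 1 else 1.0
    return [xi + toward * eps * (1.0 if wi > 0 else -1.0 if wi < 0 else 0.0)
            for wi, xi in zip(w, x)]

def random_distortion(x, eps, rng):
    """Shift every pixel by +eps or -eps at random: same size, no aim."""
    return [xi + eps * rng.choice((-1.0, 1.0)) for xi in x]

def experiment(n_pixels=784, trials=200, seed=0):
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(n_pixels)]  # toy stand-in for trained weights
    x = [rng.random() for _ in range(n_pixels)]             # toy "image" with values in [0, 1]
    b = 0.0
    label = predict(w, b, x)
    eps = minimal_eps(w, b, x)
    adv_flipped = predict(w, b, adversarial(w, b, x, eps)) != label
    rand_flips = sum(predict(w, b, random_distortion(x, eps, rng)) != label
                     for _ in range(trials))
    return eps, adv_flipped, rand_flips

if __name__ == "__main__":
    eps, adv_flipped, rand_flips = experiment()
    print(f"per-pixel shift: {eps:.4f}")
    print(f"aimed distortion flips the label: {adv_flipped}")
    print(f"random distortions that flip it: {rand_flips} of 200")
```

The aimed shift changes the label by construction, while random shifts of exactly the same size largely cancel each other out. That is one way to picture how adversarial inputs can sit close to every image and still almost never turn up by chance.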

Many years have passed since the first pioneering works on adversarial classification(2,3), and today many adversarial examples are generated with Evolutionary Algorithms (EAs) that evolve a population of images. With these types of algorithms, it is possible to trick state-of-the-art neural networks into "recognizing" natural objects, with almost 100% certainty, in images that were evolved to be totally unrecognizable to humans(4).

Using evolutionary algorithms to produce images that match DNN classes can produce a wide variety of different images, and looking at these, the authors interestingly note that:

“For many of the images produced, one can begin to identify why the DNN believes the image is of that class once the class tag is assigned to it. This is because evolution only needs to produce features that are unique or discriminatory for a class, rather than producing an image that contains all the typical features of a class.”
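The point made in the quote can be sketched with a toy example. Everything below is an assumption made for illustration: the "classifier" is a hand-written function that looks at only 8 of 64 pixels, and the evolutionary loop is a plain elitist scheme, far simpler than the setup used in the cited work.

```python
import math
import random

N_PIXELS = 64                   # toy 8x8 "image"
KEY_PIXELS = range(0, 64, 8)    # the only pixels the toy classifier looks at

def confidence(image):
    """Toy classifier: confidence for one class, based on 8 discriminative pixels."""
    score = 4.0 * sum(image[i] for i in KEY_PIXELS) - 24.0
    return 1.0 / (1.0 + math.exp(-score))

def mutate(image, rng, rate=0.1, sigma=0.2):
    """Add Gaussian noise to a random subset of pixels, clipped to [0, 1]."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) if rng.random() < rate else p
            for p in image]

def evolve(generations=400, pop_size=30, elite=10, seed=1):
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(N_PIXELS)] for _ in range(pop_size)]
    start = max(confidence(img) for img in population)
    for _ in range(generations):
        population.sort(key=confidence, reverse=True)
        parents = population[:elite]  # the best images survive unchanged
        children = [mutate(rng.choice(parents), rng) for _ in range(pop_size - elite)]
        population = parents + children
    best = max(population, key=confidence)
    return start, confidence(best), best

if __name__ == "__main__":
    start, final, best = evolve()
    print(f"best confidence at start: {start:.4f}")
    print(f"best confidence at end:   {final:.4f}")
```

Selection only rewards the 8 pixels the classifier cares about, so the other 56 are free to stay noise. The evolved image reaches very high confidence without ever needing to look like anything.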

These examples demonstrate how AI recognition can be intentionally tricked into not recognizing images that are obvious to us (false negatives), and also into recognizing with high confidence something that, to us, is obviously not there (false positives). There is a lot of literature on this topic(5–7), which can also be quite important from a cybersecurity perspective(8).

However, we must stress that human recognition has its own shortcomings as well: there are plenty of optical illusions to prove it, including the famous white and gold versus blue and black dress, which generated much debate.

The famous black and blue dress: some people see it as blue and black, while others see it as white and gold. The lack of context coupled with poor image quality forces us to make guesses, and what we “see” depends on our own interpretation of the ambient luminosity.
A visual explanation of how context can trick us into seeing what’s not there: the two images above are the same shot, where the one on the right had the model and background slightly obscured, leaving the dress untouched.

There are cases where artificial recognition can consistently outperform humans(9,10), such as fine-grained intraclass recognition (for example, dog breeds, snakes, etc.). It also seems that humans may be even more susceptible than AI when there is insufficient training data, that is, when the person has not had enough exposure to that kind of class.

Human perception is a tricky beast. It seems extremely good to us because it can be quite robust and adaptive, but as we have just seen, it depends a lot on prior knowledge: we also need training (sometimes lifelong training) to be able to perform with some degree of success. Sure enough, we also have some innate categories that we are very adept at recognizing from birth (for example, human faces of our own race), but guess what? We can be fooled there too, if we just change the lighting(11,12).

Even human faces can be hard for us to recognize, with just a change in lighting.

Furthermore, we depend on aspects of reality that are not objective at all, such as colors. Everyone knows that colors depend on the wavelengths of light reflected from objects, but we often forget that what really makes colors what they are to us is our brain's interpretation. In short, colors do not exist in nature: they are just a small portion of the light spectrum that our brain encodes into specific sensations. We don't see infrared, ultraviolet, or gamma rays as colors, although they are definitely there, and we also see colors that don't really "exist" in the spectrum, like brown.

Our perception is strongly linked not only to our neurophysiology but also to our cultural context. There is a now famous Namibian tribe, the Himba, who have dozens of terms for green but no word for blue. Apparently their members cannot tell blue from green at all, yet they are much better than we are at detecting very slight differences between greens(13,14). Furthermore, very recent studies have shown that humans may be just as prone as machines to being fooled by some sort of adversarial imagery(9,15,16).

The differences between the shortcomings of human and artificial image recognition suggest that the underlying processes are very different. Human recognition is no better or worse than machine recognition, or at least the question is very poorly posed, as we constantly neglect to take into account the knowledge and training we need to perform any recognition at all.


  1. (1)

    C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I.J. Goodfellow, R. Fergus, Intriguing Properties of Neural Networks, in: 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, 2014.

  2. (2)

    N. Dalvi, P. Domingos, Mausam, S. Sanghai, D. Verma, Adversarial Classification, in: Proceedings of the 2004 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining – KDD ’04, ACM Press, 2004. doi:10.1145/1014052.1014066.

  3. (3)

    D. Lowd, C. Meek, Adversarial learning, in: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining – KDD ’05, ACM Press, 2005. doi:10.1145/1081870.1081950.

  4. (4)

    A. Nguyen, J. Yosinski, J. Clune, Deep Neural Networks Are Easily Fooled: High Confidence Predictions for Unrecognizable Images, ArXiv E-Prints. (2014) arXiv:1412.1897.

  5. (5)

    B. Biggio, F. Roli, Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning, (2017).

  6. (7)

    A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, Commun. ACM. (2017) 84–90. doi:10.1145/3065386.

  7. (11)

    C. Hong Liu, C.A. Collin, A.M. Burton, A. Chaudhuri, Direction of lighting affects recognition of non-textured faces in photographic positives and negatives, Vision Research. (1999) 4003–4009. doi:10.1016/s0042-6989(99)00109-1.

  8. (12)

    A. Missinato, Face recognition with photographic negatives: the role of spatial frequencies and facial specificity, University of Aberdeen, 1999.

  9. (15)

    Gamaleldin F. Elsayed, Shreya Shankar, Brian Cheung, Nicolas Papernot, Alex Kurakin, Ian Goodfellow, Jascha Sohl-Dickstein, Adversarial Examples that Fool both Computer Vision and Time-Limited Humans, (2018).

  10. (16)

    E. Watanabe, A. Kitaoka, K. Sakamoto, M. Yasugi, K. Tanaka, Illusory motion reproduced by deep neural networks trained for prediction, Front. Psychol. (2018). doi:10.3389/fpsyg.2018.00345.
