Machine vision: how is it different from human vision?

Modern businesses try to automate everything they can – and for good reason: machines outperform humans at some tasks. Technologies such as machine vision make this possible. In this article we explain, in simple terms, what machine vision is and how it works.

A Very Short History of Machine Vision

First, the terminology. There is computer vision, and there is machine vision. Computer vision is both a theory and a family of related technologies concerned with how machines can visually perceive the world around them – simply put, how computers see.

Setting science fiction writers aside, the first to speak seriously about computer vision was the British scientist Oliver Selfridge. In 1955 he published the article “The Eyes and Ears of the Computer,” in which he predicted the reality we already live in. Facial recognition is a prime example: today you post a party photo on a social network, and in a split second an artificial intelligence recognizes a friend in it and offers to tag them.

Machine vision is a little different: it is a field of knowledge and engineering practice. It relies on the same principles as computer vision, but its goal is to make the production of goods and services more efficient. The first company to sell solutions in this area is considered to be the American firm Automatix, which in the early 1980s produced several machines capable of soldering microcircuits. They were equipped with analog cameras that transmitted images to a processor; it computed the image parameters and, based on them, issued commands to the parts of the system directly involved in production.

In a nutshell, machine vision is technology that lets equipment see how something is being produced, analyze the data, and make an informed decision – all in a fraction of a second.

How is it better than human vision?

Let’s start with how we see the world. Light particles (photons) constantly bounce off objects and strike the retina of the eye. Each eye contains roughly 126 million light-sensitive cells that decode the information and send it to the brain. These cells come in two types: cones and rods. The former are responsible for color recognition; the latter let us, among other things, see at night by working with shades of gray. We have three types of cones – one specializes in blue, the second in green, and the third in red. Together they cover the full rainbow.

Our visual system, however, is not the most advanced on the planet. The eyes of the mantis shrimp, for example, are far more complex: it has 16 types of color receptors, its eyes move independently of each other, and each eye is divided into three separate parts. At the same time, the mantis shrimp’s brain is tiny and primitive compared to ours. It cannot process large amounts of data, but it does not have to: it receives an already-decoded picture from the eyes. Humans are the opposite – our eyes are somewhat simpler, but our brain is the most powerful of any species.

Machine vision uses both approaches. Some systems pair ordinary digital (or sometimes even analog) cameras with special sensors that detect when something goes wrong: the system receives a raw image, processes it, recognizes the elements and their patterns, makes a decision, and signals other systems. Others use smart cameras – the mantis shrimp approach – in which the cameras themselves perform part of the analysis and offload the system’s processors.
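The classic pipeline described above – camera, raw image, processing, decision, signal – can be sketched in a few lines of Python. This is a deliberately toy illustration, not a real vision system: the “frame” is just a small grid of brightness values, and the names and threshold are invented for the example.

```python
def capture_frame():
    # Stand-in for a camera: a 4x4 grayscale frame (0-255) with one
    # unusually bright pixel playing the role of a defect.
    return [
        [10, 12, 11, 10],
        [11, 10, 250, 12],
        [12, 11, 10, 11],
        [10, 12, 11, 10],
    ]

def find_anomalies(frame, threshold=200):
    # Processing step: flag pixels much brighter than the background.
    return [(r, c) for r, row in enumerate(frame)
                   for c, v in enumerate(row) if v > threshold]

def decide(anomalies):
    # Decision step: signal downstream systems if anything looks wrong.
    return "reject" if anomalies else "pass"

print(decide(find_anomalies(capture_frame())))  # -> reject
```

In a central-processing system, all three steps run on the main computer; a smart camera would run `find_anomalies` on board and transmit only the result.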

So who is more accurate – a machine or a person?

Five years ago, machine vision technology was far less advanced and successfully recognized only 65–70% of the objects in its field of view. That is a high figure, but still not enough to entrust machine vision with critical tasks. Today machines recognize up to 98% of objects. And they genuinely recognize them: they not only register that something is present, but determine what exactly they are seeing, and can even decide what to do next.

Human perception, however, remains more flexible. We interpret context better – or rather, we are the only ones who truly understand what context is. Machines diligently study situations that are new to them, but a person can always invent something to confuse a machine. At least for now. That is why the recognition success rate sits at 98% and does not reach 100%.

Machine vision systems do, however, have one undeniable advantage over human vision: attention. A person can usually concentrate on three to seven visible objects at a time – the exact number varies from person to person, but rarely goes much higher. A machine vision system tracks every object and action in the images reaching its processors. A computer cannot be distracted; to it, everything that happens is equally important.

Here are some problems that can be solved with machine vision

Imagine a tray with 50 nuts on it. Of these, 48 are normal, high-quality nuts, one has a scratch on its side, and another has a swelling on one of its faces. On top of that, for some reason there is a bolt among the nuts. You would probably spot the defective and foreign parts within a couple of seconds. But then a second tray of nuts appears in front of you. Then another. And so on for eight hours.

That is a typical shift for a production-line inspector. Within a couple of hours, such an employee (however professional) is likely to lose concentration: for a second their mind drifts to lunch or the ending of last night’s TV episode, or a colleague’s remark distracts them. Sooner or later, they will most likely miss a couple of defective parts. This is normal – an allowance for human oversight is probably already built into the production figures. A machine vision system, however, will monitor production just as reliably for at least a whole year without a break. It works like this: sensors scan every part and send a signal if something is wrong, while cameras paired with LED lighting examine the picture closely and transmit the images to a computer.
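The sorting step on the nut tray can be sketched as a simple rule-based classifier. This is only an illustration of the idea: the feature names (`shape`, `surface_defects`) are invented for the example – a real system would extract such measurements from camera images.

```python
from collections import Counter

def classify(part):
    # part: a dict of features a vision system might extract per object.
    if part["shape"] != "hex":          # not hexagonal -> the stray bolt
        return "foreign object"
    if part["surface_defects"] > 0:     # scratch, swelling, etc.
        return "defective"
    return "ok"

# The tray from the text: 48 good nuts, a scratched nut, a swollen nut,
# and one bolt that ended up among them.
tray = (
    [{"shape": "hex", "surface_defects": 0}] * 48
    + [{"shape": "hex", "surface_defects": 1},    # scratched
       {"shape": "hex", "surface_defects": 1},    # swollen
       {"shape": "round", "surface_defects": 0}]  # the bolt
)

print(Counter(classify(p) for p in tray))
# -> Counter({'ok': 48, 'defective': 2, 'foreign object': 1})
```

Unlike the human inspector, this loop produces the same result on tray one and on tray one thousand.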

This solution saves money. And the inspector can always be retrained as an operator of such a system – their experience in setting up the line will certainly come in handy. These systems are now quite simple and intuitive to use. One example is PowerAI Vision from IBM: you do not need to be a deep learning specialist to transfer your knowledge to the system and show it the specifics of your work.

Another popular use case for the technology is safety. Working the same way as with the nuts, a machine vision system can instantly scan the shop floor and spot a worker who forgot to put on a safety helmet – and then lock their machine or issue a warning over the loudspeaker.
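The decision rule here is as simple as the nut check. A hypothetical sketch, assuming the vision system reports each detected person as a (worker id, wearing-helmet) pair – the identifiers and action strings are made up for the example:

```python
def safety_actions(detections):
    # detections: (worker_id, wearing_helmet) pairs from the vision system.
    # Map each worker to the action the system should take.
    return {worker: ("ok" if helmet else "lock machine and warn")
            for worker, helmet in detections}

print(safety_actions([("A12", True), ("B07", False)]))
# -> {'A12': 'ok', 'B07': 'lock machine and warn'}
```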

The third area for machine vision is the Internet of Things – the umbrella term for technologies that let different devices interact with one another. For example, there are already refrigerators that use machine vision to detect spoiled food.

Such solutions can be deployed not only in factories and plants, but also in warehouses, retail, banking, logistics and transport services, agriculture and animal husbandry, and so on. In the US, machine vision systems were adopted earlier and more widely (thanks to a larger number of available solutions), and they are now used across a range of industries – from automotive to pharmaceuticals.